Knowledge Base (RAG)

Create and manage RAG (Retrieval-Augmented Generation) instances that let your AI agents answer questions using your own documents, PDFs, and knowledge base.

How It Works

1. Create a RAG server: each server is an isolated knowledge base with its own documents and embeddings.

2. Upload documents: upload PDFs, Markdown, HTML, plain text, code files, or images (with OCR). Documents are automatically split into chunks and embedded.

3. Query for relevant context: search your knowledge base with natural language. The system returns the most relevant chunks ranked by similarity score.

4. Use context in prompts: pass retrieved chunks as context to your chat completions for grounded, accurate answers.
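The four steps above can be sketched as a single function. The `Chunk` shape and the injected `query`/`complete` callbacks are illustrative assumptions, not part of the API; real implementations wrap the HTTP endpoints documented below and are async.

```typescript
// Sketch of the four-step RAG flow. `Chunk`, `QueryFn`, and `CompleteFn` are
// illustrative shapes, not documented API types; real calls would be async
// HTTP requests to the endpoints described on this page.
interface Chunk { content: string; score: number; }
type QueryFn = (question: string, topK: number) => Chunk[];           // step 3
type CompleteFn = (systemPrompt: string, question: string) => string; // step 4

function answerWithRag(question: string, query: QueryFn, complete: CompleteFn): string {
  const chunks = query(question, 5);                       // retrieve top-k chunks
  const context = chunks.map(c => c.content).join('\n\n'); // assemble context
  return complete(`Answer using ONLY this context:\n${context}`, question);
}
```

The same pattern, with real HTTP calls, appears in the RAG + Chat Completions section below.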

Create RAG Server

POST /v1/rag/servers (Auth Required)
Bash
curl -X POST https://api.llmhub.one/v1/rag/servers \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $LLMHUB_API_KEY" \
  -d '{
    "name": "Product Documentation",
    "description": "Knowledge base for all product docs",
    "embedding_model": "openai.small",
    "chunk_size": 512,
    "chunk_overlap": 50,
    "top_k": 5,
    "similarity_threshold": 0.3
  }'

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| name (required) | string | Display name (2–255 chars) |
| description | string | Optional description |
| embedding_model | string | Embedding model to use. Options: openai.small (1536d), openai.large (3072d) |
| chunk_size | integer | Characters per chunk. Default: 512 |
| chunk_overlap | integer | Overlap between chunks. Default: 50 |
| top_k | integer | Default number of results to return. Default: 5 |
| similarity_threshold | number | Minimum similarity score (0–1). Default: 0.5 |
| organization_id | string | Optional org to associate with |
| ocr_enabled | boolean | Enable OCR for image documents |
| ocr_languages | string | OCR language codes (e.g., "eng+nld") |
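A rough sketch of what chunk_size and chunk_overlap control, assuming simple character-window chunking; the server's actual splitting strategy may differ (for example, respecting sentence boundaries):

```typescript
// Character-window chunking: each chunk is up to `chunkSize` characters and
// repeats the last `overlap` characters of the previous chunk, so context
// straddling a chunk boundary is not lost.
function chunkText(text: string, chunkSize = 512, overlap = 50): string[] {
  if (overlap >= chunkSize) throw new Error('chunk_overlap must be smaller than chunk_size');
  const chunks: string[] = [];
  const step = chunkSize - overlap;
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // last window reached the end
  }
  return chunks;
}
```

Larger overlap improves recall for answers that span chunk boundaries, at the cost of more stored chunks and embedding calls.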

Response

json
{
  "id": "rag-uuid-here",
  "name": "Product Documentation",
  "slug": "product-documentation",
  "description": "Knowledge base for all product docs",
  "embedding_model": "openai.small",
  "chunk_size": 512,
  "chunk_overlap": 50,
  "top_k": 5,
  "similarity_threshold": 0.3,
  "is_active": true,
  "created_at": "2026-03-15T10:30:00Z"
}

Upload Documents

POST /v1/rag/servers/:id/documents (Auth Required)

Upload a document to your knowledge base. Text files are sent as plain text; binary files (PDFs, images) should be base64-encoded.

Text Document

Bash
curl -X POST https://api.llmhub.one/v1/rag/servers/{server_id}/documents \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $LLMHUB_API_KEY" \
  -d '{
    "file_name": "getting-started.md",
    "file_type": "text/markdown",
    "content": "# Getting Started\n\nWelcome to our platform..."
  }'

PDF Document

Bash
# For binary files like PDFs, base64-encode the content.
# -w 0 disables line wrapping (GNU coreutils); on macOS use: base64 -i report.pdf
BASE64_CONTENT=$(base64 -w 0 report.pdf)

curl -X POST https://api.llmhub.one/v1/rag/servers/{server_id}/documents \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $LLMHUB_API_KEY" \
  -d "{
    \"file_name\": \"report.pdf\",
    \"file_type\": \"application/pdf\",
    \"content\": \"$BASE64_CONTENT\"
  }"
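The same upload prepared in Node; `Buffer` handles the base64 step that `base64 -w 0` performs in the shell (the file name here is illustrative, matching the example above):

```typescript
import { readFileSync } from 'node:fs';

// Base64-encode raw bytes, as required for binary document uploads.
function encodeBase64(bytes: Buffer): string {
  return bytes.toString('base64'); // Buffer output contains no line breaks
}

// Build the JSON upload body for a PDF on disk.
function pdfUploadBody(path: string): string {
  return JSON.stringify({
    file_name: path.split('/').pop(),
    file_type: 'application/pdf',
    content: encodeBase64(readFileSync(path)),
  });
}
```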

Supported File Types

| Category | MIME Types |
| --- | --- |
| Documents | application/pdf, text/plain, text/markdown, text/html, text/csv, application/json |
| Code | text/x-go, text/x-python, text/javascript, text/x-typescript, application/x-yaml |
| Images (OCR) | image/png, image/jpeg, image/gif, image/webp, image/bmp, image/tiff |

Response

json
{
  "id": "doc-uuid-here",
  "rag_server_id": "rag-uuid-here",
  "file_name": "getting-started.md",
  "file_type": "text/markdown",
  "file_size": 2048,
  "status": "processed",
  "created_at": "2026-03-15T10:35:00Z"
}

Query Knowledge Base

POST /v1/rag/servers/:id/query (Auth Required)

Search your knowledge base with a natural language query. Returns the most relevant document chunks ranked by similarity score.

Bash
curl -X POST https://api.llmhub.one/v1/rag/servers/{server_id}/query \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $LLMHUB_API_KEY" \
  -d '{
    "query": "How do I reset my password?",
    "top_k": 3
  }'

Response

json
{
  "query": "How do I reset my password?",
  "results": [
    {
      "id": "chunk-uuid-1",
      "content": "## Password Reset\n\nTo reset your password, go to Settings > Security and click 'Reset Password'. You'll receive a confirmation email with a reset link.",
      "metadata": {
        "file_name": "account-settings.md",
        "chunk_index": 4
      },
      "score": 0.89
    },
    {
      "id": "chunk-uuid-2",
      "content": "## Account Recovery\n\nIf you've forgotten your password and can't access your email, contact support with your account ID for manual recovery.",
      "metadata": {
        "file_name": "faq.md",
        "chunk_index": 12
      },
      "score": 0.72
    }
  ],
  "count": 2
}

Score Interpretation

  • 0.8–1.0: Highly relevant — direct match
  • 0.5–0.8: Moderately relevant — related content
  • 0.3–0.5: Loosely related — may contain useful context
  • Below similarity_threshold: filtered out of the results (0.3 in the example server above; the server default is 0.5)
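The bands above as a small helper. The band boundaries come from this page; the `threshold` default follows the example server's similarity_threshold of 0.3 (the API default is 0.5):

```typescript
type Relevance = 'high' | 'moderate' | 'loose' | 'filtered';

// Classify a similarity score using the interpretation bands above.
function classifyScore(score: number, threshold = 0.3): Relevance {
  if (score < threshold) return 'filtered'; // below the server's similarity_threshold
  if (score >= 0.8) return 'high';          // direct match
  if (score >= 0.5) return 'moderate';      // related content
  return 'loose';                           // may contain useful context
}
```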

RAG + Chat Completions

The typical RAG pattern: query your knowledge base, then pass the retrieved context to a chat completion for grounded answers:

TypeScript
import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'https://api.llmhub.one/v1',
  apiKey: process.env.LLMHUB_API_KEY,
});

const userQuestion = 'How do I reset my password?'; // example user input

// 1. Query the knowledge base
const ragResponse = await fetch(
  'https://api.llmhub.one/v1/rag/servers/YOUR_SERVER_ID/query',
  {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'Authorization': `Bearer ${process.env.LLMHUB_API_KEY}`,
    },
    body: JSON.stringify({
      query: userQuestion,
      top_k: 3,
    }),
  }
);

const { results } = (await ragResponse.json()) as {
  results: { content: string; score: number }[];
};
const context = results.map((r) => r.content).join('\n\n');

// 2. Use retrieved context in chat completion
const response = await client.chat.completions.create({
  model: 'gpt-4o-mini',
  messages: [
    {
      role: 'system',
      content: `Answer the user's question using ONLY the following context.
If the answer is not in the context, say you don't know.

Context:
${context}`,
    },
    { role: 'user', content: userQuestion },
  ],
});

console.log(response.choices[0].message.content);

Server Statistics

GET /v1/rag/servers/:id/stats (Auth Required)
Bash
curl https://api.llmhub.one/v1/rag/servers/{server_id}/stats \
  -H "Authorization: Bearer $LLMHUB_API_KEY"

Response

json
{
  "document_count": 15,
  "chunk_count": 342,
  "total_bytes": 523776,
  "query_count": 1250,
  "last_sync_at": "2026-03-26T08:00:00Z"
}
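A couple of derived metrics you might compute from this payload. The field names are taken from the response above; the averages are simple arithmetic, not API fields:

```typescript
// Shape of the stats response documented above.
interface RagStats {
  document_count: number;
  chunk_count: number;
  total_bytes: number;
  query_count: number;
}

// Average stored bytes per chunk; 0 when the server is empty.
function avgChunkBytes(s: RagStats): number {
  return s.chunk_count === 0 ? 0 : Math.round(s.total_bytes / s.chunk_count);
}

// Average number of chunks produced per document.
function avgChunksPerDoc(s: RagStats): number {
  return s.document_count === 0 ? 0 : Math.round(s.chunk_count / s.document_count);
}
```

For the example response, that works out to roughly 1.5 KB per chunk and about 23 chunks per document.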

Other Endpoints

| Method | Endpoint | Description |
| --- | --- | --- |
| GET | /v1/rag/servers | List all your RAG servers |
| GET | /v1/rag/servers/:id | Get server details |
| PATCH | /v1/rag/servers/:id | Update server settings |
| DELETE | /v1/rag/servers/:id | Delete server and all documents |
| GET | /v1/rag/servers/:id/documents | List documents in a server |
| DELETE | /v1/rag/servers/:id/documents/:docId | Delete a single document |
| POST | /v1/rag/servers/:id/reingest | Re-process all documents |
| GET | /v1/rag/embedding-models | List available embedding models |
| GET | /v1/rag/ocr-info | Check OCR availability and languages |
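A hypothetical helper for building these endpoint paths, so server and document IDs are URL-encoded consistently (the base URL matches the examples on this page; the helper itself is not part of any SDK):

```typescript
const BASE = 'https://api.llmhub.one/v1';

// Path for one RAG server, or the server collection when no id is given.
function serverPath(id?: string): string {
  return id ? `${BASE}/rag/servers/${encodeURIComponent(id)}` : `${BASE}/rag/servers`;
}

// Path for a document within a server, or the server's document collection.
function documentPath(serverId: string, docId?: string): string {
  const base = `${serverPath(serverId)}/documents`;
  return docId ? `${base}/${encodeURIComponent(docId)}` : base;
}
```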