Knowledge Base (RAG)

Create and manage RAG (Retrieval-Augmented Generation) instances that let your AI agents answer questions using your own documents, PDFs, and knowledge base.

How It Works

1. Create a RAG server: each server is an isolated knowledge base with its own documents and embeddings.

2. Upload documents: upload PDFs, Markdown, HTML, plain text, code files, or images (with OCR). Documents are automatically split into chunks and embedded.

3. Query for relevant context: search your knowledge base with natural language. The system returns the most relevant chunks ranked by similarity score.

4. Use context in prompts: pass retrieved chunks as context to your chat completions for grounded, accurate answers.
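The four steps above can be sketched as a single function. The `Chunk` shape and the injected `query`/`complete` callbacks are illustrative assumptions, not part of the API; real implementations wrap the HTTP endpoints documented below and are async.

```typescript
// Sketch of the four-step RAG flow. `Chunk`, `QueryFn`, and `CompleteFn` are
// illustrative shapes, not documented API types; real calls would be async
// HTTP requests to the endpoints described on this page.
interface Chunk { content: string; score: number; }
type QueryFn = (question: string, topK: number) => Chunk[];           // step 3
type CompleteFn = (systemPrompt: string, question: string) => string; // step 4

function answerWithRag(question: string, query: QueryFn, complete: CompleteFn): string {
  const chunks = query(question, 5);                       // retrieve top-k chunks
  const context = chunks.map(c => c.content).join('\n\n'); // assemble context
  return complete(`Answer using ONLY this context:\n${context}`, question);
}
```

The same pattern, with real HTTP calls, appears in the RAG + Chat Completions section below.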

Create RAG Server

POST /v1/rag/servers (Auth Required)
Bash
curl -X POST https://api.llmhub.one/v1/rag/servers \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $LLMHUB_API_KEY" \
  -d '{
    "name": "Product Documentation",
    "description": "Knowledge base for all product docs",
    "embedding_model": "openai.small",
    "chunk_size": 512,
    "chunk_overlap": 50,
    "top_k": 5,
    "similarity_threshold": 0.3
  }'

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| name (required) | string | Display name (2–255 chars) |
| description | string | Optional description |
| embedding_model | string | Embedding model to use. Options: openai.small (1536d), openai.large (3072d) |
| chunk_size | integer | Characters per chunk. Default: 512 |
| chunk_overlap | integer | Overlap between chunks. Default: 50 |
| top_k | integer | Default number of results to return. Default: 5 |
| similarity_threshold | number | Minimum similarity score (0–1). Default: 0.5 |
| organization_id | string | Optional org to associate with |
| ocr_enabled | boolean | Enable OCR for image documents |
| ocr_languages | string | OCR language codes (e.g., "eng+nld") |
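A rough sketch of what chunk_size and chunk_overlap control, assuming simple character-window chunking; the server's actual splitting strategy may differ (for example, respecting sentence boundaries):

```typescript
// Character-window chunking: each chunk is up to `chunkSize` characters and
// repeats the last `overlap` characters of the previous chunk, so context
// straddling a chunk boundary is not lost.
function chunkText(text: string, chunkSize = 512, overlap = 50): string[] {
  if (overlap >= chunkSize) throw new Error('chunk_overlap must be smaller than chunk_size');
  const chunks: string[] = [];
  const step = chunkSize - overlap;
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // last window reached the end
  }
  return chunks;
}
```

Larger overlap improves recall for answers that span chunk boundaries, at the cost of more stored chunks and embedding calls.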

Response

json
{
  "id": "rag-uuid-here",
  "name": "Product Documentation",
  "slug": "product-documentation",
  "description": "Knowledge base for all product docs",
  "embedding_model": "openai.small",
  "chunk_size": 512,
  "chunk_overlap": 50,
  "top_k": 5,
  "similarity_threshold": 0.3,
  "is_active": true,
  "created_at": "2026-03-15T10:30:00Z"
}

Upload Documents

POST /v1/rag/servers/:id/documents (Auth Required)

Upload a document to your knowledge base. Text files are sent as plain text; binary files (PDFs, images) should be base64-encoded.

Text Document

Bash
curl -X POST https://api.llmhub.one/v1/rag/servers/{server_id}/documents \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $LLMHUB_API_KEY" \
  -d '{
    "file_name": "getting-started.md",
    "file_type": "text/markdown",
    "content": "# Getting Started\n\nWelcome to our platform..."
  }'

PDF Document

Bash
# For binary files like PDFs, base64-encode the content.
# -w 0 disables line wrapping (GNU coreutils); on macOS use: base64 -i report.pdf
BASE64_CONTENT=$(base64 -w 0 report.pdf)

curl -X POST https://api.llmhub.one/v1/rag/servers/{server_id}/documents \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $LLMHUB_API_KEY" \
  -d "{
    \"file_name\": \"report.pdf\",
    \"file_type\": \"application/pdf\",
    \"content\": \"$BASE64_CONTENT\"
  }"
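The same upload prepared in Node; `Buffer` handles the base64 step that `base64 -w 0` performs in the shell (the file name here is illustrative, matching the example above):

```typescript
import { readFileSync } from 'node:fs';

// Base64-encode raw bytes, as required for binary document uploads.
function encodeBase64(bytes: Buffer): string {
  return bytes.toString('base64'); // Buffer output contains no line breaks
}

// Build the JSON upload body for a PDF on disk.
function pdfUploadBody(path: string): string {
  return JSON.stringify({
    file_name: path.split('/').pop(),
    file_type: 'application/pdf',
    content: encodeBase64(readFileSync(path)),
  });
}
```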

Supported File Types

| Category | MIME Types |
| --- | --- |
| Documents | application/pdf, text/plain, text/markdown, text/html, text/csv, application/json |
| Code | text/x-go, text/x-python, text/javascript, text/x-typescript, application/x-yaml |
| Images (OCR) | image/png, image/jpeg, image/gif, image/webp, image/bmp, image/tiff |

Response

json
{
  "id": "doc-uuid-here",
  "rag_server_id": "rag-uuid-here",
  "file_name": "getting-started.md",
  "file_type": "text/markdown",
  "file_size": 2048,
  "status": "processed",
  "created_at": "2026-03-15T10:35:00Z"
}

Query Knowledge Base

POST /v1/rag/servers/:id/query (Auth Required)

Search your knowledge base with a natural language query. Returns the most relevant document chunks ranked by similarity score.

Bash
curl -X POST https://api.llmhub.one/v1/rag/servers/{server_id}/query \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $LLMHUB_API_KEY" \
  -d '{
    "query": "How do I reset my password?",
    "top_k": 3
  }'

Response

json
{
  "query": "How do I reset my password?",
  "results": [
    {
      "id": "chunk-uuid-1",
      "content": "## Password Reset\n\nTo reset your password, go to Settings > Security and click 'Reset Password'. You'll receive a confirmation email with a reset link.",
      "metadata": {
        "file_name": "account-settings.md",
        "chunk_index": 4
      },
      "score": 0.89
    },
    {
      "id": "chunk-uuid-2",
      "content": "## Account Recovery\n\nIf you've forgotten your password and can't access your email, contact support with your account ID for manual recovery.",
      "metadata": {
        "file_name": "faq.md",
        "chunk_index": 12
      },
      "score": 0.72
    }
  ],
  "count": 2
}

Score Interpretation

  • 0.8–1.0: Highly relevant — direct match
  • 0.5–0.8: Moderately relevant — related content
  • 0.3–0.5: Loosely related — may contain useful context
  • Below similarity_threshold: filtered out of the results (0.3 in the example server above; the server default is 0.5)
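The bands above as a small helper. The band boundaries come from this page; the `threshold` default follows the example server's similarity_threshold of 0.3 (the API default is 0.5):

```typescript
type Relevance = 'high' | 'moderate' | 'loose' | 'filtered';

// Classify a similarity score using the interpretation bands above.
function classifyScore(score: number, threshold = 0.3): Relevance {
  if (score < threshold) return 'filtered'; // below the server's similarity_threshold
  if (score >= 0.8) return 'high';          // direct match
  if (score >= 0.5) return 'moderate';      // related content
  return 'loose';                           // may contain useful context
}
```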

RAG + Chat Completions

The typical RAG pattern: query your knowledge base, then pass the retrieved context to a chat completion for grounded answers:

TypeScript
import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'https://api.llmhub.one/v1',
  apiKey: process.env.LLMHUB_API_KEY,
});

const userQuestion = 'How do I reset my password?'; // example user input

// 1. Query the knowledge base
const ragResponse = await fetch(
  'https://api.llmhub.one/v1/rag/servers/YOUR_SERVER_ID/query',
  {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'Authorization': `Bearer ${process.env.LLMHUB_API_KEY}`,
    },
    body: JSON.stringify({
      query: userQuestion,
      top_k: 3,
    }),
  }
);

const { results } = (await ragResponse.json()) as {
  results: { content: string; score: number }[];
};
const context = results.map((r) => r.content).join('\n\n');

// 2. Use retrieved context in chat completion
const response = await client.chat.completions.create({
  model: 'gpt-4o-mini',
  messages: [
    {
      role: 'system',
      content: `Answer the user's question using ONLY the following context.
If the answer is not in the context, say you don't know.

Context:
${context}`,
    },
    { role: 'user', content: userQuestion },
  ],
});

console.log(response.choices[0].message.content);

Server Statistics

GET /v1/rag/servers/:id/stats (Auth Required)
Bash
curl https://api.llmhub.one/v1/rag/servers/{server_id}/stats \
  -H "Authorization: Bearer $LLMHUB_API_KEY"

Response

json
{
  "document_count": 15,
  "chunk_count": 342,
  "total_bytes": 523776,
  "query_count": 1250,
  "last_sync_at": "2026-03-26T08:00:00Z"
}
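A couple of derived metrics you might compute from this payload. The field names are taken from the response above; the averages are simple arithmetic, not API fields:

```typescript
// Shape of the stats response documented above.
interface RagStats {
  document_count: number;
  chunk_count: number;
  total_bytes: number;
  query_count: number;
}

// Average stored bytes per chunk; 0 when the server is empty.
function avgChunkBytes(s: RagStats): number {
  return s.chunk_count === 0 ? 0 : Math.round(s.total_bytes / s.chunk_count);
}

// Average number of chunks produced per document.
function avgChunksPerDoc(s: RagStats): number {
  return s.document_count === 0 ? 0 : Math.round(s.chunk_count / s.document_count);
}
```

For the example response, that works out to roughly 1.5 KB per chunk and about 23 chunks per document.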

Other Endpoints

| Method | Endpoint | Description |
| --- | --- | --- |
| GET | /v1/rag/servers | List all your RAG servers |
| GET | /v1/rag/servers/:id | Get server details |
| PATCH | /v1/rag/servers/:id | Update server settings |
| DELETE | /v1/rag/servers/:id | Delete server and all documents |
| GET | /v1/rag/servers/:id/documents | List documents in a server |
| DELETE | /v1/rag/servers/:id/documents/:docId | Delete a single document |
| POST | /v1/rag/servers/:id/reingest | Re-process all documents |
| GET | /v1/rag/embedding-models | List available embedding models |
| GET | /v1/rag/ocr-info | Check OCR availability and languages |
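A hypothetical helper for building these endpoint paths, so server and document IDs are URL-encoded consistently (the base URL matches the examples on this page; the helper itself is not part of any SDK):

```typescript
const BASE = 'https://api.llmhub.one/v1';

// Path for one RAG server, or the server collection when no id is given.
function serverPath(id?: string): string {
  return id ? `${BASE}/rag/servers/${encodeURIComponent(id)}` : `${BASE}/rag/servers`;
}

// Path for a document within a server, or the server's document collection.
function documentPath(serverId: string, docId?: string): string {
  const base = `${serverPath(serverId)}/documents`;
  return docId ? `${base}/${encodeURIComponent(docId)}` : base;
}
```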