Knowledge Base (RAG)
Create and manage RAG (Retrieval-Augmented Generation) instances that let your AI agents answer questions using your own documents, PDFs, and knowledge base.
How It Works
Create a RAG server
Each server is an isolated knowledge base with its own documents and embeddings.
Upload documents
Upload PDFs, Markdown, HTML, plain text, code files, or images (with OCR). Documents are automatically split into chunks and embedded.
Query for relevant context
Search your knowledge base with natural language. The system returns the most relevant chunks ranked by similarity score.
Use context in prompts
Pass retrieved chunks as context to your chat completions for grounded, accurate answers.
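The last step can be sketched as a small helper that turns query results into one context string. This is a minimal sketch, not part of the API: `buildContext` and its `minScore` parameter are hypothetical, and the result shape (`content`, `score`, `metadata.file_name`) is assumed from the query response example in this document.

```javascript
// Hypothetical helper (not part of the API): builds a context string
// from query results shaped like the query response example.
function buildContext(results, minScore = 0.5) {
  return results
    .filter((r) => r.score >= minScore) // drop weak matches
    .sort((a, b) => b.score - a.score) // strongest match first
    .map((r) => `[source: ${r.metadata.file_name}]\n${r.content}`)
    .join('\n\n');
}

const sampleResults = [
  { content: 'Go to Settings > Security.', score: 0.89, metadata: { file_name: 'account-settings.md' } },
  { content: 'Contact support.', score: 0.42, metadata: { file_name: 'faq.md' } },
];

// Only the first chunk passes the 0.5 cut-off used here.
console.log(buildContext(sampleResults));
```

Labeling each chunk with its file name lets the model cite where an answer came from; the cut-off is a client-side choice and is independent of the server's `similarity_threshold`.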
Create RAG Server
POST /v1/rag/servers (Auth Required)
curl -X POST https://api.llmhub.one/v1/rag/servers \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $LLMHUB_API_KEY" \
-d '{
"name": "Product Documentation",
"description": "Knowledge base for all product docs",
"embedding_model": "openai.small",
"chunk_size": 512,
"chunk_overlap": 50,
"top_k": 5,
"similarity_threshold": 0.3
}'
Parameters
| Parameter | Type | Description |
|---|---|---|
| name (required) | string | Display name (2–255 chars) |
| description | string | Optional description |
| embedding_model | string | Embedding model to use. Options: openai.small (1536d), openai.large (3072d) |
| chunk_size | integer | Characters per chunk. Default: 512 |
| chunk_overlap | integer | Overlap between chunks. Default: 50 |
| top_k | integer | Default number of results to return. Default: 5 |
| similarity_threshold | number | Minimum similarity score (0–1). Default: 0.5 |
| organization_id | string | Optional org to associate with |
| ocr_enabled | boolean | Enable OCR for image documents |
| ocr_languages | string | OCR language codes (e.g., "eng+nld") |
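To see how `chunk_size` and `chunk_overlap` interact, here is an illustrative sketch of fixed-size character chunking with overlap. It is an assumption about the general technique only: the service's actual splitter is not described here and may split on different boundaries. The `chunkText` function is hypothetical.

```javascript
// Illustrative only: fixed-size character windows with overlap.
// The real splitter used by the service may behave differently.
function chunkText(text, chunkSize = 512, chunkOverlap = 50) {
  if (chunkOverlap >= chunkSize) {
    throw new RangeError('chunkOverlap must be smaller than chunkSize');
  }
  const step = chunkSize - chunkOverlap; // how far the window advances
  const chunks = [];
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // window reached the end
  }
  return chunks;
}

const demoChunks = chunkText('a'.repeat(1000));
console.log(demoChunks.map((c) => c.length)); // [ 512, 512, 76 ]
```

With the defaults, each window starts 462 characters after the previous one, so consecutive chunks share 50 characters. A larger overlap reduces the chance of cutting a sentence off from its context, at the cost of more chunks to embed.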
Response
{
"id": "rag-uuid-here",
"name": "Product Documentation",
"slug": "product-documentation",
"description": "Knowledge base for all product docs",
"embedding_model": "openai.small",
"chunk_size": 512,
"chunk_overlap": 50,
"top_k": 5,
"similarity_threshold": 0.3,
"is_active": true,
"created_at": "2026-03-15T10:30:00Z"
}
Upload Documents
POST /v1/rag/servers/:id/documents (Auth Required)
Upload a document to your knowledge base. Text files are sent as plain text; binary files (PDFs, images) should be base64-encoded.
Text Document
curl -X POST https://api.llmhub.one/v1/rag/servers/{server_id}/documents \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $LLMHUB_API_KEY" \
-d '{
"file_name": "getting-started.md",
"file_type": "text/markdown",
"content": "# Getting Started\n\nWelcome to our platform..."
}'
PDF Document
# For binary files like PDFs, base64-encode the content.
# -w 0 disables line wrapping (GNU coreutils); on macOS use: base64 -i report.pdf
BASE64_CONTENT=$(base64 -w 0 report.pdf)
curl -X POST https://api.llmhub.one/v1/rag/servers/{server_id}/documents \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $LLMHUB_API_KEY" \
-d "{
\"file_name\": \"report.pdf\",
\"file_type\": \"application/pdf\",
\"content\": \"$BASE64_CONTENT\"
}"
Supported File Types
| Category | MIME Types |
|---|---|
| Documents | application/pdf, text/plain, text/markdown, text/html, text/csv, application/json |
| Code | text/x-go, text/x-python, text/javascript, text/x-typescript, application/x-yaml |
| Images (OCR) | image/png, image/jpeg, image/gif, image/webp, image/bmp, image/tiff |
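The text-versus-binary rule can be wrapped in a small payload builder. This is a sketch under assumptions: `buildDocumentPayload` and `isTextType` are hypothetical names, and treating `application/json` and `application/x-yaml` as text (rather than base64) is a guess based on the table above, not something the API confirms.

```javascript
// Assumption: these non-"text/" MIME types are still sent as plain text.
const EXTRA_TEXT_TYPES = new Set(['application/json', 'application/x-yaml']);

function isTextType(fileType) {
  return fileType.startsWith('text/') || EXTRA_TEXT_TYPES.has(fileType);
}

// Hypothetical helper: builds the JSON body for the document upload endpoint.
// `bytes` may be a string, a Buffer, or a Uint8Array.
function buildDocumentPayload(fileName, fileType, bytes) {
  const buf = Buffer.from(bytes);
  return {
    file_name: fileName,
    file_type: fileType,
    // Text goes through unchanged; binary content is base64-encoded.
    content: isTextType(fileType) ? buf.toString('utf8') : buf.toString('base64'),
  };
}

const mdPayload = buildDocumentPayload('getting-started.md', 'text/markdown', '# Getting Started');
console.log(mdPayload.content); // # Getting Started
```

In Node.js, a PDF read with `fs.readFileSync('report.pdf')` can be passed directly as `bytes`, and the resulting object can be serialized with `JSON.stringify` as the request body.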
Response
{
"id": "doc-uuid-here",
"rag_server_id": "rag-uuid-here",
"file_name": "getting-started.md",
"file_type": "text/markdown",
"file_size": 2048,
"status": "processed",
"created_at": "2026-03-15T10:35:00Z"
}
Query Knowledge Base
POST /v1/rag/servers/:id/query (Auth Required)
Search your knowledge base with a natural-language query. Returns the most relevant document chunks ranked by similarity score.
curl -X POST https://api.llmhub.one/v1/rag/servers/{server_id}/query \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $LLMHUB_API_KEY" \
-d '{
"query": "How do I reset my password?",
"top_k": 3
}'
Response
{
"query": "How do I reset my password?",
"results": [
{
"id": "chunk-uuid-1",
"content": "## Password Reset\n\nTo reset your password, go to Settings > Security and click 'Reset Password'. You'll receive a confirmation email with a reset link.",
"metadata": {
"file_name": "account-settings.md",
"chunk_index": 4
},
"score": 0.89
},
{
"id": "chunk-uuid-2",
"content": "## Account Recovery\n\nIf you've forgotten your password and can't access your email, contact support with your account ID for manual recovery.",
"metadata": {
"file_name": "faq.md",
"chunk_index": 12
},
"score": 0.72
}
],
"count": 2
}
Score Interpretation
- 0.8–1.0: Highly relevant — direct match
- 0.5–0.8: Moderately relevant — related content
- 0.3–0.5: Loosely related — may contain useful context
- < 0.3: Filtered out when similarity_threshold is 0.3, as in the example server above (the documented default is 0.5)
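These bands can be applied client-side when deciding how much to trust a chunk. The sketch below is hypothetical (`relevanceLabel` is not part of the API), and treating each lower bound as inclusive is an assumption, since the ranges above share their endpoints.

```javascript
// Hypothetical helper: maps a similarity score to the bands listed above.
// Assumption: each band includes its lower bound.
function relevanceLabel(score) {
  if (score >= 0.8) return 'highly relevant';
  if (score >= 0.5) return 'moderately relevant';
  if (score >= 0.3) return 'loosely related';
  return 'below threshold';
}

// Scores taken from the example query response.
console.log(relevanceLabel(0.89)); // highly relevant
console.log(relevanceLabel(0.72)); // moderately relevant
```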
RAG + Chat Completions
The typical RAG pattern: query your knowledge base, then pass the retrieved context to a chat completion for grounded answers:
import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'https://api.llmhub.one/v1',
  apiKey: process.env.LLMHUB_API_KEY,
});

// Example question; in a real app this comes from the user
const userQuestion = 'How do I reset my password?';

// 1. Query the knowledge base
const ragResponse = await fetch(
  'https://api.llmhub.one/v1/rag/servers/YOUR_SERVER_ID/query',
  {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'Authorization': `Bearer ${process.env.LLMHUB_API_KEY}`,
    },
    body: JSON.stringify({
      query: userQuestion,
      top_k: 3,
    }),
  }
);
// Stop early if the query failed, instead of parsing an error body
if (!ragResponse.ok) {
  throw new Error(`RAG query failed with status ${ragResponse.status}`);
}
const { results } = await ragResponse.json();
// Join the retrieved chunks into one context block
const context = results.map(r => r.content).join('\n\n');

// 2. Use retrieved context in chat completion
const response = await client.chat.completions.create({
  model: 'gpt-4o-mini',
  messages: [
    {
      role: 'system',
      content: `Answer the user's question using ONLY the following context.
If the answer is not in the context, say you don't know.
Context:
${context}`,
    },
    { role: 'user', content: userQuestion },
  ],
});
console.log(response.choices[0].message.content);
Server Statistics
GET /v1/rag/servers/:id/stats (Auth Required)
curl https://api.llmhub.one/v1/rag/servers/{server_id}/stats \
-H "Authorization: Bearer $LLMHUB_API_KEY"
Response
{
"document_count": 15,
"chunk_count": 342,
"total_bytes": 523776,
"query_count": 1250,
"last_sync_at": "2026-03-26T08:00:00Z"
}
Other Endpoints
| Method | Endpoint | Description |
|---|---|---|
| GET | /v1/rag/servers | List all your RAG servers |
| GET | /v1/rag/servers/:id | Get server details |
| PATCH | /v1/rag/servers/:id | Update server settings |
| DELETE | /v1/rag/servers/:id | Delete server and all documents |
| GET | /v1/rag/servers/:id/documents | List documents in a server |
| DELETE | /v1/rag/servers/:id/documents/:docId | Delete a single document |
| POST | /v1/rag/servers/:id/reingest | Re-process all documents |
| GET | /v1/rag/embedding-models | List available embedding models |
| GET | /v1/rag/ocr-info | Check OCR availability and languages |
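The endpoints in this table can be called with the same bearer-token header as the examples above. The sketch below builds the URL and `fetch` options without sending anything; `ragRequest` is a hypothetical helper, `sk-example` is a placeholder key, and it is an assumption that every endpoint in the table requires authentication and accepts JSON bodies.

```javascript
const BASE_URL = 'https://api.llmhub.one/v1';

// Hypothetical helper: builds the URL and fetch options for a RAG endpoint.
// Assumption: all endpoints in the table use the same Bearer auth header.
function ragRequest(method, path, apiKey, body) {
  const options = {
    method,
    headers: { Authorization: `Bearer ${apiKey}` },
  };
  if (body !== undefined) {
    // Only attach a JSON body when one is provided
    options.headers['Content-Type'] = 'application/json';
    options.body = JSON.stringify(body);
  }
  return { url: `${BASE_URL}${path}`, options };
}

// Example: list documents in a server, then trigger a re-ingest.
const serverId = 'YOUR_SERVER_ID';
const listDocs = ragRequest('GET', `/rag/servers/${serverId}/documents`, 'sk-example');
const reingest = ragRequest('POST', `/rag/servers/${serverId}/reingest`, 'sk-example');
console.log(listDocs.url);
// To send a request: const res = await fetch(listDocs.url, listDocs.options);
```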

