Knowledge Bases¶
TBD Agents supports knowledge sources and items for retrieval-augmented generation (RAG). Connect vector databases or MongoDB collections so agents have access to your domain knowledge at runtime.
Architecture Overview¶
Knowledge is organised into two layers:
- Knowledge Sources — connection definitions pointing at a data backend (Qdrant vector DB or MongoDB).
- Knowledge Items — individual pieces of knowledge stored within a source (text snippets, uploaded files, images).
When an agent executes, the engine calls build_knowledge_context() to aggregate relevant items into an XML block that is injected into the system prompt. Items are selected by matching tags on the agent or workflow against tags on items and sources.
Agent / Workflow tags ──▶ Knowledge Items (MongoDB)
                      ──▶ Knowledge Sources (Qdrant) ──▶ Vector scroll results
The combined context is wrapped in <knowledge> XML and appended to the system prompt before sending to the LLM.
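The two layers can be pictured as a pair of record types plus a tag-overlap check. This is a minimal sketch and not the project's actual schema: the field names are taken from the request bodies and tables on this page, while the class layout, the defaults, and the helper `tags_overlap` are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class KnowledgeSource:
    """Connection definition pointing at a data backend (hypothetical layout)."""
    name: str
    type: str                                   # "vector_db" or "mongo_db"
    connection_config: dict = field(default_factory=dict)
    tags: list[str] = field(default_factory=list)
    status: str = "registered"                  # registered, connected, or error
    last_error: Optional[str] = None


@dataclass
class KnowledgeItem:
    """Single piece of knowledge stored within a source (hypothetical layout)."""
    source_id: str
    content_type: str = "text"                  # "text", "file", or "image"
    text_content: Optional[str] = None          # set for text items
    file_id: Optional[str] = None               # GridFS object id for file/image items
    tags: list[str] = field(default_factory=list)


def tags_overlap(context_tags: list[str], record_tags: list[str]) -> bool:
    """True when an agent/workflow tag also appears on the item or source."""
    return bool(set(context_tags) & set(record_tags))
```

With this shape, an agent tagged `documentation` would pick up the `product-docs` source registered later on this page, because the two tag lists share one entry.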
Knowledge Sources¶
A knowledge source represents a connection to a data backend:
| Type | Backend | Description |
|---|---|---|
| vector_db | Qdrant | Vector-embedded documents; currently read via Qdrant scroll, with semantic search planned (see Retrieval Behaviour) |
| mongo_db | MongoDB | Structured storage with tag-based retrieval |
Source Lifecycle¶
Each source has a status:
| Status | Meaning |
|---|---|
| registered | Source created but not yet tested |
| connected | Connection tested successfully |
| error | Last connection test failed (see last_error) |
Registering a source¶
curl -X POST http://localhost:8000/api/knowledge-sources \
-H "Authorization: Bearer $GITHUB_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"name": "product-docs",
"type": "vector_db",
"connection_config": {"url": "http://qdrant:6333", "collection": "docs"},
"tags": ["documentation"]
}'
Qdrant Vector DB Setup¶
To use a Qdrant source you need a running Qdrant instance. The docker-compose.yml includes Qdrant by default.
connection_config fields for vector_db:
| Field | Required | Description |
|---|---|---|
| url | Yes | Qdrant HTTP endpoint (e.g. http://qdrant:6333) |
| collection | Yes | Name of the Qdrant collection to query |
| api_key_token_name | No | Name of a stored token containing the Qdrant API key |
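A client can check these fields before sending the registration request. The sketch below is an assumption about what a pre-flight check might look like. The function `validate_vector_db_config` is hypothetical, and the server may apply different or additional rules.

```python
from urllib.parse import urlparse

# Required fields as listed in the table above.
REQUIRED_VECTOR_DB_FIELDS = ("url", "collection")


def validate_vector_db_config(config: dict) -> list[str]:
    """Return a list of problems found in a vector_db connection_config."""
    problems = []
    for name in REQUIRED_VECTOR_DB_FIELDS:
        if not config.get(name):
            problems.append(f"missing required field: {name}")
    url = config.get("url")
    if url and urlparse(url).scheme not in ("http", "https"):
        problems.append("url must start with http:// or https://")
    return problems
```

An empty list means the config has both required fields and an HTTP(S) URL. It says nothing about whether Qdrant is reachable, which is what the test endpoint is for.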
If your Qdrant instance requires authentication, first store the API key as a token:
# Store the API key
curl -X POST http://localhost:8000/api/tokens \
-H "Authorization: Bearer $GITHUB_TOKEN" \
-H "Content-Type: application/json" \
-d '{"name": "qdrant-key", "value": "your-qdrant-api-key"}'
# Reference it in the source
curl -X POST http://localhost:8000/api/knowledge-sources \
-H "Authorization: Bearer $GITHUB_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"name": "secure-vectors",
"type": "vector_db",
"connection_config": {
"url": "https://qdrant.example.com:6333",
"collection": "embeddings",
"api_key_token_name": "qdrant-key"
},
"tags": ["embeddings"]
}'
MongoDB Source¶
MongoDB sources use the same database as TBD Agents itself. No additional connection_config is needed, and a connection test always moves the status to connected.
curl -X POST http://localhost:8000/api/knowledge-sources \
-H "Authorization: Bearer $GITHUB_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"name": "internal-docs",
"type": "mongo_db",
"tags": ["internal"]
}'
Testing a connection¶
curl -X POST http://localhost:8000/api/knowledge-sources/<ID>/test \
-H "Authorization: Bearer $GITHUB_TOKEN"
Returns {"success": true} when the connection works, or {"success": false, "error": "..."} with failure details otherwise. The source status becomes connected or error accordingly.
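The relationship between the test response and the source lifecycle can be sketched as a small mapping. The response shape is the one documented above. The helper `status_after_test` and the fallback message are hypothetical, not the server's code.

```python
from typing import Optional, Tuple


def status_after_test(response: dict) -> Tuple[str, Optional[str]]:
    """Map a /test response to the (status, last_error) pair of a source."""
    if response.get("success") is True:
        return "connected", None
    # Assumption: a missing error message falls back to a generic string.
    return "error", response.get("error", "unknown error")
```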
Knowledge Items¶
Items represent individual pieces of knowledge stored within a source. There are three content types:
| Content Type | Storage | Description |
|---|---|---|
| text | MongoDB document | Plain text stored in text_content field |
| file | GridFS | Binary file (PDF, DOCX, etc.) stored via MongoDB GridFS |
| image | GridFS | Image file stored via MongoDB GridFS |
Creating a text item¶
curl -X POST http://localhost:8000/api/knowledge-items \
-H "Authorization: Bearer $GITHUB_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"source_id": "<SOURCE_ID>",
"content": "Our SLA guarantees 99.9% uptime for production services.",
"tags": ["sla", "production"]
}'
Uploading a file (GridFS)¶
Binary files are stored in MongoDB GridFS, which splits each file into chunks and can therefore hold files larger than MongoDB's 16 MB document size limit. The API accepts multipart form uploads:
curl -X POST http://localhost:8000/api/knowledge-items/upload \
-H "Authorization: Bearer $GITHUB_TOKEN" \
-F "file=@runbook.pdf" \
-F "source_id=<SOURCE_ID>" \
-F "tags=runbook,ops"
The upload endpoint:
- Reads the file bytes from the multipart request
- Opens a GridFS upload stream with the original filename
- Writes the file content and closes the stream
- Creates a KnowledgeItem record with file_id pointing to the GridFS object
Uploaded files can be retrieved or deleted through the items API. Deleting an item also removes its associated GridFS file.
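To make the chunking idea concrete, the sketch below splits a byte string the way GridFS conceptually does, using the GridFS default chunk size of 255 KiB. It is an illustration only: a real upload goes through a MongoDB driver's GridFS API, and `split_into_chunks` is a hypothetical helper, not part of TBD Agents.

```python
GRIDFS_DEFAULT_CHUNK_SIZE = 255 * 1024  # bytes; the GridFS default chunk size


def split_into_chunks(data: bytes, chunk_size: int = GRIDFS_DEFAULT_CHUNK_SIZE) -> list[dict]:
    """Split file bytes into ordered chunks, each tagged with its sequence number n."""
    return [
        {"n": index, "data": data[offset:offset + chunk_size]}
        for index, offset in enumerate(range(0, len(data), chunk_size))
    ]


def reassemble(chunks: list[dict]) -> bytes:
    """Rebuild the original bytes by concatenating chunks in order of n."""
    return b"".join(chunk["data"] for chunk in sorted(chunks, key=lambda c: c["n"]))
```

Because every chunk is an ordinary document well under the size limit, the size of the uploaded file is bounded by storage capacity and not by the per-document limit.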
Querying items¶
# Query by tags
curl -X POST http://localhost:8000/api/knowledge-items/query \
-H "Authorization: Bearer $GITHUB_TOKEN" \
-H "Content-Type: application/json" \
-d '{"tags": ["sla"]}'
# List all items for a source
curl "http://localhost:8000/api/knowledge-items?source_id=<SOURCE_ID>" \
-H "Authorization: Bearer $GITHUB_TOKEN"
Retrieval Behaviour¶
When an agent runs, the engine resolves knowledge context through two paths:
1. Tag-based retrieval (MongoDB items)¶
Items whose tags overlap with the agent or workflow tags are fetched directly from MongoDB. Up to 50 items are retrieved per execution. Each text item is wrapped in an XML <item> element:
<knowledge>
<item name="sla-doc" tags="sla,production">
Our SLA guarantees 99.9% uptime for production services.
</item>
</knowledge>
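The selection and wrapping steps described above can be sketched as follows. This is not the engine's implementation: `select_items` and `render_knowledge` are hypothetical names, items are plain dicts, and escaping the text with `xml.sax.saxutils` is an assumption about how special characters would be handled.

```python
from xml.sax.saxutils import escape, quoteattr

MAX_ITEMS_PER_EXECUTION = 50  # cap stated in the text above


def select_items(items: list[dict], context_tags: list[str],
                 limit: int = MAX_ITEMS_PER_EXECUTION) -> list[dict]:
    """Keep items sharing at least one tag with the agent or workflow."""
    wanted = set(context_tags)
    matched = [item for item in items if wanted & set(item.get("tags", []))]
    return matched[:limit]


def render_knowledge(items: list[dict]) -> str:
    """Wrap each text item in an <item> element inside one <knowledge> block."""
    lines = ["<knowledge>"]
    for item in items:
        tags = ",".join(item.get("tags", []))
        lines.append(f"<item name={quoteattr(item['name'])} tags={quoteattr(tags)}>")
        lines.append(escape(item["text_content"]))  # escapes &, < and >
        lines.append("</item>")
    lines.append("</knowledge>")
    return "\n".join(lines)
```

Calling `render_knowledge(select_items(all_items, agent_tags))` yields a block shaped like the example above, with at most 50 `<item>` elements.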
2. Vector scroll (Qdrant sources)¶
For each vector_db source attached to the agent, the engine calls query_vector_db(), which uses Qdrant's scroll API to retrieve up to 20 documents with their payloads. Each result's text payload field is wrapped in an XML <item> element:
<knowledge>
<item source="product-docs">
Document text from the vector database...
</item>
</knowledge>
Phase 1 Limitation
The current implementation uses Qdrant scroll (not semantic search). Full embedding-based search with query vectors will be added in a future release.
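For orientation, a scroll call against Qdrant's REST API might be assembled as below. This is a sketch under assumptions, not the body of query_vector_db(): the helper names are hypothetical, the request is only built and never sent, and the project may well use the Qdrant client library instead of raw HTTP.

```python
import json
import urllib.request
from typing import Optional


def build_scroll_request(url: str, collection: str,
                         api_key: Optional[str] = None,
                         limit: int = 20) -> urllib.request.Request:
    """Build (but do not send) a Qdrant scroll request for one collection."""
    body = json.dumps({"limit": limit, "with_payload": True}).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["api-key"] = api_key  # Qdrant reads the API key from this header
    endpoint = f"{url.rstrip('/')}/collections/{collection}/points/scroll"
    return urllib.request.Request(endpoint, data=body, headers=headers, method="POST")


def extract_texts(scroll_response: dict) -> list[str]:
    """Collect the text payload field from each point in a scroll response."""
    points = scroll_response.get("result", {}).get("points", [])
    return [p["payload"]["text"] for p in points if "text" in (p.get("payload") or {})]
```

Scroll pages through stored points in id order and takes no query vector, which is why the results are not ranked by relevance to the agent's prompt.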
Context injection¶
The aggregated <knowledge> block is appended to the system prompt before each LLM call. This ensures the model has access to relevant domain knowledge regardless of which provider (Copilot, Claude, or BYOK) is used.
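A minimal sketch of the injection step, assuming the fragments from both retrieval paths arrive as already-rendered `<item>` strings. The function `inject_knowledge` is hypothetical, and how build_knowledge_context() actually joins the pieces is not shown in this page.

```python
def inject_knowledge(system_prompt: str,
                     item_fragments: list[str],
                     vector_fragments: list[str]) -> str:
    """Append one <knowledge> block holding all fragments to the system prompt."""
    fragments = [*item_fragments, *vector_fragments]
    if not fragments:
        # Assumption: with no knowledge available the prompt is left unchanged.
        return system_prompt
    block = "<knowledge>\n" + "\n".join(fragments) + "\n</knowledge>"
    return f"{system_prompt.rstrip()}\n\n{block}"
```

Because the result is a plain string, the same prompt can be handed to any provider without provider-specific handling.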
Filtering¶
Knowledge items support filtering by:
| Parameter | Description |
|---|---|
| source_id | Items from a specific source |
| tags | Items matching specific tags |
| content_type | Filter by content type (text, file, image) |
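The filters could be combined into a list request as sketched below. Only the source_id query parameter appears in the examples on this page. Passing tags as a comma-separated query value and content_type as a query parameter are assumptions, and `items_url` is a hypothetical helper.

```python
from typing import Optional
from urllib.parse import urlencode

CONTENT_TYPES = ("text", "file", "image")


def items_url(base_url: str,
              source_id: Optional[str] = None,
              tags: Optional[list[str]] = None,
              content_type: Optional[str] = None) -> str:
    """Build a knowledge-items list URL from the optional filters."""
    params = {}
    if source_id:
        params["source_id"] = source_id
    if tags:
        params["tags"] = ",".join(tags)  # assumption: comma-separated, as in the upload form
    if content_type:
        if content_type not in CONTENT_TYPES:
            raise ValueError(f"content_type must be one of {CONTENT_TYPES}")
        params["content_type"] = content_type
    query = urlencode(params)
    return f"{base_url.rstrip('/')}/api/knowledge-items" + (f"?{query}" if query else "")
```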
Best Practices¶
- Use descriptive tags: agents select knowledge items by tag matching, so consistent tagging is critical.
- Keep text items focused: shorter, topic-specific items produce better results than large monolithic documents.
- Test connections after creation: always call the /test endpoint to verify Qdrant connectivity.
- Rotate API keys via tokens: store Qdrant API keys as TBD Agents tokens and reference them by name in connection_config. This avoids embedding secrets in source definitions.
- Monitor source status: sources in error status will be skipped during retrieval. Check last_error for diagnostics.