Knowledge Bases¶
TBD Agents supports knowledge sources and items for retrieval-augmented generation (RAG). Connect vector databases or MongoDB collections so agents have access to your domain knowledge at runtime.
Architecture Overview¶
Knowledge is organised into two layers:
- Knowledge Sources — connection definitions pointing at a data backend (Qdrant vector DB or MongoDB).
- Knowledge Items — individual pieces of knowledge stored within a source (text snippets, uploaded files, images).
When an agent executes, the engine calls build_knowledge_context() to aggregate relevant items into an XML block that is injected into the system prompt. Items are selected by matching tags on the agent or workflow against tags on items and sources.
Agent / Workflow tags ──▶ Knowledge Items (MongoDB)
──▶ Knowledge Sources (Qdrant) ──▶ Vector scroll results
The combined context is wrapped in <knowledge> XML and appended to the system prompt before sending to the LLM.
Knowledge Sources¶
A knowledge source represents a connection to a data backend:
| Type | Backend | Description |
|---|---|---|
vector_db |
Qdrant | Semantic search over vector-embedded documents |
mongo_db |
MongoDB | Structured storage with tag-based retrieval |
pgvector |
PostgreSQL + pgvector | Semantic search backed by a PostgreSQL database with the vector extension |
pgvector Sources¶
The pgvector source type lets you connect TBD Agents to any PostgreSQL 14+ database that has the pgvector extension installed. This is useful when your knowledge corpus already lives in Postgres, or when you prefer a single-database stack over a separate Qdrant service.
A pgvector source performs semantic similarity search when a query string is available and embeddings are enabled; it falls back to a recency-ordered scan when no query is provided.
connection_config fields for pgvector sources:
| Field | Required | Description |
|---|---|---|
dsn |
Yes (or dsn_token_name) |
asyncpg-compatible PostgreSQL connection string |
collection |
Yes | Table name suffix used to identify the target table |
dsn_token_name |
No | Name of a stored token whose value is the DSN — overrides dsn when set |
For setup instructions, Docker Compose usage, indexing options, and observability queries, see the PostgreSQL pgvector Backend guide.
Source Lifecycle¶
Each source has a status:
| Status | Meaning |
|---|---|
registered |
Source created but not yet tested |
connected |
Connection tested successfully |
error |
Last connection test failed (see last_error) |
Registering a source¶
curl -X POST http://localhost:8000/api/knowledge-sources \
-H "Authorization: Bearer $GITHUB_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"name": "product-docs",
"source_type": "vector_db",
"connection_config": {"url": "http://qdrant:6333", "collection": "docs"},
"tags": ["documentation"]
}'
Qdrant Vector DB Setup¶
To use a Qdrant source you need a running Qdrant instance. Start one with the
bundled Compose profile by setting COMPOSE_PROFILES=qdrant in your .env,
then running docker compose up. See Choosing a Vector Store
for the full profile-selection pattern.
connection_config fields for vector_db:
| Field | Required | Description |
|---|---|---|
url |
Yes | Qdrant HTTP endpoint (e.g. http://qdrant:6333) |
collection |
Yes | Name of the Qdrant collection to query |
api_key_token_name |
No | Name of a stored token containing the Qdrant API key |
If your Qdrant instance requires authentication, first store the API key as a token:
# Store the API key
curl -X POST http://localhost:8000/api/tokens \
-H "Authorization: Bearer $GITHUB_TOKEN" \
-H "Content-Type: application/json" \
-d '{"name": "qdrant-key", "value": "your-qdrant-api-key"}'
# Reference it in the source
curl -X POST http://localhost:8000/api/knowledge-sources \
-H "Authorization: Bearer $GITHUB_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"name": "secure-vectors",
"source_type": "vector_db",
"connection_config": {
"url": "https://qdrant.example.com:6333",
"collection": "embeddings",
"api_key_token_name": "qdrant-key"
},
"tags": ["embeddings"]
}'
MongoDB Source¶
MongoDB sources use the same database as TBD Agents itself. No additional connection_config is needed — status always moves to connected on test.
curl -X POST http://localhost:8000/api/knowledge-sources \
-H "Authorization: Bearer $GITHUB_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"name": "internal-docs",
"source_type": "mongo_db",
"tags": ["internal"]
}'
Testing a connection¶
curl -X POST http://localhost:8000/api/knowledge-sources/<ID>/test \
-H "Authorization: Bearer $GITHUB_TOKEN"
Returns {"success": true} on success or {"success": false, "error": "..."} with details.
Knowledge Items¶
Items represent individual pieces of knowledge stored within a source. There are three content types:
| Content Type | Storage | Description |
|---|---|---|
text |
MongoDB document | Plain text stored in text_content field |
file |
GridFS | Binary file (PDF, DOCX, etc.) stored via MongoDB GridFS |
image |
GridFS | Image file stored via MongoDB GridFS |
Creating a text item¶
curl -X POST http://localhost:8000/api/knowledge-items \
-H "Authorization: Bearer $GITHUB_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"source_id": "<SOURCE_ID>",
"name": "SLA summary",
"text_content": "Our SLA guarantees 99.9% uptime for production services.",
"tags": ["sla", "production"]
}'
Uploading a file (GridFS)¶
Binary files are stored in MongoDB GridFS, which handles arbitrarily large files by chunking them. The API accepts multipart form uploads:
curl -X POST http://localhost:8000/api/knowledge-items/upload \
-H "Authorization: Bearer $GITHUB_TOKEN" \
-F "file=@runbook.pdf" \
-F "source_id=<SOURCE_ID>" \
-F "tags=runbook,ops"
The upload endpoint:
- Reads the file bytes from the multipart request
- Opens a GridFS upload stream with the original filename
- Writes the file content and closes the stream
- Creates a
KnowledgeItemrecord withfile_idpointing to the GridFS object
Uploaded files can be retrieved or deleted through the items API. Deleting an item also removes its associated GridFS file.
Querying items¶
# Query by tags
curl -X POST http://localhost:8000/api/knowledge-items/query \
-H "Authorization: Bearer $GITHUB_TOKEN" \
-H "Content-Type: application/json" \
-d '{"tags": ["sla"]}'
# List all items for a source
curl "http://localhost:8000/api/knowledge-items?source_id=<SOURCE_ID>" \
-H "Authorization: Bearer $GITHUB_TOKEN"
Retrieval Behaviour¶
When an agent runs, the engine resolves knowledge context through two paths:
1. Tag-based retrieval (MongoDB items)¶
Items whose tags overlap with the agent or workflow tags are fetched directly from MongoDB. Up to 50 items are retrieved per execution. Each text item is wrapped in an XML <item> element:
<knowledge>
<item name="sla-doc" tags="sla,production">
Our SLA guarantees 99.9% uptime for production services.
</item>
</knowledge>
2. Semantic or scroll retrieval (Qdrant sources)¶
For each vector_db source attached to the agent, the engine calls Qdrant. When a query is available and EMBEDDINGS_ENABLED=true, Qdrant uses semantic query_points against the configured collection. If no query vector is available, embeddings are disabled, or semantic search fails, the engine falls back to scroll-style retrieval and reads document text from payload fields.
Each result's text payload is wrapped in an XML <item> element:
<knowledge>
<item source="product-docs">
Document text from the vector database...
</item>
</knowledge>
3. pgvector sources¶
pgvector knowledge sources call the pgvector query path. Per-source pgvector retrieval expects LangChain-style tables named langchain_pg_embedding_{collection} in the database referenced by the source connection_config.
Current Flutter UI¶
The Flutter Knowledge page exposes user-facing source kinds such as text, url, file, git, and database with fields for name, description, source type, and tags. These are UI-level source categories. API automation should still use the backend source_type values documented above: vector_db, mongo_db, and pgvector.
The current UI item form supports title/content-style text records. Multipart file upload and GridFS download remain available through the REST API even if they are not exposed in every dashboard flow.
Context injection¶
The aggregated <knowledge> block is appended to the system prompt before each LLM call. This ensures the model has access to relevant domain knowledge regardless of which provider (Copilot, Claude, or BYOK) is used.
Filtering¶
Knowledge items support filtering by:
| Parameter | Description |
|---|---|
source_id |
Items from a specific source |
tags |
Items matching specific tags |
content_type |
Filter by content type (text, file, image) |
Best Practices¶
- Use descriptive tags — agents select knowledge items by tag matching, so consistent tagging is critical.
- Keep text items focused — shorter, topic-specific items produce better results than large monolithic documents.
- Test connections after creation — always call the
/testendpoint to verify Qdrant connectivity. - Rotate API keys via tokens — store Qdrant API keys as TBD Agents tokens and reference them by name in
connection_config. This avoids embedding secrets in source definitions. - Monitor source status — sources in
errorstatus will be skipped during retrieval. Checklast_errorfor diagnostics.