# BYOK Providers
Bring Your Own Key (BYOK) lets you run agents against external LLM providers — OpenAI, Azure OpenAI, Anthropic, or a custom endpoint — using your own API keys. TBD Agents handles streaming, retries, context management, and tool orchestration identically to the built-in Copilot SDK path.
## Supported Provider Types

| Type | Description |
|---|---|
| `openai` | OpenAI API (api.openai.com) or any compatible proxy |
| `azure_openai` | Azure OpenAI Service with deployment-based routing |
| `anthropic` | Anthropic Claude via the Claude Agent SDK (beta.agents/sessions) |
| `github_copilot` | Overrides the default GitHub token with a stored PAT |
| `custom` | Any OpenAI-compatible endpoint (set `base_url`) |
## Quick Setup

### 1. Store your API key
curl -X POST http://localhost:8000/api/tokens \
-H "Authorization: Bearer $GITHUB_TOKEN" \
-H "Content-Type: application/json" \
-d '{"name": "openai-key", "value": "sk-..."}'
### 2. Register the provider
curl -X POST http://localhost:8000/api/providers \
-H "Authorization: Bearer $GITHUB_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"name": "my-openai",
"provider_type": "openai",
"api_key_token_name": "openai-key"
}'
### 3. Attach to an agent
curl -X PUT http://localhost:8000/api/agents/{agent_id} \
-H "Authorization: Bearer $GITHUB_TOKEN" \
-H "Content-Type: application/json" \
-d '{"provider_id": "<provider-id>"}'
That's it — workflows created with this agent now route to OpenAI.
## Azure OpenAI

Azure deployments require a `base_url`; the deployment name comes either from an explicit `azure_deployment` or, failing that, from the workflow's `model` field:
{
"name": "azure-gpt4o",
"provider_type": "azure_openai",
"api_key_token_name": "azure-key",
"base_url": "https://myresource.openai.azure.com",
"azure_deployment": "gpt-4o",
"azure_api_version": "2024-12-01-preview"
}
The engine constructs the deployment-scoped request URL from these fields. If `azure_deployment` is not set, the workflow's `model` field is used as the deployment name.
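The resulting request URL follows the standard deployment-scoped Azure OpenAI format. A minimal sketch of the construction — the `azure_chat_url` helper name is illustrative, not part of this API:

```python
def azure_chat_url(base_url: str, deployment: str, api_version: str) -> str:
    # Deployment-scoped Azure OpenAI chat-completions endpoint:
    # {base_url}/openai/deployments/{deployment}/chat/completions?api-version={version}
    return (f"{base_url.rstrip('/')}/openai/deployments/{deployment}"
            f"/chat/completions?api-version={api_version}")

print(azure_chat_url("https://myresource.openai.azure.com",
                     "gpt-4o", "2024-12-01-preview"))
# → https://myresource.openai.azure.com/openai/deployments/gpt-4o/chat/completions?api-version=2024-12-01-preview
```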
## Features

### Streaming

OpenAI, Azure OpenAI, and custom OpenAI-compatible providers stream responses via SSE. Anthropic providers stream via Claude Agent SDK events. In both cases, content deltas are published in real time to the same event bus used by the Copilot SDK path — clients receive message_delta events identical to those from the built-in path.
### Retry & Error Handling
Transient errors are retried automatically with exponential backoff:
- Retryable status codes: 429, 500, 502, 503, 504
- Retryable exceptions: connection errors, read/write timeouts
- Max retries: 3 (4 attempts total), with delays of 1s, 2s, and 4s between retries
- Retry-After: Honoured when the provider sends the header
Non-retryable errors (e.g. 401, 403) fail immediately.
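The policy above can be sketched as follows — a simplified illustration covering status codes only (exception-based retries are omitted); `send` and `call_with_retries` are hypothetical names standing in for the actual HTTP call:

```python
import time

RETRYABLE_STATUS = {429, 500, 502, 503, 504}

def call_with_retries(send, max_retries=3, base_delay=1.0):
    # 4 attempts total: the initial call plus up to 3 retries.
    for attempt in range(max_retries + 1):
        status, headers, body = send()
        if status not in RETRYABLE_STATUS or attempt == max_retries:
            # Success, a non-retryable error (401, 403, ...), or retries exhausted.
            return status, body
        # Honour Retry-After when the provider sends it; otherwise
        # back off exponentially: 1s, 2s, 4s.
        delay = float(headers.get("Retry-After", base_delay * 2 ** attempt))
        time.sleep(delay)
```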
### Context Compaction
When accumulated input tokens exceed 80% of the 128k context window, the engine automatically compacts the conversation:
- Keeps the system prompt and original user message
- Drops intermediate tool call/result exchanges
- Inserts a compaction marker for the model
- Retains the last 6 messages for continuity
This prevents context overflow during long agentic loops.
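A rough sketch of the compaction rules, assuming the first two messages are the system prompt and the original user message — the helper names, marker text, and trigger plumbing are illustrative, not the engine's actual internals:

```python
CONTEXT_WINDOW = 128_000
COMPACTION_THRESHOLD = int(CONTEXT_WINDOW * 0.8)  # 102,400 tokens

def compact(messages, keep_last=6):
    # Assumes messages[0] is the system prompt and messages[1] the
    # original user message; intermediate tool exchanges are droppable.
    head = messages[:2]
    marker = {"role": "user",
              "content": "[Context compacted: intermediate tool exchanges removed.]"}
    return head + [marker] + messages[-keep_last:]

def maybe_compact(messages, used_tokens):
    # Trigger once accumulated input tokens cross 80% of the window.
    return compact(messages) if used_tokens > COMPACTION_THRESHOLD else messages
```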
### Usage Tracking

Every BYOK execution records:

| Metric | Source |
|---|---|
| `prompt_tokens` | `usage.prompt_tokens` |
| `completion_tokens` | `usage.completion_tokens` |
| `cached_tokens` | `usage.prompt_tokens_details.cached_tokens` |
| `cost` | `usage.cost` (if the provider returns it) |
All metrics are exposed as Prometheus histograms (cache_read, cache_write, cost_dollars_total, tool_calls_per_task).
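Reading these fields from an OpenAI-style response body is mechanical; a sketch (the `extract_usage` helper is illustrative):

```python
def extract_usage(response: dict) -> dict:
    # Pull token and cost metrics from an OpenAI-style `usage` object,
    # defaulting to zero when a provider omits a field.
    u = response.get("usage") or {}
    details = u.get("prompt_tokens_details") or {}
    return {
        "prompt_tokens": u.get("prompt_tokens", 0),
        "completion_tokens": u.get("completion_tokens", 0),
        "cached_tokens": details.get("cached_tokens", 0),
        "cost": u.get("cost"),  # None unless the provider reports it
    }
```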
### Progress Tracking
When an agent calls manage_todo_list, the engine parses the todo items and updates the task execution's progress — the same behaviour as the Copilot SDK path.
### Tool Calling

BYOK providers use the OpenAI function-calling format. All MCP tools configured on the agent are converted to OpenAI tool definitions and passed to the model. The engine loops:

1. Send the messages and tool definitions to the provider
2. If the model returns `tool_calls`, execute each via MCP
3. Append the results and repeat
4. When the model responds without tool calls (or hits `max_turns`), return the final answer
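The loop above can be sketched as follows — a simplified illustration with a flattened tool-call schema; `model_call` and `exec_tool` are hypothetical stand-ins for the provider request and MCP dispatch:

```python
def run_tool_loop(model_call, exec_tool, messages, tools, max_turns=10):
    for _ in range(max_turns):
        reply = model_call(messages, tools)   # 1. send messages + tools
        messages.append(reply)
        calls = reply.get("tool_calls")
        if not calls:                         # 4. no tool calls: final answer
            return reply.get("content")
        for call in calls:                    # 2. execute each via MCP
            result = exec_tool(call["name"], call["arguments"])
            messages.append({"role": "tool",  # 3. append results and repeat
                             "tool_call_id": call["id"],
                             "content": result})
    return None  # hit max_turns without a plain answer
```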
## Comparison with Copilot SDK Path
| Feature | Copilot SDK | BYOK Custom |
|---|---|---|
| Streaming | ✅ SDK events | ✅ SSE chunks |
| Tool calling | ✅ SDK-managed | ✅ OpenAI function-calling loop |
| Context management | ✅ SDK-managed | ✅ Auto-compaction at 80% |
| Retry logic | ✅ SDK-managed | ✅ Exponential backoff |
| Usage tracking | ✅ SDK events | ✅ Response usage fields |
| Progress tracking | ✅ SDK events | ✅ manage_todo_list parsing |
| Azure support | ❌ | ✅ Deployment-based routing |
| Model freedom | GitHub models | Any OpenAI-compatible |