Agent Chat¶
Talk to your agents conversationally — ask questions about their configuration, capabilities, and recent work without triggering any task execution.
Overview¶
The Agent Chat feature gives every agent a real-time, multi-turn chat interface. Unlike the workflow prompt system (which dispatches Celery tasks and runs tools), chat is a pure conversational interface:
- ✅ Answers questions about what the agent can do
- ✅ Describes its skills, tools, and memory
- ✅ Summarizes its recent task history
- ✅ Streams responses token-by-token via SSE
- ❌ Does not execute tools or trigger workflows
Current Flutter UI
The Chat page currently exposes an agent selector, streaming messages for the active in-memory conversation, and a clear-conversation action. Session list, history inspection, and deletion are available through the API endpoints below.
What can I ask?¶
Any question about the agent's context — for example:
| Question | What the agent draws on |
|---|---|
| "What have you been working on?" | Recent task history |
| "What tools do you have access to?" | MCP server tools + built-in tools |
| "What skills are you configured with?" | Installed skill documents |
| "What do you remember about the last deployment?" | STM + LTM memories |
| "What model are you using?" | Agent profile / config |
How it works¶
Client ──POST──► /api/agents/{id}/chat ──► ChatHandler (in-process)
│ │
│◄──── SSE token stream ◄─────────────────────┘
│
┌──────────────────────────┐ │
│ Context Assembly │◄─────────┘
│ ① Agent profile │
│ ② Skills │
│ ③ Available tools │
│ ④ Recent task history │
│ ⑤ STM + LTM memories │
└──────────────────────────┘
On each message:
- A
ChatSessionis created (or reused if you pass asession_id). - The agent's self-awareness context is assembled from its configuration.
- Conversation history is loaded from MongoDB (last 50 messages).
- The LLM is called directly from the FastAPI process — no Celery overhead.
- Response tokens stream back via SSE.
- User and assistant messages are persisted to MongoDB.
Quick start¶
Start a new conversation¶
curl -N -X POST http://localhost:8000/api/agents/AGENT_ID/chat \
-H "Authorization: Bearer ghp_..." \
-H "Content-Type: application/json" \
-d '{"message": "What can you do?"}'
You will receive an SSE stream:
id: 1
data: {"type": "session", "session_id": "66abc123..."}
id: 2
data: {"type": "delta", "content": "I am a deployment assistant"}
id: 3
data: {"type": "delta", "content": " configured with the following skills…"}
id: N
data: {"type": "done", "usage": {"prompt_tokens": 480, "completion_tokens": 95}, "message_id": "66def..."}
Continue a conversation¶
curl -N -X POST http://localhost:8000/api/agents/AGENT_ID/chat \
-H "Authorization: Bearer ghp_..." \
-H "Content-Type: application/json" \
-d '{"message": "What was the last task you completed?", "session_id": "66abc123..."}'
List your sessions¶
View a session with full history¶
curl http://localhost:8000/api/agents/AGENT_ID/chat/sessions/SESSION_ID \
-H "Authorization: Bearer ghp_..."
Delete a session¶
curl -X DELETE http://localhost:8000/api/agents/AGENT_ID/chat/sessions/SESSION_ID \
-H "Authorization: Bearer ghp_..."
Conversation context¶
The agent is given an <agent_context> block at the start of every chat containing:
<agent_context>
<agent_profile>
<name>Deploy Assistant</name>
<model>gpt-4o</model>
<description>Handles deployment workflows</description>
</agent_profile>
<skills>
<skill>
<name>Kubernetes Deployments</name>
<description>Manages k8s deployments and rollbacks</description>
</skill>
</skills>
<available_tools>
<tool>kubectl_apply</tool>
<tool>git_push</tool>
<tool>slack_notify</tool>
</available_tools>
<task_history>
<task>
<prompt>Deploy v2.3.0 to staging</prompt>
<status>completed</status>
<created_at>2026-04-17T10:00:00Z</created_at>
</task>
</task_history>
<!-- STM/LTM memories from Redis + MongoDB -->
</agent_context>
Session lifecycle¶
New message without session_id
│
▼
Create ChatSession
│
▼
handle_chat()
│
├─► yield session event
├─► stream LLM response (delta events)
└─► yield done event
│
▼
Persist user + assistant ChatMessages
Update ChatSession (count, title, updated_at)
Sessions are automatically titled from the first message (truncated to 60 characters).
Configuration¶
Chat inherits the agent's attached BYOK provider if configured. If no provider is attached, the GitHub Models inference endpoint is used with the user's GitHub token.
| Setting | Default | Description |
|---|---|---|
| Conversation window | 50 messages | Older messages are dropped |
| Response timeout | 120 s | Per-response timeout |
| Session title | First 60 chars | Auto-generated from first message |
| Task history | Last 10 tasks | Summarised in context |
Observability¶
Two new Prometheus metrics track chat activity:
| Metric | Labels | Description |
|---|---|---|
copilot_hub_chat_messages_total |
role |
Messages processed (user / assistant) |
copilot_hub_chat_response_duration_seconds |
model |
LLM response time histogram |
The existing copilot_hub_sse_connections_active gauge covers active chat SSE connections.