Agent Chat¶

Talk to your agents conversationally — ask questions about their configuration, capabilities, and recent work without triggering any task execution.

Overview¶

The Agent Chat feature gives every agent a real-time, multi-turn chat interface. Unlike the workflow prompt system (which dispatches Celery tasks and runs tools), chat is a pure conversational interface:

✅ Answers questions about what the agent can do
✅ Describes its skills, tools, and memory
✅ Summarizes its recent task history
✅ Streams responses token-by-token via SSE
❌ Does not execute tools or trigger workflows

Current Flutter UI

The Chat page currently exposes an agent selector, streaming messages for the active in-memory conversation, and a clear-conversation action. Session list, history inspection, and deletion are available through the API endpoints below.

What can I ask?¶

Any question about the agent's context — for example:

Question	What the agent draws on
"What have you been working on?"	Recent task history
"What tools do you have access to?"	MCP server tools + built-in tools
"What skills are you configured with?"	Installed skill documents
"What do you remember about the last deployment?"	STM + LTM memories
"What model are you using?"	Agent profile / config

How it works¶

Client ──POST──► /api/agents/{id}/chat ──► ChatHandler (in-process)
  │                                              │
  │◄──── SSE token stream ◄─────────────────────┘
                                                 │
           ┌──────────────────────────┐          │
           │    Context Assembly      │◄─────────┘
           │  ① Agent profile         │
           │  ② Skills                │
           │  ③ Available tools       │
           │  ④ Recent task history   │
           │  ⑤ STM + LTM memories    │
           └──────────────────────────┘

On each message:

A ChatSession is created (or reused if you pass a session_id).
The agent's self-awareness context is assembled from its configuration.
Conversation history is loaded from MongoDB (last 50 messages).
The LLM is called directly from the FastAPI process — no Celery overhead.
Response tokens stream back via SSE.
User and assistant messages are persisted to MongoDB.

Quick start¶

Start a new conversation¶

curl -N -X POST http://localhost:8000/api/agents/AGENT_ID/chat \
  -H "Authorization: Bearer ghp_..." \
  -H "Content-Type: application/json" \
  -d '{"message": "What can you do?"}'

You will receive an SSE stream:

id: 1
data: {"type": "session", "session_id": "66abc123..."}

id: 2
data: {"type": "delta", "content": "I am a deployment assistant"}

id: 3
data: {"type": "delta", "content": " configured with the following skills…"}

id: N
data: {"type": "done", "usage": {"prompt_tokens": 480, "completion_tokens": 95}, "message_id": "66def..."}

Continue a conversation¶

curl -N -X POST http://localhost:8000/api/agents/AGENT_ID/chat \
  -H "Authorization: Bearer ghp_..." \
  -H "Content-Type: application/json" \
  -d '{"message": "What was the last task you completed?", "session_id": "66abc123..."}'

List your sessions¶

curl http://localhost:8000/api/agents/AGENT_ID/chat/sessions \
  -H "Authorization: Bearer ghp_..."

View a session with full history¶

curl http://localhost:8000/api/agents/AGENT_ID/chat/sessions/SESSION_ID \
  -H "Authorization: Bearer ghp_..."

Delete a session¶

curl -X DELETE http://localhost:8000/api/agents/AGENT_ID/chat/sessions/SESSION_ID \
  -H "Authorization: Bearer ghp_..."

Conversation context¶

The agent is given an <agent_context> block at the start of every chat containing:

<agent_context>
  <agent_profile>
    <name>Deploy Assistant</name>
    <model>gpt-4o</model>
    <description>Handles deployment workflows</description>
  </agent_profile>

  <skills>
    <skill>
      <name>Kubernetes Deployments</name>
      <description>Manages k8s deployments and rollbacks</description>
    </skill>
  </skills>

  <available_tools>
    <tool>kubectl_apply</tool>
    <tool>git_push</tool>
    <tool>slack_notify</tool>
  </available_tools>

  <task_history>
    <task>
      <prompt>Deploy v2.3.0 to staging</prompt>
      <status>completed</status>
      <created_at>2026-04-17T10:00:00Z</created_at>
    </task>
  </task_history>

  <!-- STM/LTM memories from Redis + MongoDB -->
</agent_context>

Session lifecycle¶

New message without session_id
        │
        ▼
   Create ChatSession
        │
        ▼
   handle_chat()
        │
        ├─► yield session event
        ├─► stream LLM response (delta events)
        └─► yield done event
                │
                ▼
        Persist user + assistant ChatMessages
        Update ChatSession (count, title, updated_at)

Sessions are automatically titled from the first message (truncated to 60 characters).

Configuration¶

Chat inherits the agent's attached BYOK provider if configured. If no provider is attached, the GitHub Models inference endpoint is used with the user's GitHub token.

Setting	Default	Description
Conversation window	50 messages	Older messages are dropped
Response timeout	120 s	Per-response timeout
Session title	First 60 chars	Auto-generated from first message
Task history	Last 10 tasks	Summarised in context

Observability¶

Two new Prometheus metrics track chat activity:

Metric	Labels	Description
`copilot_hub_chat_messages_total`	`role`	Messages processed (user / assistant)
`copilot_hub_chat_response_duration_seconds`	`model`	LLM response time histogram

The existing copilot_hub_sse_connections_active gauge covers active chat SSE connections.