Scaling

TBD Agents is designed for horizontal scaling from day one. Every component except the data stores (Redis and MongoDB) is stateless, so you can add capacity simply by running more instances.


Scaling Strategy

```mermaid
graph TB
    subgraph Load Balancer
        LB[Reverse Proxy / Ingress]
    end

    subgraph API Instances
        API1[FastAPI #1]
        API2[FastAPI #2]
        API3[FastAPI #N]
    end

    subgraph Worker Pool
        W1[Worker #1<br/>concurrency=4]
        W2[Worker #2<br/>concurrency=4]
        W3[Worker #N<br/>concurrency=4]
    end

    subgraph Infrastructure
        Redis[(Redis Cluster)]
        Mongo[(MongoDB Replica Set)]
    end

    LB --> API1 & API2 & API3
    API1 & API2 & API3 --> Redis
    API1 & API2 & API3 --> Mongo
    Redis --> W1 & W2 & W3
    W1 & W2 & W3 --> Mongo
    W1 & W2 & W3 --> Redis
```

Horizontal Worker Scaling

Workers are stateless — they load everything from MongoDB and communicate via Redis. Add more containers to handle more concurrent agent runs.

```bash
# Docker Compose: run 5 worker containers
docker-compose up --build --scale worker=5

# Each worker runs with --concurrency=4,
# so total capacity = 20 concurrent agent executions.
```
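For reference, the worker service in `docker-compose.yml` might look like the sketch below. The Celery app module (`app.worker`), queue, image, and environment variable names are illustrative assumptions, not values taken from the repository:

```yaml
# Hypothetical worker service definition; module and variable names are illustrative.
services:
  worker:
    build: .
    command: celery -A app.worker worker --loglevel=info --concurrency=4
    environment:
      - REDIS_URL=redis://redis:6379/0
      - MONGO_URL=mongodb://mongo:27017/tbd_agents
    depends_on:
      - redis
      - mongo
```

Because the service itself is stateless, `--scale worker=5` simply starts five identical copies of this container, all pulling from the same Redis-backed queue.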

Horizontal API Scaling

The FastAPI app service is also stateless. Run multiple instances behind a load balancer:

```bash
docker-compose up --build --scale app=3
```

SSE connections are per-client, and each API instance independently subscribes to Redis pub/sub for the relevant workflow channel.
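As a rough sketch of the relay path, the helper below formats a Redis pub/sub payload as a `text/event-stream` frame. The function name and the `event` field are assumptions for illustration; in the real app, each API instance would wrap something like this in a FastAPI `StreamingResponse` fed by an async Redis subscription:

```python
from typing import Optional


def format_sse(data: str, event: Optional[str] = None) -> str:
    """Format a payload as a Server-Sent Events frame.

    SSE frames are newline-delimited: an optional "event:" line,
    one "data:" line per line of payload, then a blank line.
    """
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    for chunk in data.splitlines() or [""]:
        lines.append(f"data: {chunk}")
    return "\n".join(lines) + "\n\n"


# Example: relay a workflow update received on a Redis channel.
frame = format_sse('{"status": "running"}', event="workflow")
```

Because each instance subscribes to Redis independently, a client can be routed to any API pod and still receive the events for its workflow.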


Infrastructure Scaling

| Component | Strategy |
|-----------|----------|
| Redis | Redis Sentinel or Redis Cluster for high availability |
| MongoDB | Replica sets or MongoDB Atlas |
| Workers | Increase `--concurrency` per container or add containers |
| API | Multiple instances behind a reverse proxy |

Kubernetes / Helm

TBD Agents includes Helm charts for Kubernetes deployment with:

  • API HPA — Horizontal Pod Autoscaler for the FastAPI service
  • KEDA ScaledObject — Autoscale workers based on Redis queue length
  • PVC — Persistent volume claims for data
  • Ingress — Configurable ingress for external access
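A KEDA `ScaledObject` for the workers could look like the sketch below. The deployment name, queue key, and thresholds are illustrative assumptions, not values from the actual chart:

```yaml
# Illustrative KEDA ScaledObject: scale worker pods on Redis list length.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: worker-scaler
spec:
  scaleTargetRef:
    name: worker              # assumed worker Deployment name
  minReplicaCount: 1
  maxReplicaCount: 20
  triggers:
    - type: redis
      metadata:
        address: redis:6379
        listName: celery      # Celery's default queue key in Redis
        listLength: "10"      # target tasks per replica
```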
```mermaid
graph LR
    Ingress[Ingress Controller] --> APISvc[API Service]
    APISvc --> APIPods[API Pods<br/>HPA: 2-10]
    KEDA[KEDA] -->|Scale on queue depth| WorkerPods[Worker Pods<br/>ScaledObject: 1-20]
    APIPods --> Redis[(Redis)]
    APIPods --> Mongo[(MongoDB)]
    WorkerPods --> Redis
    WorkerPods --> Mongo
```

Capacity Planning

| Metric | Guidance |
|--------|----------|
| Concurrent agents | 1 agent ≈ 1 Celery task slot; workers × concurrency = max concurrent |
| Memory per worker | ~256 MB base + SDK overhead per concurrent task |
| Redis memory | Minimal for pub/sub; grows with in-flight task count |
| MongoDB storage | ~10 KB per workflow message; grows with conversation history |
| SSE connections | 1 per active client; lightweight on the API side |
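The first two rows can be turned into a quick back-of-the-envelope calculator. The 256 MB base figure comes from the table above; the per-task overhead is a placeholder you would measure for your own agents:

```python
def max_concurrent_agents(workers: int, concurrency: int) -> int:
    """Each agent occupies one Celery task slot: workers x concurrency."""
    return workers * concurrency


def worker_memory_mb(concurrency: int, base_mb: int = 256, per_task_mb: int = 64) -> int:
    """Rough memory estimate for one worker container.

    base_mb matches the ~256 MB baseline above; per_task_mb (SDK overhead
    per concurrent task) is an assumed placeholder, so measure it yourself.
    """
    return base_mb + per_task_mb * concurrency


# 5 workers at --concurrency=4, as in the Docker Compose example:
print(max_concurrent_agents(5, 4))  # 20 concurrent agent executions
```

Use numbers like these to size worker nodes and to pick sensible `minReplicaCount`/`maxReplicaCount` bounds for autoscaling.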