Scaling

TBD Agents is designed for horizontal scaling from day one. Every component except the data stores (Redis and MongoDB) is stateless, so you can add capacity simply by running more instances.


Scaling Strategy

```mermaid
graph TB
    subgraph Load Balancer
        LB[Reverse Proxy / Ingress]
    end

    subgraph API Instances
        API1[FastAPI #1]
        API2[FastAPI #2]
        API3[FastAPI #N]
    end

    subgraph Worker Pool
        W1[Worker #1<br/>concurrency=4]
        W2[Worker #2<br/>concurrency=4]
        W3[Worker #N<br/>concurrency=4]
    end

    subgraph Infrastructure
        Redis[(Redis Cluster)]
        Mongo[(MongoDB Replica Set)]
    end

    LB --> API1 & API2 & API3
    API1 & API2 & API3 --> Redis
    API1 & API2 & API3 --> Mongo
    Redis --> W1 & W2 & W3
    W1 & W2 & W3 --> Mongo
    W1 & W2 & W3 --> Redis
```

Horizontal Worker Scaling

Workers are stateless — they load everything from MongoDB and communicate via Redis. Add more containers to handle more concurrent agent runs.

```bash
# Docker Compose: run 5 worker containers
docker-compose up --build --scale worker=5

# Each worker runs with --concurrency=4,
# so total capacity = 20 concurrent agent executions.
```
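For reference, the worker service in `docker-compose.yml` might look like the sketch below. The Celery app module (`app.worker`), queue, image, and environment variable names are illustrative assumptions, not values taken from the repository:

```yaml
# Hypothetical worker service definition; module and variable names are illustrative.
services:
  worker:
    build: .
    command: celery -A app.worker worker --loglevel=info --concurrency=4
    environment:
      - REDIS_URL=redis://redis:6379/0
      - MONGO_URL=mongodb://mongo:27017/tbd_agents
    depends_on:
      - redis
      - mongo
```

Because the service itself is stateless, `--scale worker=5` simply starts five identical copies of this container, all pulling from the same Redis-backed queue.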

Horizontal API Scaling

The FastAPI app service is also stateless. Run multiple instances behind a load balancer:

```bash
docker-compose up --build --scale app=3
```

SSE connections are per-client, and each API instance independently subscribes to Redis pub/sub for the relevant workflow channel.
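As a rough sketch of the relay path, the helper below formats a Redis pub/sub payload as a `text/event-stream` frame. The function name and the `event` field are assumptions for illustration; in the real app, each API instance would wrap something like this in a FastAPI `StreamingResponse` fed by an async Redis subscription:

```python
from typing import Optional


def format_sse(data: str, event: Optional[str] = None) -> str:
    """Format a payload as a Server-Sent Events frame.

    SSE frames are newline-delimited: an optional "event:" line,
    one "data:" line per line of payload, then a blank line.
    """
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    for chunk in data.splitlines() or [""]:
        lines.append(f"data: {chunk}")
    return "\n".join(lines) + "\n\n"


# Example: relay a workflow update received on a Redis channel.
frame = format_sse('{"status": "running"}', event="workflow")
```

Because each instance subscribes to Redis independently, a client can be routed to any API pod and still receive the events for its workflow.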


Infrastructure Scaling

| Component | Strategy |
|-----------|----------|
| Redis | Redis Sentinel or Redis Cluster for high availability |
| MongoDB | Replica sets or MongoDB Atlas |
| Workers | Increase `--concurrency` per container or add containers |
| API | Multiple instances behind a reverse proxy |

Kubernetes / Helm

TBD Agents includes Helm charts for Kubernetes deployment with:

  • API HPA — Horizontal Pod Autoscaler for the FastAPI service
  • KEDA ScaledObject — Autoscale workers based on Redis queue length
  • PVC — Persistent volume claims for data
  • Ingress — Configurable ingress for external access
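A KEDA `ScaledObject` for the workers could look like the sketch below. The deployment name, queue key, and thresholds are illustrative assumptions, not values from the actual chart:

```yaml
# Illustrative KEDA ScaledObject: scale worker pods on Redis list length.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: worker-scaler
spec:
  scaleTargetRef:
    name: worker              # assumed worker Deployment name
  minReplicaCount: 1
  maxReplicaCount: 20
  triggers:
    - type: redis
      metadata:
        address: redis:6379
        listName: celery      # Celery's default queue key in Redis
        listLength: "10"      # target tasks per replica
```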
```mermaid
graph LR
    Ingress[Ingress Controller] --> APISvc[API Service]
    APISvc --> APIPods[API Pods<br/>HPA: 2-10]
    KEDA[KEDA] -->|Scale on queue depth| WorkerPods[Worker Pods<br/>ScaledObject: 1-20]
    APIPods --> Redis[(Redis)]
    APIPods --> Mongo[(MongoDB)]
    WorkerPods --> Redis
    WorkerPods --> Mongo
```

Capacity Planning

| Metric | Guidance |
|--------|----------|
| Concurrent agents | 1 agent ≈ 1 Celery task slot; workers × concurrency = max concurrent |
| Memory per worker | ~256 MB base + SDK overhead per concurrent task |
| Redis memory | Minimal for pub/sub; grows with in-flight task count |
| MongoDB storage | ~10 KB per workflow message; grows with conversation history |
| SSE connections | 1 per active client; lightweight on the API side |
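The first two rows can be turned into a quick back-of-the-envelope calculator. The 256 MB base figure comes from the table above; the per-task overhead is a placeholder you would measure for your own agents:

```python
def max_concurrent_agents(workers: int, concurrency: int) -> int:
    """Each agent occupies one Celery task slot: workers x concurrency."""
    return workers * concurrency


def worker_memory_mb(concurrency: int, base_mb: int = 256, per_task_mb: int = 64) -> int:
    """Rough memory estimate for one worker container.

    base_mb matches the ~256 MB baseline above; per_task_mb (SDK overhead
    per concurrent task) is an assumed placeholder, so measure it yourself.
    """
    return base_mb + per_task_mb * concurrency


# 5 workers at --concurrency=4, as in the Docker Compose example:
print(max_concurrent_agents(5, 4))  # 20 concurrent agent executions
```

Use numbers like these to size worker nodes and to pick sensible `minReplicaCount`/`maxReplicaCount` bounds for autoscaling.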