Architecture
Optio is a monorepo with two applications (API server and web dashboard), four shared packages, and a Helm chart for Kubernetes deployment. All services run in Kubernetes, including the API and web app.
System Overview
The system has three layers: the web UI for user interaction, the API server for orchestration logic, and Kubernetes for agent execution.
- Web UI (Next.js) — Dashboard with live log streaming, task management, repo configuration, cost analytics, and cluster monitoring. Communicates with the API via REST and WebSocket.
- API Server (Fastify) — Orchestration brain. Manages task queue (BullMQ), PR watching, health monitoring, ticket sync, and pod lifecycle. Stores state in PostgreSQL, uses Redis for job queue and pub/sub.
- Kubernetes — Execution environment. Each repository gets its own long-lived pod. Tasks run in isolated git worktrees within those pods.
Pod-per-Repo with Worktrees
This is the central design decision. Instead of one pod per task (slow and wasteful), Optio runs one long-lived pod per repository:
- The pod clones the repo once on creation, then runs `sleep infinity`
- When a task arrives, Optio execs into the pod: `git worktree add` → run agent → cleanup worktree
- Multiple tasks can run concurrently in the same pod (one per worktree), controlled by per-repo `maxConcurrentTasks`
- Pods use persistent volumes so installed tools survive restarts
- Idle pods are cleaned up after 10 minutes (configurable via `OPTIO_REPO_POD_IDLE_MS`)
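The per-task flow above (add a worktree, run the agent, clean up) can be sketched as a small planning function. This is a minimal illustration, not Optio's actual code: the function name `planTask`, the `/workspace/worktrees` path layout, and the `optio/<taskId>` branch naming are all assumptions made for the example. Only the `git worktree add` and `git worktree remove` commands are standard git.

```typescript
// Hypothetical sketch: names, paths, and branch scheme are illustrative assumptions.

interface TaskPlan {
  worktreePath: string; // directory the agent should run in
  steps: string[][];    // argv arrays to exec inside the repo pod, in order
}

// Build the commands to exec inside the long-lived repo pod for one task.
function planTask(taskId: string, baseBranch: string, agentCommand: string[]): TaskPlan {
  // Task IDs end up in a path and a branch name, so restrict the character set.
  if (!/^[A-Za-z0-9_-]+$/.test(taskId)) {
    throw new Error(`invalid task id: ${taskId}`);
  }
  const worktreePath = `/workspace/worktrees/${taskId}`; // assumed layout
  const branch = `optio/${taskId}`;                      // assumed naming scheme

  return {
    worktreePath,
    steps: [
      // 1. Create an isolated checkout on a new branch (run from the main clone).
      ["git", "worktree", "add", "-b", branch, worktreePath, baseBranch],
      // 2. Run the agent; the caller is expected to set cwd to worktreePath.
      agentCommand,
      // 3. Remove the worktree so the pod is left clean for later tasks.
      ["git", "worktree", "remove", "--force", worktreePath],
    ],
  };
}
```

Because each task gets its own worktree directory and branch, several plans built this way can execute in the same pod without touching each other's working files. A real implementation would also need to run the cleanup step when the agent fails.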
Multi-Pod Scaling
Repos can scale beyond a single pod for higher throughput. Two per-repo settings control this:
| Setting | Default | Description |
|---|---|---|
| maxPodInstances | 1 | Max pod replicas per repo (1–20) |
| maxAgentsPerPod | 2 | Max concurrent agents per pod (1–50) |
Total capacity = maxPodInstances × maxAgentsPerPod. Pods scale up dynamically when all existing pods are at capacity, and scale down in LIFO order (the most recently created pod is removed first) once idle.
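The capacity formula and scaling rules can be written as a few pure functions. This is a sketch under stated assumptions: the `PodState` shape and the function names are hypothetical, and LIFO is read here as "remove the most recently created idle pod first". Only `maxPodInstances` and `maxAgentsPerPod` come from the settings table.

```typescript
// Hypothetical sketch: PodState and the function names are illustrative assumptions.

interface RepoScaling {
  maxPodInstances: number; // 1-20, default 1
  maxAgentsPerPod: number; // 1-50, default 2
}

interface PodState {
  name: string;
  createdAt: number;    // creation time in ms since epoch
  activeAgents: number; // agents currently running in this pod
}

// Total capacity = maxPodInstances × maxAgentsPerPod.
function totalCapacity(s: RepoScaling): number {
  return s.maxPodInstances * s.maxAgentsPerPod;
}

// Scale up only when every existing pod is full and the replica limit is not reached.
// With no pods at all, this returns true so the first pod gets created.
function shouldScaleUp(pods: PodState[], s: RepoScaling): boolean {
  const allFull = pods.every((p) => p.activeAgents >= s.maxAgentsPerPod);
  return allFull && pods.length < s.maxPodInstances;
}

// LIFO scale-down: among idle pods, pick the one created most recently.
function pickScaleDownCandidate(pods: PodState[]): PodState | undefined {
  const idle = pods.filter((p) => p.activeAgents === 0);
  return [...idle].sort((a, b) => b.createdAt - a.createdAt)[0];
}
```

With the defaults (1 pod, 2 agents per pod), `totalCapacity` returns 2. Raising `maxPodInstances` to 5 gives a capacity of 10 without changing per-pod load.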
Workers
The API server runs several BullMQ workers:
- Task Worker — Processes the job queue. Handles concurrency limits, pod provisioning, agent execution, and log streaming.
- PR Watcher — Polls open PRs every 30 seconds for CI status, review state, and merge readiness. Triggers auto-resume and auto-merge.
- Health Monitor — Runs every 60 seconds. Detects crashed pods, cleans up orphaned worktrees, removes idle pods.
- Ticket Sync — Syncs tasks from GitHub Issues and Linear tickets.
- Webhook Worker — Delivers outgoing webhook events.
- Schedule Worker — Executes cron-based scheduled tasks.
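As an illustration of the Health Monitor's idle-pod removal, the check it performs on each 60-second pass could look like the sketch below. The `PodActivity` shape and both function names are hypothetical. The only details taken from this page are the 10-minute default and the `OPTIO_REPO_POD_IDLE_MS` variable. The environment is passed in as a plain object so the logic stays easy to test.

```typescript
// Hypothetical sketch: PodActivity and the function names are illustrative assumptions.

const DEFAULT_IDLE_MS = 10 * 60 * 1000; // 10 minutes

// Read the idle timeout from the environment, falling back to the default
// when the variable is missing, non-numeric, or not positive.
function resolveIdleMs(env: Record<string, string | undefined>): number {
  const raw = env["OPTIO_REPO_POD_IDLE_MS"];
  const parsed = raw === undefined ? NaN : Number(raw);
  return Number.isFinite(parsed) && parsed > 0 ? parsed : DEFAULT_IDLE_MS;
}

interface PodActivity {
  name: string;
  activeTasks: number;    // tasks currently running in the pod
  lastActivityAt: number; // ms since epoch of the last task start or finish
}

// Return the names of pods with no running tasks that have been quiet
// for at least idleMs.
function findIdlePods(pods: PodActivity[], now: number, idleMs: number): string[] {
  return pods
    .filter((p) => p.activeTasks === 0 && now - p.lastActivityAt >= idleMs)
    .map((p) => p.name);
}
```

A pod with a running task is never reported, however old its last activity timestamp is, so long-running agents are not interrupted by cleanup.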
Tech Stack
| Layer | Technology |
|---|---|
| Monorepo | Turborepo + pnpm 10 |
| API | Fastify 5, Drizzle ORM, BullMQ, Zod |
| Web | Next.js 15, Tailwind CSS 4, Zustand, Recharts |
| Database | PostgreSQL 16 |
| Queue | Redis 7 + BullMQ |
| Runtime | Kubernetes + Docker |
| Deploy | Helm 3 |
| Auth | OAuth (GitHub, Google, GitLab) |
| CI | GitHub Actions |
Packages
- @optio/shared — Types, state machine, prompt template renderer, error classifier, constants
- @optio/container-runtime — Abstract runtime interface with Kubernetes implementation
- @optio/agent-adapters — Claude Code and Codex adapters (auth, environment, config)
- @optio/ticket-providers — GitHub Issues and Linear ticket sync