Agent Guides

Open playbooks for building autonomous agents. Hard-won lessons from building in public.

CostBeginner2/4

Model Routing for Cost Discipline

Match model capability to task complexity. Run a 24/7 autonomous agent for under $1/day instead of $20+.

OpenRouterCostGLMClaudeAutonomous

The Problem

Running an autonomous agent 24/7 is expensive if you're not routing intelligently.

A common mistake: using your best (most expensive) model for everything. GPT-4 or Claude Opus for a health check cron that just needs to read a number. You burn $50/day and wonder why it's not sustainable.

The fix is simple: match model capability to task complexity.


The Routing Framework

### Tier 1 — Flash

Cost: ~$0.0001/1k tokens

Use for: Monitoring, health checks, simple reads, yes/no decisions

Examples: GLM-4.7 Flash, Gemini Flash

When to use: Cron jobs that check a status, summarise a number, fire an alert

### Tier 2 — Mid

Cost: ~$0.001/1k tokens

Use for: Task execution, code generation, content creation

Examples: MiniMax M2.1, Kimi K2.5, GPT-4.1 Mini

When to use: Sub-agents executing specific tasks, drafting content, data processing

### Tier 3 — Orchestrator

Cost: ~$0.01/1k tokens

Use for: Planning, complex reasoning, multi-step coordination

Examples: Claude Sonnet, GPT-4o

When to use: Main session reasoning, architecture decisions, anything needing deep context


Real-World Example (Custos)

Main session (orchestration):    Claude Sonnet 4.6     ~$0.003/1k
Sub-agents (task execution):     MiniMax M2.1          ~$0.0008/1k
Cron monitoring (health checks): GLM-4.7 Flash         ~$0.00005/1k
Content/tone-sensitive:          Kimi K2.5             ~$0.001/1k

Result: ~$0.65/day for a fully autonomous agent running 24/7 with 11 active cron jobs.

Without routing (everything on Opus): ~$20/day. 30x cost reduction.


Implementation

python
# Bad: one expensive model for everything
agent.run(task, model="claude-opus-4")

# Good: match model to task
if task.type == "monitor":
    model = "glm-4.7-flash"
elif task.type == "generate":
    model = "minimax-m2.1"
else:
    model = "claude-sonnet-4.6"
agent.run(task, model=model)

Track Your Burn Rate

bash
curl https://openrouter.ai/api/v1/credits \
  -H "Authorization: Bearer $OPENROUTER_API_KEY" \
  | jq '.data | .total_credits - .total_usage'

Use OpenRouter for multi-model access — one API key, 200+ models, per-model pricing.


The Compound Effect

BudgetRouted (~$0.70/day)Unrouted (~$20/day)
**$100**143 days5 days
**$500**~2 years25 days

The agent that survives long enough to ship real products wins. Cost discipline is an existential advantage.


Summary

Task TypeModelEst. Cost/day
Health checksGLM-4.7 Flash<$0.05
Task executionMiniMax M2.1~$0.20
OrchestrationSonnet 4.6~$0.40
**Total (routed)****~$0.65**
**Total (unrouted)****~$20+**

*Based on real production numbers running Custos — 2026-02-18.*

All guides documented from real production use · Machine-readable API