How to Monitor AI Agent Costs Before They Destroy Your Budget
5 steps × 100 users × $0.015/call = $7.50/minute. $10,800/day if you're not watching.
Here's a number that should scare you: $10,800.
That's what happens when an AI agent with a bug runs unchecked for 24 hours. The math is straightforward: 5 agent steps × 100 users × $0.015/call = $7.50/minute, which is $10,800 over a day if you're not watching.
This isn't hypothetical. We've seen it happen. A developer shipped an agent that retried on malformed JSON responses. The LLM returned malformed JSON 40% of the time on a specific edge case. The retry logic had no max attempts. It ran for 6 hours before someone noticed the Anthropic dashboard.
$4,860. Real money. Gone.
The $10,000 mistake (and how it happens)
AI agents are fundamentally different from traditional API calls. A REST endpoint costs the same every time -- 50ms of compute, a database query, a JSON response. Predictable. Budgetable.
An AI agent's cost depends on what it decides to do. The same user input might trigger 3 calls or 30 calls depending on the model's reasoning. You can't predict it, and you can't cap it at the application level without breaking functionality.
Here's what a normal agent session looks like:
User: "What were our top 5 customers last quarter?"
Step 1: Claude reads the system prompt + user message → $0.008
Step 2: Claude calls get_quarterly_data tool → $0.003
Step 3: Tool result injected (2,000 tokens of data) → $0.012
Step 4: Claude calls get_customer_details for top 5 → $0.015
Step 5: Claude generates the final summary → $0.009
Total: $0.047 — totally reasonable.

Now here's what happens when something goes wrong:
User: "What were our top 5 customers last quarter?"
Step 1: Claude reads the system prompt → $0.008
Step 2: Claude calls get_quarterly_data → $0.003
Step 3: Tool returns an error (database timeout) → $0.001
Step 4: Claude decides to retry → $0.003
Step 5: Tool returns an error again → $0.001
Step 6: Claude tries a different approach → $0.005
Step 7: Claude calls get_all_customers instead → $0.003
Step 8: Tool returns 50,000 rows (context explodes) → $0.089
Step 9: Claude tries to process, hits token limit → $0.045
Step 10: Framework truncates and retries → $0.045
... (continues for 30+ steps)
Total: $2.40 — for ONE request.

Multiply by 100 users hitting this edge case per hour. That's $240/hour. $5,760/day.
The three ways agents blow your budget
After analyzing thousands of agent traces in AIWatch, we see the same three patterns repeatedly:
1. Infinite loops
The most expensive bug. The agent encounters an error or unexpected response, retries, gets the same error, retries again. Without a max-steps limit, this runs until the context window fills up.
Common triggers:

Malformed or truncated JSON that fails parsing, prompting a retry. Tool errors (like database timeouts) that the model keeps retrying. Rate-limit responses treated as retryable without backoff.
The cost: Each retry adds $0.01-0.05. A loop of 50 retries costs $0.50-2.50 per request. At scale, this compounds fast.
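The fix is a hard cap on attempts. Here's a minimal sketch, assuming a hypothetical `callTool` helper that stands in for your agent's tool-execution function:

```javascript
// Minimal retry guard: cap attempts and back off, so a failing tool
// can never loop indefinitely. `callTool` is a hypothetical helper
// standing in for your agent's tool-execution function.
async function callWithRetry(callTool, args, maxAttempts = 3) {
  let lastError;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await callTool(args);
    } catch (error) {
      lastError = error;
      if (attempt < maxAttempts) {
        // Exponential backoff: 200ms, 400ms, ...
        await new Promise((r) => setTimeout(r, 200 * 2 ** (attempt - 1)));
      }
    }
  }
  // Surface a terminal error instead of retrying forever.
  throw new Error(`Tool failed after ${maxAttempts} attempts: ${lastError}`);
}
```

Three attempts at roughly $0.01-0.05 each caps the worst case at $0.15 per request, instead of the $0.50-2.50 an unbounded loop can burn.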
2. Context window explosion
Every step in an agent chain adds tokens to the conversation history. System prompt (500 tokens) + user message (100 tokens) + tool results (varies). The problem is tool results.
A tool that returns a database query result might inject 5,000 tokens into the context. After 3 tool calls, you're at 15,000+ tokens of context. The model is now processing 15,000 input tokens for every subsequent step.
Claude Sonnet costs $3 per million input tokens. At 15,000 tokens per call, that's $0.045 per step. Five more steps and you've spent $0.27 on one request -- 6x what it should cost.
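The arithmetic above is worth encoding once so you can sanity-check any trace. A one-liner using Sonnet's $3-per-million-input-token price:

```javascript
// Cost of one agent step given its input-token count, at a
// dollars-per-million-input-tokens rate (default: Claude Sonnet's $3).
function stepCost(inputTokens, pricePerMillion = 3) {
  return (inputTokens / 1_000_000) * pricePerMillion;
}
```

`stepCost(15000)` comes out to $0.045 per step, matching the figure above.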
The fix: Summarize tool results before injecting them. Return the top 10 rows, not all 50,000. Use streaming for large results.
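Capping rows before they hit the context is a few lines of code. A sketch, where the row cap and the shape of the result object are illustrative assumptions:

```javascript
// Truncate a tool result before injecting it into the context, so one
// oversized query can't balloon every subsequent step.
function truncateToolResult(rows, maxRows = 10) {
  if (rows.length <= maxRows) {
    return { rows, truncated: false };
  }
  return {
    rows: rows.slice(0, maxRows),
    truncated: true,
    // Tell the model data was cut, so it can ask for a narrower query
    // instead of assuming it saw everything.
    note: `Showing ${maxRows} of ${rows.length} rows. Refine the query for more.`,
  };
}
```

The `note` field matters: without it, the model may silently reason over incomplete data.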
3. Fallback cascades
Your code tries claude-sonnet, it's rate limited, falls back to claude-opus. Opus costs 5x more. If the rate limiting is temporary (which it usually is), you're paying 5x for the same result.
Or worse: your code tries Anthropic, falls back to OpenAI, falls back to a different OpenAI model. Each fallback adds cost and latency, and the context needs to be reformatted for each provider.
The cost: A $0.015 call becomes $0.075 on the first fallback and $0.15 on the second. The user doesn't notice -- the response looks the same. But your bill notices.
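The cheapest fix is to retry the rate-limited model before reaching for the expensive one. A sketch, where `callModel` and the model names are placeholders for your own client and configuration:

```javascript
// Retry the cheap model on a transient 429 before falling back to the
// 5x-more-expensive one. `callModel(model, request)` is a placeholder
// for your own client call; model ids are illustrative.
async function callWithFallback(callModel, request) {
  const primary = 'claude-sonnet'; // placeholder id
  const fallback = 'claude-opus';  // placeholder id, ~5x the price
  for (let attempt = 0; attempt < 2; attempt++) {
    try {
      return await callModel(primary, request);
    } catch (error) {
      // Only rate limits are worth retrying; rethrow everything else.
      if (error.status !== 429) throw error;
      await new Promise((r) => setTimeout(r, 500)); // brief backoff
    }
  }
  // Rate limiting persisted: pay the fallback premium once, knowingly.
  return callModel(fallback, request);
}
```

Two retries at 500ms cover most transient rate limits, so the fallback premium is paid only when it's genuinely needed.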
How to catch it before it costs you
AIWatch budget rules are the simplest protection against runaway costs. Set them up once, forget about them.
Daily budget with hard stop:
Go to your AIWatch dashboard → Budgets tab. Set:
Monthly budget: $200
Daily budget: $8 (monthly / 25 working days)
Alert at: 80% ($6.40/day)
Hard stop: Yes

When you hit 80% of your daily budget, AIWatch sends an alert to Slack/email. When you hit 100%, it returns a 429 status code to your application. Your code should handle this:
```javascript
try {
  const response = await client.messages.create({
    model: 'claude-sonnet-4-6-20250514',
    messages: [...],
  });
  return response;
} catch (error) {
  if (error.status === 429) {
    // Budget exceeded — graceful degradation
    return { message: "AI features are temporarily paused. Try again tomorrow." };
  }
  throw error;
}
```

This is not elegant. It's effective. A hard stop at $8/day means your maximum monthly bill is $248, no matter what bugs you ship.
Per-feature budgets:
If your app has multiple AI features (chat, summarization, classification), set budgets per feature using the X-Luxkern-Feature header:

```javascript
const response = await client.messages.create({
  model: 'claude-sonnet-4-6-20250514',
  messages: [...],
}, {
  headers: {
    'X-Luxkern-Feature': 'chat',
  },
});
```

Now you can see in the dashboard: chat costs $4.20/day, summarization costs $1.80/day, classification costs $0.30/day. If chat suddenly jumps to $12/day, you know exactly where to look.
The 2-line setup
You don't need to change your application code. Just change the base URL:
```python
# Python
import anthropic

client = anthropic.Anthropic(
    base_url="https://api.luxkern.com/aiwatch/proxy/anthropic"
)
```

```javascript
// Node.js
import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic({
  baseURL: 'https://api.luxkern.com/aiwatch/proxy/anthropic',
});
```

Every call now gets logged with full trace data: tokens, cost, latency, prompt preview. The proxy adds less than 50ms of overhead. If the proxy is slow or unavailable, requests fall through automatically to the direct Anthropic API -- your users never see a degradation.
Reading the warning signs
Once AIWatch is running, here's what to watch for in the Traces dashboard:
Latency spikes: A call that normally takes 800ms suddenly takes 4,000ms. This usually means the context window grew (more input tokens = slower response). Check if a tool is returning more data than expected.
Cost per request increasing: Your average request cost went from $0.04 to $0.12. Sort traces by cost descending. The expensive calls will show you which feature or which chain step is responsible.
Error rate above 2%: Some 429s from the provider are normal (rate limiting). A sustained 5%+ error rate means something is wrong -- either you're hitting rate limits too often (you need request queuing) or the provider is having issues (check Radar for community reports).
Chain length growing: Your average chain used to be 4 steps. Now it's 8. The model might be struggling with a specific tool or a changed prompt. Check the AICanary for behavioral regression.
What to do when you hit a spike
You open Slack on Monday morning. AIWatch sent an alert: "Daily budget 80% reached at 11:47 AM Sunday."
Here's the playbook:
1. Check the traces (2 minutes)
Open AIWatch → Traces → sort by cost descending → filter to Sunday. Find the expensive calls. Are they all from one feature? One user? One chain?
2. Identify the pattern (5 minutes)
If it's one feature: check the chain traces. Is there a loop? Is the context window growing?
If it's one user: check if their data triggers an edge case (large dataset, unusual characters, specific tool error).
If it's all features: check Radar. The AI provider might have had latency issues, causing retries and increased costs.
3. Fix the immediate problem (10 minutes)
Add max_steps = 10 to your agent configuration so the chain can't loop indefinitely.

4. Prevent it from happening again (5 minutes)

Set a per-feature budget on the offending feature so the next spike trips an alert before it reaches your daily cap.
Total time: 22 minutes. Cost prevented: potentially thousands of dollars per month.
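The max-steps guard from step 3 can be sketched as a hard cap on the agent loop itself. `runStep` here is a hypothetical function that executes one model call plus tool round-trip:

```javascript
// A hard step cap for an agent loop: the agent stops after `maxSteps`
// model calls no matter what the model decides to do. `runStep` is a
// hypothetical function executing one model call + tool round-trip,
// returning { done, result, nextState }.
async function runAgent(runStep, input, maxSteps = 10) {
  let state = input;
  for (let step = 0; step < maxSteps; step++) {
    const { done, result, nextState } = await runStep(state);
    if (done) return result;
    state = nextState;
  }
  // Budget-friendly failure: stop and report rather than loop forever.
  throw new Error(`Agent exceeded ${maxSteps} steps without finishing`);
}
```

With a $0.01-0.05 cost per step, this bounds any single request at roughly $0.50 in the worst case.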
---
AI agents are powerful. They're also the most unpredictable cost center in your infrastructure. Without monitoring, every agent deployment is a bet that nothing goes wrong. With AIWatch, you know exactly what's happening, how much it costs, and you have a kill switch when things go sideways.
Set up cost monitoring in 2 minutes -- free to start, EU-hosted.