
How to Monitor AI Agent Costs Before They Destroy Your Budget

5 steps × 100 users × $0.015/call = $7.50/minute. $10,800/day if you're not watching.

Tags: aiwatch, ai-agents, costs, anthropic, budget




Here's a number that should scare you: $10,800.

That's what happens when an AI agent with a bug runs unchecked for 24 hours. The math is straightforward:

  • Your agent uses claude-sonnet-4-6 with tool calling
  • Each agent loop takes 5 steps (system prompt + 4 tool calls)
  • Each step costs about $0.015 (3,000 input tokens, 500 output tokens)
  • One user request = 5 steps = $0.075
  • 100 concurrent users = $7.50/minute
  • Running all day = $10,800
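The arithmetic above is worth sketching as code, because you'll want the same sanity check against your own numbers. The constants here are the scenario's assumptions, not universal pricing:

```javascript
// Back-of-the-envelope agent cost model (numbers from the scenario above).
const COST_PER_STEP = 0.015;   // ~3,000 input + 500 output tokens on Sonnet
const STEPS_PER_REQUEST = 5;   // system prompt + 4 tool calls
const CONCURRENT_USERS = 100;  // each firing roughly one request per minute

const costPerRequest = COST_PER_STEP * STEPS_PER_REQUEST; // $0.075
const costPerMinute = costPerRequest * CONCURRENT_USERS;  // $7.50
const costPerDay = costPerMinute * 60 * 24;               // $10,800
```

Plug in your own step count and per-step cost; the "running all day" line is the one that surprises people.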


This isn't hypothetical. We've seen it happen. A developer shipped an agent that retried on malformed JSON responses. The LLM returned malformed JSON 40% of the time on a specific edge case. The retry logic had no max attempts. It ran for 6 hours before someone noticed the Anthropic dashboard.

$4,860. Real money. Gone.

    The $10,000 mistake (and how it happens)



    AI agents are fundamentally different from traditional API calls. A REST endpoint costs the same every time -- 50ms of compute, a database query, a JSON response. Predictable. Budgetable.

    An AI agent's cost depends on what it decides to do. The same user input might trigger 3 calls or 30 calls depending on the model's reasoning. You can't predict it, and you can't cap it at the application level without breaking functionality.

    Here's what a normal agent session looks like:

    User: "What were our top 5 customers last quarter?"

Step 1: Claude reads the system prompt + user message → $0.008
Step 2: Claude calls get_quarterly_data tool → $0.003
Step 3: Tool result injected (2,000 tokens of data) → $0.012
Step 4: Claude calls get_customer_details for top 5 → $0.015
Step 5: Claude generates the final summary → $0.009

    Total: $0.047 — totally reasonable


    Now here's what happens when something goes wrong:

    User: "What were our top 5 customers last quarter?"

Step 1: Claude reads the system prompt → $0.008
Step 2: Claude calls get_quarterly_data → $0.003
Step 3: Tool returns an error (database timeout) → $0.001
Step 4: Claude decides to retry → $0.003
Step 5: Tool returns an error again → $0.001
Step 6: Claude tries a different approach → $0.005
Step 7: Claude calls get_all_customers instead → $0.003
Step 8: Tool returns 50,000 rows (context explodes) → $0.089
Step 9: Claude tries to process, hits token limit → $0.045
Step 10: Framework truncates and retries → $0.045
... (continues for 30+ steps)

    Total: $2.40 — for ONE request


    Multiply by 100 users hitting this edge case per hour. That's $240/hour. $5,760/day.

    The three ways agents blow your budget



    After analyzing thousands of agent traces in AIWatch, we see the same three patterns repeatedly:

    1. Infinite loops



    The most expensive bug. The agent encounters an error or unexpected response, retries, gets the same error, retries again. Without a max-steps limit, this runs until the context window fills up.

    Common triggers:
  • Tool returns an error the model doesn't know how to handle
  • Model generates malformed tool calls that fail validation
  • Output parsing fails, framework retries automatically
  • The model gets confused and calls the same tool repeatedly


The cost: Each retry adds $0.01-0.05. A loop of 50 retries costs $0.50-2.50 per request. At scale, this compounds fast.
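The defense is a hard cap on steps. Here's a minimal sketch of what that looks like; `runStep` is a hypothetical stand-in for whatever executes one model call plus tool call in your framework:

```javascript
// Hypothetical agent loop with a hard cap on steps.
// `runStep` stands in for one model call + tool execution.
async function runAgent(runStep, maxSteps = 10) {
  const history = [];
  for (let step = 0; step < maxSteps; step++) {
    const result = await runStep(history);
    history.push(result);
    if (result.done) return { status: 'done', steps: step + 1, history };
  }
  // Bound the failure: stop and surface it instead of retrying forever.
  return { status: 'max_steps_exceeded', steps: maxSteps, history };
}
```

A loop that never converges now costs at most 10 steps instead of filling the context window.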

    2. Context window explosion



    Every step in an agent chain adds tokens to the conversation history. System prompt (500 tokens) + user message (100 tokens) + tool results (varies). The problem is tool results.

    A tool that returns a database query result might inject 5,000 tokens into the context. After 3 tool calls, you're at 15,000+ tokens of context. The model is now processing 15,000 input tokens for every subsequent step.

    Claude Sonnet costs $3 per million input tokens. At 15,000 tokens per call, that's $0.045 per step. Five more steps and you've spent $0.27 on one request -- 6x what it should cost.
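That per-step cost is a simple linear function of accumulated context, which makes it easy to estimate ahead of time:

```javascript
// Input-side cost per agent step, using the $3/M-token Sonnet rate above.
const INPUT_COST_PER_TOKEN = 3 / 1_000_000;

function stepCost(contextTokens) {
  return contextTokens * INPUT_COST_PER_TOKEN; // input tokens only
}
// stepCost(15000) → $0.045 per step, as in the text
```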

    The fix: Summarize tool results before injecting them. Return the top 10 rows, not all 50,000. Use streaming for large results.
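The truncation part of that fix is a few lines. A sketch, assuming tool results arrive as an array of rows; the key detail is telling the model what was cut so it can refine the query instead of guessing:

```javascript
// Truncate a tool result before injecting it into the context.
// Keeps the first `limit` rows and tells the model what was cut.
function truncateToolResult(rows, limit = 10) {
  if (rows.length <= limit) return { rows, truncated: false };
  return {
    rows: rows.slice(0, limit),
    truncated: true,
    note: `Showing ${limit} of ${rows.length} rows. Refine the query for more.`,
  };
}
```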

    3. Fallback cascades



    Your code tries claude-sonnet, it's rate limited, falls back to claude-opus. Opus costs 5x more. If the rate limiting is temporary (which it usually is), you're paying 5x for the same result.

    Or worse: your code tries Anthropic, falls back to OpenAI, falls back to a different OpenAI model. Each fallback adds cost and latency, and the context needs to be reformatted for each provider.

    The cost: A $0.015 call becomes $0.075 on the first fallback and $0.15 on the second. The user doesn't notice -- the response looks the same. But your bill notices.
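A cheap way to avoid paying 5x for transient rate limits is to retry the primary model with backoff before touching the fallback. A sketch, with hypothetical `callPrimary`/`callFallback` functions standing in for your actual model calls:

```javascript
// Retry the cheap model with exponential backoff before paying for
// a pricier fallback. Rate limits are usually transient.
async function callWithRetry(callPrimary, callFallback, retries = 3, baseDelayMs = 500) {
  for (let attempt = 0; attempt < retries; attempt++) {
    try {
      return await callPrimary();
    } catch (err) {
      if (err.status !== 429) throw err; // only retry rate limits
      await new Promise(r => setTimeout(r, baseDelayMs * 2 ** attempt));
    }
  }
  return callFallback(); // last resort: the 5x-cost model
}
```

Three retries cost a few seconds of latency; an unnecessary Opus call costs 5x the money every time.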

    How to catch it before it costs you



    AIWatch budget rules are the simplest protection against runaway costs. Set them up once, forget about them.

    Daily budget with hard stop:

    Go to your AIWatch dashboard → Budgets tab. Set:

    Monthly budget: $200
    Daily budget: $8 (monthly / 25 working days)
    Alert at: 80% ($6.40/day)
    Hard stop: Yes


    When you hit 80% of your daily budget, AIWatch sends an alert to Slack/email. When you hit 100%, it returns a 429 status code to your application. Your code should handle this:

    try {
      const response = await client.messages.create({
        model: 'claude-sonnet-4-6-20250514',
        messages: [...],
      });
      return response;
    } catch (error) {
      if (error.status === 429) {
        // Budget exceeded — graceful degradation
        return { message: "AI features are temporarily paused. Try again tomorrow." };
      }
      throw error;
    }


    This is not elegant. It's effective. A hard stop at $8/day means your maximum monthly bill is $248, no matter what bugs you ship.
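If you want belt-and-suspenders, you can also track estimated spend client-side and refuse calls before they leave your process. This is a sketch, not AIWatch's mechanism; the costs here are your own estimates, and the authoritative numbers stay in the dashboard:

```javascript
// Optional client-side guard: track estimated spend locally and refuse
// new calls once a daily cap is hit. Estimates only — the provider
// dashboard remains the source of truth.
class DailyBudget {
  constructor(capUsd) {
    this.capUsd = capUsd;
    this.spentUsd = 0;
    this.day = new Date().toDateString();
  }
  charge(estimatedUsd) {
    const today = new Date().toDateString();
    if (today !== this.day) { this.day = today; this.spentUsd = 0; } // new day, reset
    if (this.spentUsd + estimatedUsd > this.capUsd) return false;    // would exceed cap
    this.spentUsd += estimatedUsd;
    return true;
  }
}
```

Call `charge(estimate)` before each request and skip the request when it returns false.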

    Per-feature budgets:

    If your app has multiple AI features (chat, summarization, classification), set budgets per feature using the X-Luxkern-Feature header:

    const response = await client.messages.create({
      model: 'claude-sonnet-4-6-20250514',
      messages: [...],
    }, {
      headers: {
        'X-Luxkern-Feature': 'chat',
      },
    });


    Now you can see in the dashboard: chat costs $4.20/day, summarization costs $1.80/day, classification costs $0.30/day. If chat suddenly jumps to $12/day, you know exactly where to look.
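To make sure no call ships untagged, it helps to route everything through one small wrapper. A hypothetical convenience helper, assuming the Anthropic SDK's per-request options shown above:

```javascript
// Tag every call with a feature name — untagged spend is the
// hardest to debug. `client` is an Anthropic SDK instance.
function taggedCreate(client, feature, params) {
  return client.messages.create(params, {
    headers: { 'X-Luxkern-Feature': feature },
  });
}
```

Usage: `taggedCreate(client, 'chat', { model, messages })` instead of calling the SDK directly.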

    The 2-line setup



    You don't need to change your application code. Just change the base URL:

    # Python
    import anthropic

client = anthropic.Anthropic(
    base_url="https://api.luxkern.com/aiwatch/proxy/anthropic"
)


    // Node.js
    import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic({
  baseURL: 'https://api.luxkern.com/aiwatch/proxy/anthropic',
});


Every call now gets logged with full trace data: tokens, cost, latency, prompt preview. The proxy adds less than 50ms of overhead. If the proxy is slow or unavailable, requests automatically fall through to the direct Anthropic API -- your users never see a degradation.

    Reading the warning signs



    Once AIWatch is running, here's what to watch for in the Traces dashboard:

    Latency spikes: A call that normally takes 800ms suddenly takes 4,000ms. This usually means the context window grew (more input tokens = slower response). Check if a tool is returning more data than expected.

    Cost per request increasing: Your average request cost went from $0.04 to $0.12. Sort traces by cost descending. The expensive calls will show you which feature or which chain step is responsible.

    Error rate above 2%: Some 429s from the provider are normal (rate limiting). A sustained 5%+ error rate means something is wrong -- either you're hitting rate limits too often (you need request queuing) or the provider is having issues (check Radar for community reports).
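The request queuing mentioned above can be as simple as a concurrency limiter in front of your model calls. A sketch; production code would also want timeouts and a queue-length cap:

```javascript
// Minimal request queue: cap in-flight calls so traffic bursts
// become short waits instead of 429s from the provider.
function createLimiter(maxConcurrent) {
  let active = 0;
  const waiting = [];
  const next = () => {
    if (active < maxConcurrent && waiting.length) {
      active++;
      waiting.shift()(); // release the oldest waiter
    }
  };
  return async function run(task) {
    await new Promise(resolve => { waiting.push(resolve); next(); });
    try {
      return await task();
    } finally {
      active--;
      next();
    }
  };
}
```

Wrap each model call as `run(() => client.messages.create(...))` with a limit below your rate cap.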

Chain length growing: Your average chain used to be 4 steps. Now it's 8. The model might be struggling with a specific tool or a changed prompt. Check AICanary for behavioral regressions.

    What to do when you hit a spike



    You open Slack on Monday morning. AIWatch sent an alert: "Daily budget 80% reached at 11:47 AM Sunday."

    Here's the playbook:

    1. Check the traces (2 minutes)

    Open AIWatch → Traces → sort by cost descending → filter to Sunday. Find the expensive calls. Are they all from one feature? One user? One chain?

    2. Identify the pattern (5 minutes)

    If it's one feature: check the chain traces. Is there a loop? Is the context window growing?

    If it's one user: check if their data triggers an edge case (large dataset, unusual characters, specific tool error).

    If it's all features: check Radar. The AI provider might have had latency issues, causing retries and increased costs.

    3. Fix the immediate problem (10 minutes)

  • If it's a loop: add max_steps = 10 to your agent configuration
  • If it's context growth: add result summarization or truncation
  • If it's fallback cascading: fix the primary path instead of relying on fallbacks
  • If it's a specific prompt: update the system prompt to handle the edge case


4. Prevent it from happening again (5 minutes)

  • Set a per-feature budget if you haven't already
  • Add a max-steps limit to every agent loop
  • Set up AICanary to test the specific edge case that caused the spike


Total time: 22 minutes. Cost prevented: potentially thousands of dollars per month.

    ---

    AI agents are powerful. They're also the most unpredictable cost center in your infrastructure. Without monitoring, every agent deployment is a bet that nothing goes wrong. With AIWatch, you know exactly what's happening, how much it costs, and you have a kill switch when things go sideways.

    Set up cost monitoring in 2 minutes -- free to start, EU-hosted.