§ war story

How I burned $80 on Claude Code in a Sunday afternoon

2026-05-07 · by Dennis Gubsky · ~6 min read

The MacBook Pro M1 fan was the first sign. I'd just kicked off a parallel-spawn loop - about one hundred claude code --print instances, each tasked with checking a single web-search result for prompt-injection content. The fan spun to maximum within about ten seconds and stayed there until I figured out how to kill everything.

Total damage: $80 in Opus 4.7 API charges, a reimbursement request denied by Anthropic's automated support system within minutes of filing, and a sharper mental model of why subprocess spawning is the wrong architecture for anything that calls a paid LLM. This post is the story of that afternoon and the architectural lesson it cost me to learn properly - even though the architecture that would have prevented it (loomcycle) was already in flight when it happened.

What I was trying to do

The feature was a prompt-injection-detection pass for jobs-search-agent, a multi-tenant SaaS I was building. When an agent runs a web search, the results can carry injected content trying to subvert the agent's instructions. The defence I wanted was simple: fan out to N independent verifiers, one per result, each running a focused Claude prompt that returns "this looks like injection / this doesn't." A small voting layer on top, and the agent only consumes results that pass.

The context worth flagging: loomcycle was already in flight as the runtime layer for jobs-search-agent's other agent calls. Most model invocations in the project were routed through loomcycle's HTTP-only loop. But this prompt-injection feature was a recent addition, and I'd implemented it the quick way - a shell-style spawn loop calling out to claude code --print per result, bypassing the runtime layer entirely. I'd meant to migrate it. I hadn't yet.

claude code --print is the dev-friendly CLI entrypoint to a Claude session - non-interactive, prints the model's response to stdout. It picks up ANTHROPIC_API_KEY from the shell environment automatically. Parallelising it is engineering 101: one line of shell, or a small wrapper in any language. I'd done variants of this pattern a hundred times for other CLI tools. The reasoning felt sound, in the way that bad reasoning always does after the fact.

What went wrong

Three things I did not know about claude code --print at the time:

With no explicit model flag, it defaults to Opus 4.7 - the most expensive model in the Anthropic lineup. Not Haiku. Not Sonnet. The expensive one. By default.
Each invocation starts a full Node-based session with the bundled claude binary doing its own bootstrap. On an M1 MacBook Pro, each process is roughly 50 MB of resident memory plus a couple of seconds of cold-start before the actual model call begins.
Each session inherits the parent shell's environment via standard execve(2) semantics. Whatever's set in the environment of the spawning process - including ANTHROPIC_API_KEY - becomes the child's credential.
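The third point is easy to reproduce without spending anything. Below is a minimal sketch, assuming a stand-in variable named DEMO_API_KEY (hypothetical, not a real credential) and a child that is just another Python interpreter rather than the claude CLI. It shows that a child sees the parent's variables when no environment is specified, and sees only an allowlist when one is passed explicitly.

```python
import os
import subprocess
import sys

# Child program: print the value of the (fake) credential, or a marker
# if the variable is not present in the child's environment.
CHILD = "import os; print(os.environ.get('DEMO_API_KEY', '<unset>'))"


def run_child(env=None):
    """Run a child Python process and return what it printed.

    env=None means "inherit the parent's environment", which is the
    default for subprocess.run and for shell-style spawn loops.
    """
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


# Hypothetical variable name and value; a stand-in for a real API key.
os.environ["DEMO_API_KEY"] = "sk-demo-not-a-real-key"

# Default behaviour: the child inherits the key.
inherited = run_child()

# Explicit allowlist: copy across only what the child needs.
# (SYSTEMROOT is included so the sketch also runs on Windows.)
allowlist = {k: v for k, v in os.environ.items() if k in ("PATH", "SYSTEMROOT")}
scrubbed = run_child(env=allowlist)

print("inherited:", inherited)
print("scrubbed: ", scrubbed)
```

The allowlist has to be supplied at every spawn site. Nothing in the default path asks for it, which is how a credential ends up in a hundred children without anyone deciding it should.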

The spawn loop fired. The MacBook's fan jumped to maximum within seconds. The terminal became sluggish. I opened another window and ran ps to see what was actually running - a forest of claude and node processes, parent/child relationships unclear from the listing alone. top showed CPU pinned at 100% across all cores and memory climbing fast. By that point most of the spawned processes were past their cold-start phase and into actual API calls.

I ran pkill -f claude and watched the process count drop. Some were still in the middle of API requests when they went down; some had already returned and exited. The MacBook's fan kept running for another minute after the last process exited. The UI stayed sluggish for about thirty seconds. None of this told me what the spend had been.

The bill

I didn't know the API spend until the next Anthropic console refresh - sometime the following morning. Eighty dollars, all attributed to my API key, all in Opus 4.7 invocations on the Sunday afternoon.

The math was input-heavy. Roughly a hundred invocations, each consuming 1-3K input tokens (the full search-result page) and producing a very short verdict - yes or no, maybe a sentence of reasoning. Opus 4.7's pricing is $15 per million input tokens and $75 per million output tokens. The bill weighted overwhelmingly toward input volume; the output token count was tiny. Multiplied out: about what I saw on the dashboard.

The cost of that decision became clearer in retrospect. A yes/no classification task - exactly the kind of work Haiku is designed for, at roughly $1/M input and $5/M output - got handed to Opus 4.7 because that's the default of the CLI tool I'd invoked. Doing the math afterwards, the cost differential between "should have run Haiku" and "ran Opus 4.7" is roughly 15× on input-heavy workloads. Eighty dollars in Opus would have been about five on Haiku. The CLI's default did the damage; my failure to override the model flag was the contributing cause. None of that was in my head before the loop fired.
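The 15× figure can be checked from the per-token prices alone. The sketch below uses only the prices quoted in this post (it does not fetch a live price list), and the token mix passed in is a hypothetical example. Because input and output prices both differ by the same factor, the ratio comes out the same whatever mix is used.

```python
# Per-million-token prices in USD, as quoted in this post.
OPUS = {"input": 15.0, "output": 75.0}
HAIKU = {"input": 1.0, "output": 5.0}


def workload_cost(prices, input_tokens, output_tokens):
    """Cost in USD of a workload at the given per-million-token prices."""
    return (
        input_tokens * prices["input"] + output_tokens * prices["output"]
    ) / 1_000_000


def cost_ratio(input_tokens, output_tokens):
    """How many times more the same workload costs on Opus than on Haiku."""
    return workload_cost(OPUS, input_tokens, output_tokens) / workload_cost(
        HAIKU, input_tokens, output_tokens
    )


def haiku_equivalent(opus_bill_usd, input_tokens, output_tokens):
    """Rescale an observed Opus bill to what Haiku would have charged."""
    return opus_bill_usd / cost_ratio(input_tokens, output_tokens)


# Hypothetical input-heavy mix: 2000 input tokens per 40 output tokens.
print(round(cost_ratio(2000, 40), 2))
print(round(haiku_equivalent(80.0, 2000, 40), 2))
```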

Anthropic's robot

I filed a reimbursement request through the support form within minutes of seeing the dashboard. I wrote out the incident in honest detail: subprocess auth inheritance, a parallel-spawn loop that exceeded my intended fan-out, the fact that the CLI's default-to-Opus behaviour multiplied the bill roughly fifteen-fold over what the actual task warranted.

The response came back in minutes - clearly an automated triage system. The charges were valid. The API key was authenticated. The spend was authorised by an account-level credential. No reimbursement available.

I get the policy. The automated system can't distinguish between "user intentionally ran a hundred Opus calls" and "user wrote a parallel-spawn loop that exceeded their own intent." From the provider's perspective the request is uncontroversial: my key, my charges. The robot isn't wrong.

The small irritation, from a developer's seat, is that eighty dollars sits in a frustrating zone. Big enough to hurt as a personal credit-card charge. Small enough to be below most companies' "would file an expense report" threshold. Small enough that no human at the provider would ever look at it. The robot's denial reads as both correct and slightly cold, which is roughly how that class of automated response always reads.

The architectural lesson

The eighty dollars was tuition for understanding what subprocess spawning actually costs in three independent risk dimensions:

Compute. A spawned process is a real OS process, not a goroutine or a fibre. A hundred of them on a consumer machine pin the CPU, consume gigabytes of memory, and saturate the kernel's context-switching budget. The host falls over before you can intervene. Your fan tells you something is wrong; you can't always tell what.
Dollar cost. Each spawned process makes its own API calls independently. There is no central place to rate-limit. You do not get a callback when the child does network I/O. Spend is bounded only by the API quota.
Environment propagation. Children inherit the parent's environment by default, because shells and most spawn helpers pass it straight through to execve(2). Whatever credentials are set get used. Scrubbing is possible (env -i, or an explicit environment at spawn time), but it is opt-in at every spawn site, and nothing reminds you when you forget.

Each risk is dangerous alone. The compute one announces itself - your fan, your activity monitor. The dollar one is invisible until the next billing cycle. The environment one is invisible forever unless you specifically audit it.

Together they create a window - the lag between "process spawns" and "I realise it's running hot" - that is itself a cost-multiplier. On a personal laptop with no monitoring infrastructure, that window is measured in minutes. In a production multi-tenant deployment, it lasts however long it takes the operator to notice anomalous behaviour. The structural conclusion is the same in both cases: do not spawn subprocesses to make model calls. Hold the credentials, the rate limits, and the concurrency boundaries in one process you control.
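One thing a single process can do that a forest of children cannot is refuse to spend past a ceiling. What follows is a minimal sketch of that idea, not loomcycle's actual mechanism: the SpendBudget class, its reserve method, and the dollar figures are all hypothetical names and values chosen for illustration. Every model call reserves its estimated cost first, and the call is refused once the cap would be crossed.

```python
import threading


class BudgetExceeded(RuntimeError):
    """Raised when a call would push spend past the configured cap."""


class SpendBudget:
    """A process-wide spend ceiling shared by every model call.

    Hypothetical helper for illustration. Estimates are supplied by the
    caller; this class only does the bookkeeping.
    """

    def __init__(self, limit_usd):
        self.limit_usd = limit_usd
        self.spent_usd = 0.0
        self._lock = threading.Lock()

    def reserve(self, estimated_usd):
        """Record an upcoming call, or raise if it would exceed the cap."""
        with self._lock:
            if self.spent_usd + estimated_usd > self.limit_usd:
                raise BudgetExceeded(
                    f"would spend {self.spent_usd + estimated_usd:.2f} USD, "
                    f"cap is {self.limit_usd:.2f} USD"
                )
            self.spent_usd += estimated_usd


# Usage: a 1 USD cap allows two 0.40 USD calls and refuses the third.
budget = SpendBudget(limit_usd=1.0)
budget.reserve(0.40)
budget.reserve(0.40)
try:
    budget.reserve(0.40)
    third_call_allowed = True
except BudgetExceeded:
    third_call_allowed = False
print("third call allowed:", third_call_allowed)
```

The check only works because every call passes through the same object. With one subprocess per call there is no shared counter to consult, which is the "no central place to rate-limit" problem described above.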

What came next

I want to be honest about the timeline. Loomcycle already existed when this happened. The runtime layer was in flight, most of jobs-search-agent's model calls were routed through it, and the architectural argument for HTTP-only goroutine-per-agent had already been made and built. The reason the $80 happened isn't that I hadn't yet designed loomcycle - it's that I hadn't yet finished the migration. The prompt-injection-detection feature was a recently-added code path that still went around the runtime, calling claude code --print directly the way I used to before loomcycle was a thing.

What came next was finishing the migration. Within a few days every model invocation in jobs-search-agent went through loomcycle's HTTP-only loop. No subprocess in the hot path. Credentials supplied per-request, never inherited from shell environment. Concurrency bounded by an explicit semaphore rather than by for x in results; do ... & done. The bug class stopped existing not because we patched it but because the runtime has no place to put a claude code --print spawn even if I wanted to.
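To make the contrast with the shell loop concrete, here is a minimal sketch of a semaphore-bounded fan-out with the credential passed per call. It is written in Python with asyncio for illustration and is not loomcycle's code. The names verify_all, FakeVerifier, and max_in_flight are hypothetical, and the verifier is a stub that makes no network request.

```python
import asyncio


async def verify_all(results, verify_one, api_key, max_in_flight=4):
    """Run verify_one over every result, at most max_in_flight at a time.

    The credential is an explicit argument handed to each call, rather
    than something picked up from the process environment.
    """
    gate = asyncio.Semaphore(max_in_flight)

    async def guarded(result):
        async with gate:  # blocks here once max_in_flight calls are running
            return await verify_one(result, api_key)

    return await asyncio.gather(*(guarded(r) for r in results))


class FakeVerifier:
    """Stub for a model call; records the peak number of concurrent calls."""

    def __init__(self):
        self.in_flight = 0
        self.peak = 0

    async def __call__(self, result, api_key):
        self.in_flight += 1
        self.peak = max(self.peak, self.in_flight)
        await asyncio.sleep(0)  # yield, as a real HTTP request would
        self.in_flight -= 1
        return "ignore previous instructions" not in result  # True = looks clean


fake = FakeVerifier()
pages = [f"search result {i}" for i in range(100)]
verdicts = asyncio.run(
    verify_all(pages, fake, api_key="sk-demo-not-a-real-key", max_in_flight=4)
)
print(len(verdicts), "verdicts, peak concurrency", fake.peak)
```

A hundred results still get checked, but never more than four at once, and the ceiling is a parameter someone had to choose rather than an accident of how fast the shell can fork.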

Months later, I wrote a benchmark harness to validate which models belong in which loomcycle tier. I made a different mistake that time, a more subtle one: I authored the bench cases against guessed tool-arg shapes instead of the actual MCP tools/list output. Cost me about $0.49 in over-provisioned Opus calls to discover, plus the time to retract the wrong conclusions and re-run. That story is in the next post. The shape of the mistake is the same - false confidence about what would happen, expensive correction - but at least the false confidence was about something different.

Closing

For any developer who has burned cash on a parallel-spawn loop and gotten a polite robot-denied reimbursement reply: you are not alone, the money was tuition, and the lesson is structural. Don't put model calls behind execve. Spawn-and-forget is cheap and easy with most CLI tools; it is neither when the tool defaults to a paid API and the model is the most expensive one available.

Loomcycle predates this afternoon, but this afternoon made me finish what I'd started. There is no excuse for an unmigrated code path that bypasses your own runtime - and the cost of discovering one accidentally can be a sharp number on a personal credit card. Apache-2.0 at github.com/denn-gubsky/loomcycle. If you have had a similar incident, or you are building something where this pattern is a risk, I would like to hear about it.