How I burned $80 on Claude Code in a Sunday afternoon
The MacBook Pro M1 fan was the first sign. I'd just kicked off a
parallel-spawn loop — about one hundred claude code
--print instances, each tasked with checking a single
web-search result for prompt-injection content. The fan spun to
maximum within about ten seconds and stayed there until I figured
out how to kill everything.
Total damage: $80 in Opus 4.7 API charges, a denied reimbursement request from Anthropic's automated support system within minutes of filing, and a sharper mental model of why subprocess spawning is the wrong architecture for anything that calls a paid LLM. This post is the story of that afternoon and the architectural lesson it cost me to learn properly — even though the architecture that would have prevented it (loomcycle) was already in flight when it happened.
What I was trying to do
The feature was a prompt-injection-detection pass for jobs-search-agent, a multi-tenant SaaS I was building. When an agent runs a web search, the results can carry injected content trying to subvert the agent's instructions. The defence I wanted was simple: fan out to N independent verifiers, one per result, each running a focused Claude prompt that returns "this looks like injection / this doesn't." A small voting layer on top, and the agent only consumes results that pass.
The context worth flagging: loomcycle was already in
flight as the runtime layer for jobs-search-agent's
other agent calls. Most model invocations in the project were
routed through loomcycle's HTTP-only loop. But this
prompt-injection feature was a recent addition, and I'd
implemented it the quick way — a shell-style spawn loop calling
out to claude code --print per result, bypassing the
runtime layer entirely. I'd meant to migrate it. I hadn't yet.
claude code --print is the dev-friendly CLI
entrypoint to a Claude session — non-interactive, prints the
model's response to stdout. It picks up
ANTHROPIC_API_KEY from the shell environment
automatically. Parallelising it is engineering 101: one line of
shell, or a small wrapper in any language. I'd done variants of
this pattern a hundred times for other CLI tools. The reasoning
felt sound, in the way that bad reasoning always does after the
fact.
What went wrong
Three things I did not know about claude code --print
at the time:
- With no explicit model flag, it defaults to Opus 4.7 — the most expensive model in the Anthropic lineup. Not Haiku. Not Sonnet. The expensive one. By default.
- Each invocation starts a full Node-based session with the bundled claude binary doing its own bootstrap. On an M1 MacBook Pro, each process is roughly 50 MB of resident memory plus a couple of seconds of cold-start before the actual model call begins.
- Each session inherits the parent shell's environment via standard execve(2) semantics. Whatever's set in the environment of the spawning process, including ANTHROPIC_API_KEY, becomes the child's credential.
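The inheritance behaviour is easy to see in isolation. The following is a minimal sketch, not the code from that afternoon: it uses a made-up variable name (DEMO_API_KEY) and a child Python process as a stand-in for the CLI, and shows that a default spawn hands the variable to the child while an explicitly built environment does not.

```python
import os
import subprocess
import sys

# Hypothetical variable name for the demo, so no real credential is touched.
DEMO_VAR = "DEMO_API_KEY"

# One-liner run in the child: print the variable's value, or "<unset>".
CHILD_CODE = f"import os; print(os.environ.get({DEMO_VAR!r}, '<unset>'))"


def child_sees(env=None):
    """Run a child Python process and return what it sees for DEMO_VAR.

    env=None means "inherit the parent's environment", which is the default.
    """
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


os.environ[DEMO_VAR] = "sk-demo-not-a-real-key"

# Default spawn: the child receives the key without being asked.
inherited = child_sees()

# Explicit environment: a copy of the parent's, minus the credential.
scrubbed_env = {k: v for k, v in os.environ.items() if k != DEMO_VAR}
scrubbed = child_sees(env=scrubbed_env)

print("inherited:", inherited)
print("scrubbed: ", scrubbed)
```

The point of the contrast is that the safe path takes an extra argument and the unsafe path takes none, which is why a quick spawn loop ends up on the unsafe one.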
The spawn loop fired. The MacBook's fan jumped to maximum within
seconds. The terminal became sluggish. I opened another window
and ran ps to see what was actually running — a
forest of claude and node processes,
parent/child relationships unclear from the listing alone.
top showed CPU pinned at 100% across all cores and
memory climbing fast. By that point most of the spawned
processes were past their cold-start phase and into actual API
calls.
I ran pkill -f claude and watched the process count
drop. Some were still in the middle of API requests when they
went down; some had already returned and exited. The MacBook's
fan kept running for another minute after the last process
exited. The UI stayed sluggish for about thirty seconds. None of
this told me what the spend had been.
The bill
I didn't know the API spend until the next Anthropic console refresh — sometime the following morning. Eighty dollars, all attributed to my API key, all in Opus 4.7 invocations on the Sunday afternoon.
The math was input-heavy. Roughly a hundred invocations, each consuming 1–3K input tokens (the full search-result page) and producing a very short verdict: yes or no, maybe a sentence of reasoning. Opus 4.7's pricing is $15 per million input tokens and $75 per million output tokens. The bill weighted overwhelmingly toward input volume; the output token count was tiny. The page content alone does not multiply out to the dashboard figure: 100 calls at 3K tokens is 300K input tokens, about $4.50 at that rate. The rest has to be input beyond the page itself, or more calls than the hundred I intended.
The architectural cost-of-decision became clearer in retrospect. A yes/no classification task — exactly the kind of work Haiku is designed for, at roughly $1/M input and $5/M output — got handed to Opus 4.7 because that's the default of the CLI tool I'd invoked. Doing the math afterwards, the cost differential between "should have run Haiku" and "ran Opus 4.7" is roughly 15× on input-heavy workloads. Eighty dollars in Opus would have been about five on Haiku. The CLI's default did the damage; my failure to override the model flag was the contributing cause. None of that was in my head before the loop fired.
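The 15× figure can be checked directly from the prices quoted in this post. This is a small sketch under stated assumptions: the model keys, the call_cost helper, and the 3,000-in / 50-out token mix are illustrative, not taken from the billing data.

```python
# Per-million-token prices in USD, as quoted in this post.
PRICES = {
    "opus": {"input": 15.00, "output": 75.00},
    "haiku": {"input": 1.00, "output": 5.00},
}


def call_cost(model, input_tokens, output_tokens):
    """Cost in USD of one call at the quoted per-million-token prices."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000


# Hypothetical single verification call: one search-result page in,
# a one-line verdict out.
opus = call_cost("opus", input_tokens=3000, output_tokens=50)
haiku = call_cost("haiku", input_tokens=3000, output_tokens=50)

ratio = opus / haiku
haiku_equivalent = 80 / ratio  # what an $80 Opus bill scales to on Haiku

print(f"opus per call:  ${opus:.5f}")
print(f"haiku per call: ${haiku:.5f}")
print(f"ratio: {ratio:.1f}x, $80 on Opus is about ${haiku_equivalent:.2f} on Haiku")
```

Because both the input and the output price differ by the same factor at these rates, the ratio does not depend on the token mix.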
Anthropic's robot
I filed a reimbursement request through the support form within minutes of seeing the dashboard. I wrote out the incident in honest detail: subprocess auth inheritance, a parallel-spawn loop that exceeded my intended fan-out, the fact that the CLI's default-to-Opus behaviour multiplied the bill roughly fifteen-fold over what the actual task warranted.
The response came back in minutes — clearly an automated triage system. The charges were valid. The API key was authenticated. The spend was authorised by an account-level credential. No reimbursement available.
I get the policy. The automated system can't distinguish between "user intentionally ran a hundred Opus calls" and "user wrote a parallel-spawn loop that exceeded their own intent." From the provider's perspective the request is uncontroversial: my key, my charges. The robot isn't wrong.
The small developer-irritation point is that eighty dollars sits in a frustrating zone. Big enough to hurt as a personal credit-card charge. Small enough to be below most companies' "would file an expense report" threshold. Small enough that no human at the provider would ever look at it. The robot's denial reads as both correct and slightly cold, which is roughly how that class of automated response always reads.
The architectural lesson
The eighty dollars was tuition for understanding what subprocess spawning actually costs in three independent risk dimensions:
- Compute. A spawned process is a real OS process, not a goroutine or a fibre. A hundred of them on a consumer machine pin the CPU, consume gigabytes of memory, and saturate the kernel's context-switching budget. The host falls over before you can intervene. Your fan tells you something is wrong; you can't always tell what.
- Dollar cost. Each spawned process makes its own API calls independently. There is no central place to rate-limit. You do not get a callback when the child does network I/O. Spend is bounded only by the API quota.
- Environment propagation. Under execve(2) semantics, children inherit the parent's environment by default, so whatever credentials are set get used. You can pass a scrubbed environment at spawn time (env -i, or an explicit envp), but nothing forces you to, and the default is to hand over everything.
Each risk is dangerous alone. The compute one announces itself — your fan, your activity monitor. The dollar one is invisible until the next billing cycle. The environment one is invisible forever unless you specifically audit it.
Together they create a window, the lag between "process spawns" and "I realise it's running hot", that is itself a cost multiplier. On a personal laptop with no monitoring infrastructure, that window is measured in minutes. In production multi-tenant deployments, it lasts however long the operator takes to notice anomalous behaviour. The structural conclusion is the same in both cases: do not spawn subprocesses to make model calls. Hold the credentials, the rate limits, and the concurrency boundaries in one process you control.
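One way to picture "a central place to rate-limit" is a gate that every model call has to pass before it is allowed to spend anything. The sketch below is an illustration only, not loomcycle's implementation: the SpendGate class, the cap, and the per-call ceiling are all hypothetical, and amounts are kept in integer cents to avoid floating-point drift.

```python
import threading


class BudgetExceeded(RuntimeError):
    """Raised when a call would push reserved spend past the cap."""


class SpendGate:
    """Hypothetical in-process gate: each call reserves its worst-case cost first."""

    def __init__(self, cap_cents):
        self.cap_cents = cap_cents
        self.reserved_cents = 0
        self._lock = threading.Lock()

    def reserve(self, worst_case_cents):
        # Check and add under one lock so concurrent callers cannot overshoot.
        with self._lock:
            if self.reserved_cents + worst_case_cents > self.cap_cents:
                raise BudgetExceeded(
                    f"cap {self.cap_cents}c reached at {self.reserved_cents}c"
                )
            self.reserved_cents += worst_case_cents


gate = SpendGate(cap_cents=100)  # assumed cap: $1.00 for the whole run
allowed = refused = 0

for _ in range(100):
    try:
        gate.reserve(worst_case_cents=5)  # assumed per-call ceiling
        allowed += 1  # the real model call would go here
    except BudgetExceeded:
        refused += 1

print(f"allowed={allowed} refused={refused} reserved={gate.reserved_cents}c")
```

A hundred independent child processes have nowhere to put a check like this, because none of them can see what the others have spent.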
What came next
I want to be honest about the timeline. Loomcycle
already existed when this happened. The runtime layer was in
flight, most of jobs-search-agent's model calls were routed
through it, and the architectural argument for HTTP-only
goroutine-per-agent had already been made and built. The reason
the $80 happened isn't that I hadn't yet designed loomcycle — it's
that I hadn't yet finished the migration. The
prompt-injection-detection feature was a recently-added code path
that still went around the runtime, calling
claude code --print directly the way I used to
before loomcycle was a thing.
What came next was the migration finish. Within a few days every
model invocation in jobs-search-agent went through loomcycle's
HTTP-only loop. No subprocess in the hot path. Credentials
supplied per-request, never inherited from shell environment.
Concurrency bounded by an explicit semaphore rather than by
for x in results; do ... & done. The bug class
stopped existing not because we patched it but because the
runtime has no place to put a claude code --print
spawn even if I wanted to.
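For contrast with the shell loop, here is a rough sketch of semaphore-bounded fan-out inside a single process. It is written in Python with asyncio purely for illustration and is not loomcycle's code: MAX_IN_FLIGHT, verify, and the sleep standing in for the HTTP call are all assumptions.

```python
import asyncio

MAX_IN_FLIGHT = 5  # assumed concurrency cap


async def verify(result, sem, stats):
    """Stand-in verifier: holds a semaphore slot for the duration of the call."""
    async with sem:
        stats["in_flight"] += 1
        stats["peak"] = max(stats["peak"], stats["in_flight"])
        await asyncio.sleep(0.01)  # placeholder for the real HTTP request
        stats["in_flight"] -= 1
        return True  # placeholder verdict: "not an injection"


async def verify_all(results):
    sem = asyncio.Semaphore(MAX_IN_FLIGHT)
    stats = {"in_flight": 0, "peak": 0}
    # All tasks are created up front, but only MAX_IN_FLIGHT run the call at once.
    verdicts = await asyncio.gather(*(verify(r, sem, stats) for r in results))
    return verdicts, stats["peak"]


results = [f"search result {i}" for i in range(100)]
verdicts, peak = asyncio.run(verify_all(results))

print(f"verdicts={len(verdicts)} peak_in_flight={peak}")
```

The hundred results still all get checked; what changes is that the number of calls in flight at any moment is a constant in the code rather than whatever the machine can survive.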
Months later, I wrote a benchmark harness to validate which
models belong in which loomcycle tier.
I made a different mistake that time, a more subtle one: I
authored the bench cases against guessed tool-arg shapes instead
of the actual MCP tools/list output. Cost me about
$0.49 in over-provisioned Opus calls to discover, plus the time
to retract the wrong conclusions and re-run. That story is in the
next post. The shape of the mistake is the same (false confidence
about what would happen, expensive correction), but the false
confidence at least took a different form.
Closing
For any developer who has burned cash on a parallel-spawn loop
and gotten a polite robot-denied reimbursement reply: you are not
alone, the money was tuition, and the lesson is structural. Don't
put model calls behind execve. Spawn-and-forget is
cheap and easy with most CLI tools; it is neither
when the tool defaults to a paid API and the model is the most
expensive one available.
Loomcycle predates this afternoon, but this afternoon made me finish what I'd started. There is no excuse for an unmigrated code path that bypasses your own runtime — and the cost of discovering one accidentally can be a sharp number on a personal credit card. Apache-2.0 at github.com/denn-gubsky/loomcycle. If you have had a similar incident, or you are building something where this pattern is a risk, I would like to hear about it.