Stop Burning Tokens: A Field Guide to Claude Code
The features most people never find, and the handful of habits that quietly cut your Claude Code bill. Verified against the docs, with an interactive quiz to test yourself.
Most "Claude Code tips" articles are wrong. They're SEO filler written by people who opened the tool twice. This one isn't — every tip below is checked against the current docs and changelog, and it leans on the stuff that actually moves the needle: the habits that cut your bill.
There's an interactive quiz at the end if you'd rather find out the hard way how much you're leaving on the table.
The one mental model that explains most of the savings: every message re-sends the entire conversation as input tokens. Context discipline — not shorter prompts — is the dominant lever. Everything below follows from that.
Context is the bill
/clearbetween unrelated tasks. If you'd open a new document for it, you should clear. This is the single biggest token lever there is — a long all-day session re-bills its early context on every later message./compactonly when you need continuity./compact Focus on the auth bugkeeps what you name — but compaction itself costs an LLM pass, so prefer/clearfor a clean switch./contextshows exactly what's filling the window: system prompt, CLAUDE.md, MCP tools, file reads, history. Run it first when a session feels heavy./usageattributes spend to skills, subagents, plugins, and individual MCP servers./costshows the session breakdown.- Course-correct early with
Esc. Stopping a wrong path keeps a polluted transcript from being re-sent every turn afterward.
Right-size the model
- Haiku for mechanical work (renames, boilerplate), Sonnet for daily coding, Opus for hard reasoning. Haiku is roughly 5× cheaper than Opus on input.
/modelswitches the current session. - Thinking bills as output. Reserve
ultrathink(and high/effort) for genuinely hard problems; drop effort for routine work. - Fast mode gives you Opus output at higher speed for snappier iteration.
Protect the cache
- Prompt caching can make repeated context ~10× cheaper to re-read. But it's a prefix match — editing CLAUDE.md mid-session, swapping context files, or idling past the TTL invalidates it.
- On the API, keep volatile content (timestamps, request IDs) after your last cache breakpoint, or you bust the whole prefix on every call.
The inputs that save searching
@file#10-40injects exactly those lines — no grep, no full-file read.!bash prefix runs a shell command inline;#saves a note to memory mid-flow./btwanswers a quick aside without adding it to history — keeps your task context clean.- Keep CLAUDE.md small. It rides in every prompt; treat it as a lookup table, not a brain dump.
Delegate the verbose stuff
- Run tests, log-crunching and wide searches in a subagent — it works in its own context and only the summary returns to your window. Set a subagent's model to Haiku for cheap delegated work.
- Move specialized, rarely-needed instructions into a Skill — the description is cheap; the full content only loads when invoked.
Make the machine enforce it
- Hooks fire deterministically (CLAUDE.md is only advisory). A
PostToolUsehook that auto-formats edited files, or aPreToolUsedeny-gate forrm -rf/.env, is worth setting up once. - A hook that greps a 10k-line log for
ERRORbefore Claude sees it saves a huge chunk of context on noisy commands.
The mistakes that cost the most
One all-day mega-session. Opus for trivial edits. Vague prompts that trigger repo-wide scanning. Runaway agent loops with no stop condition. A giant CLAUDE.md. Each one is a quiet, recurring tax — and each is a one-line fix.
Now test yourself. I built an interactive quiz — find out if you're a context ninja or a token bonfire. No login, no email.
Pricing, model details and cache behaviour change often — verify current numbers at code.claude.com/docs before relying on them.