So I posted a thread on Threads the other day about saving tokens with Claude Code. My DMs filled up overnight with "wait, how does that actually work?", so I figured I'd write the whole thing down properly.
This is the no-fluff version. If we spend any real money on AI coding tools, or we keep hitting the weekly limit on a Pro plan, the five tricks below should give us a chunk of headroom back.
The quick list
Five things, ranked roughly by how much they save:
- Compress CLI output before it eats our context. Tools like RTK turn 1000 noisy lines into 200 useful ones.
- Use a structured project context tool. repomix, codesight, and friends. Trim noise, keep signal.
- Use Sonnet for the regular stuff. Save Opus for the heavy lifts.
- Clean the context. `/compact` aggressively or just start over.
- The "caveman skill". Fewer words, more tricks. Risky but powerful.
Now the longer version.
1. Compress CLI output (this one is huge)
A grep across a medium codebase will dump 800 matches into Claude's context. Most of them are duplicates. Same paths repeated. Same surrounding lines. Pure noise.
The fix is a tiny CLI tool that wraps our shell output and dedupes it before Claude ever sees it. RTK is the one I've been using:
```shell
rtk -la /Users/me/projects/foo
```
Real numbers from yesterday: 995 input tokens, 757 saved, 238 actually used. 76.1 percent efficiency. That ratio holds for git diff commands too, which is where it really shines. Long diffs full of whitespace and repeated context compress hard.
It is not magic. It just removes duplicate paths, collapses repeated context lines, and hands the result back. Same information, a third of the tokens.
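RTK's actual implementation isn't shown here, but the core move fits in a few lines of Python. This is a toy version of the idea, not RTK's real logic:

```python
def compress(raw: str) -> str:
    """Collapse duplicate lines in noisy CLI output.

    A toy sketch of what an output-compression wrapper like RTK does:
    keep the first occurrence of each line, tag repeats with a count,
    and drop the rest. (The real tool is smarter about near-duplicates
    and repeated context lines; this only handles exact matches.)
    """
    counts: dict[str, int] = {}
    for line in raw.splitlines():
        counts[line] = counts.get(line, 0) + 1  # dicts keep insertion order
    return "\n".join(
        line if n == 1 else f"{line}  [x{n}]" for line, n in counts.items()
    )
```

Feed it five lines where three are identical and it hands back three, with a `[x3]` marker so no information is lost.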
If we only adopt one thing from this post, make it this one.
2. Use a structured context tool
The next biggest leak is "Claude, here's my whole repo, figure it out." Don't do that. Use a tool that builds a curated snapshot first.
A few I've tried and liked:
- yamadashy/repomix (23k stars): packs the repo into one AI-friendly file with smart filtering and token counting baked in. `npx repomix` and we're done.
- mufeedvh/code2prompt (7k stars): a Rust CLI that does the same job with prompt templating and source-tree output. `cargo install` or `brew install`.
- houseofmvps/codesight: a universal context generator with a built-in MCP server. Saves thousands of tokens per conversation across Claude Code, Cursor, Copilot, Codex, and the rest.
All three do the same thing in spirit: don't ship raw files. Ship a clean, structured slice. They strip generated code, lockfiles, build output, and other stuff Claude simply does not need to see to answer the question.
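In spirit, the whole category reduces to something like this toy Python sketch. The skip lists and `=====` delimiters here are illustrative, not any tool's actual format; the real tools filter far more intelligently and count tokens as they go:

```python
import os

# Noise we never want Claude to read. Illustrative, not exhaustive.
SKIP_DIRS = {".git", "node_modules", "dist", "build", "__pycache__"}
SKIP_FILES = {"package-lock.json", "yarn.lock", "Cargo.lock"}

def pack(repo_root: str) -> str:
    """Walk the repo, prune the noise, concatenate the rest into one snapshot."""
    chunks = []
    for dirpath, dirnames, filenames in os.walk(repo_root):
        # Mutating dirnames in place stops os.walk from descending into noise.
        dirnames[:] = [d for d in dirnames if d not in SKIP_DIRS]
        for name in sorted(filenames):
            if name in SKIP_FILES:
                continue
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8") as f:
                    text = f.read()
            except (UnicodeDecodeError, OSError):
                continue  # skip binaries and unreadable files
            rel = os.path.relpath(path, repo_root)
            chunks.append(f"===== {rel} =====\n{text}")
    return "\n\n".join(chunks)
```

Twenty lines gets us maybe 60 percent of the benefit; the off-the-shelf tools earn their stars with the remaining 40.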
3. Right model for the right job
This one is free, and most of us get it wrong.
- Opus for research, planning, deep refactors, "explain this whole subsystem to me before I touch anything."
- Sonnet for actual code edits, bug fixes, file-by-file work, anything where the Claude Code loop is doing the work and we just need the next move.
I caught myself running Opus on xhigh for things like "rename this variable across the file." That is a Sonnet job. Probably a Haiku job, honestly. Switching the regular grunt work to Sonnet shaved maybe 40 percent off my weekly usage with zero quality drop I could feel.
The mental model: Opus is the architect, Sonnet is the contractor. We don't pay an architect to swing a hammer.
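If it helps to make that default explicit, the whole routing rule is a few lines. The task labels and model names below are illustrative, not part of any Claude Code API; the point is that the cheap model is the default and the expensive one is an opt-in:

```python
# Hypothetical routing helper — labels and names are made up for illustration.
HEAVY_TASKS = {"plan", "research", "deep-refactor", "subsystem-review"}

def pick_model(task: str) -> str:
    if task in HEAVY_TASKS:
        return "opus"    # architect work: pay for it on purpose
    return "sonnet"      # grunt work defaults to the cheaper contractor
```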
4. Clean the context. Aggressively.
Long sessions are the silent killer. Every turn we keep, the model re-reads the entire conversation, so token counts grow nearly quadratically. The bill grows with them.
Two moves:
- `/compact`. Claude summarises the conversation so far and continues with the summary in place of the raw history. Works really well right after an exploration phase.
- New session. When we switch tasks, just start fresh. Don't drag the React debugging context into a Postgres question. It costs us, and it confuses the model.
The instinct is to keep the context "in case Claude forgets something." Resist it. A focused short session beats a sprawling long one every single time.
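A back-of-the-envelope model makes the cost concrete. All the numbers below are invented; the shape of the curve is the point:

```python
def tokens_resent(turn_sizes, compact_every=None, summary_size=300):
    """Total input tokens over a session where every turn re-sends
    the whole history kept so far. A toy model, not real billing."""
    total = history = 0
    for i, size in enumerate(turn_sizes, start=1):
        history += size
        total += history  # the model re-reads everything kept so far
        if compact_every and i % compact_every == 0:
            history = summary_size  # compaction swaps history for a summary
    return total

raw = tokens_resent([500] * 30)                          # never compacting
compacted = tokens_resent([500] * 30, compact_every=10)  # compact every 10 turns
```

With 30 turns of 500 tokens each, never compacting re-sends about 232k input tokens in this toy model; compacting every 10 turns cuts that to under 90k.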
5. The caveman skill (use carefully)
This one got the most replies on the thread because it sounds completely insane and it absolutely works.
The idea: write our prompts like a caveman. No pleasantries. No "could you please." No background paragraphs. Just verbs and nouns.
Instead of:
"Hey Claude, when you have a moment, could you take a look at the auth middleware and check whether the session token validation is handling expired tokens correctly?"
We write:
"check auth middleware. expired token handling. correct?"
Same answer. A quarter of the input tokens. Multiplied across a hundred prompts a day, this adds up fast.
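The "quarter of the tokens" claim is easy to sanity-check. A whitespace split is only a rough proxy for real tokenization, but the ratio between the two prompts is what matters:

```python
# Rough check: whitespace tokens, not a real tokenizer.
polite = ("Hey Claude, when you have a moment, could you take a look at "
          "the auth middleware and check whether the session token "
          "validation is handling expired tokens correctly?")
caveman = "check auth middleware. expired token handling. correct?"

ratio = len(caveman.split()) / len(polite.split())  # 7 / 28 = 0.25
```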
The catch: we do lose some nuance. Claude infers more, which means it sometimes infers wrong. I use the caveman style for quick lookups and reviews. Anything that touches production code, I write the full sentence and pay the tokens. No regrets either way.
Fewer words. More tricks. Used carefully.
What I am not doing
A few things people kept asking about that I don't bother with:
- Custom tokeniser hacks. Fragile, save pennies, break weekly.
- Prompt-shortening "agents". Add latency, often make things worse.
- Switching providers mid-task. Context loss costs more than the saved tokens.
The five above are the ones that actually move the needle.
Putting it together
A normal day for me looks like:
- Pipe long shell output through RTK before sharing.
- Run codesight once at session start to give Claude a clean project snapshot.
- Default to Sonnet, bump to Opus only for planning and research.
- `/compact` whenever a session crosses roughly 30 turns.
- Write caveman prompts for trivial stuff, full sentences for production code.
On a heavy week, that combo cuts my token use by something like 60 to 70 percent versus running everything raw on Opus. Same output, less spend, far fewer "weekly limit reached" notifications staring back at me on a Sunday night.
If we find other tricks that work, ping me on Threads, I'll fold the best ones into a follow-up post.