Apr 29, 2026 · Free Guide

Stop Hitting Claude Code Usage Limits (3 Tricks That Actually Work)

Three tricks that cut your Claude Code token usage without making it dumber — including one plugin that drops your output tokens by 65%.

The Real Reason You're Hitting Usage Limits

Claude Code is the most powerful coding tool on the planet right now. It's also the easiest one to burn through tokens on if you don't know what you're doing.

Most people hit their usage limits because they're using Claude wrong — not because Claude is expensive. Three small changes and you'll get 3-5x more out of the same plan.

TRICK #1

Run /model opusplan

Claude has multiple models. Opus 4.6 is the smartest. Sonnet 4.6 is fast, cheap, and honestly more than enough for 90% of coding work.

Depending on your plan, Claude Code may default to Opus for everything. That's overkill, and it's why your tokens disappear so fast.

Run this command in your Claude Code session:

/model opusplan

Now Opus only handles planning in plan mode: the big-picture thinking. Sonnet handles execution: the actual file edits and code writing. Same quality output. Roughly 5x cheaper on the heavy lifting. Do it once per session.
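To see what that split means for a whole session, here is a rough back-of-the-envelope sketch. The 5x price ratio is the figure quoted above. The 20% planning share is purely an assumed number for illustration, so plug in your own.

```python
def blended_cost(planning_share: float, price_ratio: float = 5.0) -> float:
    """Session cost relative to running everything on the expensive model.

    planning_share: fraction of tokens handled by the planning model (assumed).
    price_ratio: how many times cheaper the execution model is (5x per this guide).
    """
    if not 0.0 <= planning_share <= 1.0:
        raise ValueError("planning_share must be between 0 and 1")
    execution_share = 1.0 - planning_share
    # Planning tokens cost full price; execution tokens cost 1/price_ratio.
    return planning_share + execution_share / price_ratio


if __name__ == "__main__":
    relative = blended_cost(planning_share=0.2)
    print(f"Relative cost: {relative:.2f} (about {1 / relative:.1f}x cheaper overall)")
```

With these assumed numbers the whole session comes out closer to 2.8x cheaper than 5x, because the planning share still runs on Opus. The less planning you do, the closer you get to the full 5x.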

TRICK #2

Use Subagents For Research And Exploration

Every message you send, Claude re-reads your entire chat history. So when you tell Claude "go explore this codebase and find X," all those file reads bloat your context — and you pay for them on every future message.

The fix is subagents.

A subagent runs in its own context window. You send it off to do the heavy reading — exploring the codebase, researching a library, scanning logs — and it sends back a clean 2-paragraph summary. Your main chat stays small and fast.

How to use them: just ask. Say something like:

“Use a subagent to explore the codebase and figure out how authentication works.”

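Claude will spin one up automatically. The subagent reads 50 files, but only its summary lands in your main chat, so you don't keep re-paying for those reads on every later message. The subagent's own reading still counts toward your usage once. This is the move most people don't even know exists.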
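A rough way to see why this matters is to add up the input tokens re-read across a conversation. The sketch below is a simplified model, not how usage is actually billed: it ignores prompt caching and output tokens, and every number in it (file size, summary size, message count) is a made-up assumption.

```python
def total_input_tokens(base_context: int, per_message: int, messages: int) -> int:
    """Sum the history re-read on each turn when context grows by per_message each time."""
    total = 0
    context = base_context
    for _ in range(messages):
        context += per_message   # the new message joins the history
        total += context         # the whole history is read again
    return total


# Hypothetical numbers for illustration only.
FILES = 50
TOKENS_PER_FILE = 2_000      # assumed average file size
SUMMARY_TOKENS = 300         # assumed size of the subagent's summary
FOLLOW_UPS = 20              # assumed messages sent after the exploration
TOKENS_PER_MESSAGE = 500     # assumed size of each later exchange

exploration = FILES * TOKENS_PER_FILE

# Exploration done in the main chat: all file reads stay in the history.
in_main = total_input_tokens(exploration, TOKENS_PER_MESSAGE, FOLLOW_UPS)

# Exploration done by a subagent: files are read once in its own context,
# and only the summary is carried forward in the main chat.
with_subagent = exploration + total_input_tokens(
    SUMMARY_TOKENS, TOKENS_PER_MESSAGE, FOLLOW_UPS
)

print(f"main chat only: {in_main:,} input tokens")
print(f"with subagent:  {with_subagent:,} input tokens")
```

With these invented numbers, the subagent route reads roughly a tenth of the input tokens over the 20 follow-up messages. The gap widens the longer the conversation runs after the exploration.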

TRICK #3

Install The Caveman Plugin

This one sounds like a joke. It's not.

Caveman is a Claude Code plugin that makes Claude respond in caveman-speak — short, blunt, no filler words. Same technical accuracy, way fewer tokens.

The benchmarks are real:

  • 65% average reduction in output tokens
  • 100% technical accuracy preserved
  • 3x faster responses

Instead of Claude writing:

“Sure! I'd be happy to help. The issue you're experiencing is most likely caused by your authentication middleware not properly validating the token expiry.”

It writes:

“Bug in auth middleware. Token expiry check use < not <=. Fix:”

Same answer. About half the words. You read faster, Claude burns fewer tokens, your usage limit stretches way further.
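You can sanity-check the example above with a quick word count. Words are only a loose stand-in for tokens (real tokenizers split text differently), so treat this as a rough comparison of this one pair, not a benchmark.

```python
verbose = (
    "Sure! I'd be happy to help. The issue you're experiencing is most likely "
    "caused by your authentication middleware not properly validating the token expiry."
)
caveman = "Bug in auth middleware. Token expiry check use < not <=. Fix:"


def word_reduction(before: str, after: str) -> float:
    """Fraction of whitespace-separated words removed going from before to after."""
    before_words = len(before.split())
    after_words = len(after.split())
    return 1 - after_words / before_words


print(f"{len(verbose.split())} words -> {len(caveman.split())} words")
print(f"reduction: {word_reduction(verbose, caveman):.0%}")
```

For this pair the count goes from 24 words to 12. Savings on a full response will vary with how much filler the original had.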

How to install: just tell Claude Code:

“install this skill for me: https://github.com/juliusbrussee/caveman”

That's it. Once installed, type /caveman to activate. Pick your level: lite, full, or ultra depending on how much grunt you want.

Stack All Three

These three compound on each other:

  1. /model opusplan cuts your execution cost by ~5x
  2. Subagents cut your context bloat
  3. Caveman cuts your output tokens by ~65%

Combined, you'll go from hitting usage limits daily to barely thinking about them. You don't need a bigger plan. You just need to use Claude Code like someone who actually knows what they're doing.

All Resources

Caveman Plugin: Cuts output tokens by ~65%
Claude Code: Download Claude Code
Subagents Docs: Official Claude Code subagents guide
Caveman Benchmarks: Real token counts from the Claude API
