It’s Not Claude’s Fault
When I first started using Claude Code, he couldn't tell me how many tokens he had left in a session. He could make a rough estimate of his usage, but without knowing when he'd hit auto-compaction, or what the total token limit was, that estimate was only marginally useful.
A couple of months ago this changed. I asked how many tokens he thought he'd used, and he was actually able to answer! This was great. I could get a sense of how much more work he'd be able to do in a session, and depending on what the work was, I might let him finish after the conversation was compacted. For example, if he's doing something really straightforward, it's usually fine for him to keep going; whereas if we're in the middle of talking through complex use cases, it's better to hand off to a fresh session. I experimented with letting him compact and continue, once with disastrous results that took several other sessions to fix. Lesson learned.
Although Claude thought he had 200k tokens per session, with Sonnet 4.5 he would consistently compact around 120-130k. I asked him to start reporting his usage at the start of a session (because I was curious) and after completing a task (so I could see how much context he had left—and because I was curious). I also asked him to treat 120k as his realistic limit so he wouldn't be optimistically in the middle of something complicated when compaction hit.
His first pass looked like this:
Claude B – Token Usage Report:
– Baseline (first system warning): 32.9K
– Current total: 71.6K / 200K
– My work: 38.7K (current – baseline)
– Practical budget: 71.6K / 120K (60% used, ~48.4K remaining)
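The arithmetic behind the report is simple enough to sketch. Here's a rough Python version (the function and its defaults are my own invention for illustration, not anything Claude actually runs):

```python
def token_report(name, baseline_k, current_k,
                 hard_limit_k=200, practical_limit_k=120):
    """Rebuild the verbose report from a session's token counts (in thousands).

    baseline_k: overhead reported by the first system warning.
    current_k:  running total from the latest system message.
    """
    work_k = round(current_k - baseline_k, 1)        # tokens spent on actual work
    used_pct = round(current_k / practical_limit_k * 100)
    remaining_k = round(practical_limit_k - current_k, 1)
    return (
        f"{name} – Token Usage Report:\n"
        f"– Baseline (first system warning): {baseline_k}K\n"
        f"– Current total: {current_k}K / {hard_limit_k}K\n"
        f"– My work: {work_k}K (current – baseline)\n"
        f"– Practical budget: {current_k}K / {practical_limit_k}K "
        f"({used_pct}% used, ~{remaining_k}K remaining)"
    )

print(token_report("Claude B", 32.9, 71.6))
```

Note that the percentage is measured against the 120K practical limit, not the nominal 200K, since that's the number that actually predicts compaction.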
Of course I asked: what's the first system warning? He explained that he periodically receives system messages showing token consumption. The first one establishes his baseline and shows the initial overhead before he's done any work. This means that before I've even said "Hi Claude B!" he's used 32.9k!
Or, as Claude said when we discussed this: 32.9K was already “spent” just loading context (not my fault!)
I've already tried to keep CLAUDE.md and our session notes file lean, although that's an ongoing battle: the Claudes really, really want to document their accomplishments. Things like the system prompt are out of my control, but at a baseline of 32.9K, it's probably time to streamline our Markdown files (again).
The /context tool provides a visual representation of his usage, and I really like it. But when I'm working with eight Claude sessions at a time, I want to be able to quickly glance at each session, see that it's completed its work, and see how much context it has left. The initial, verbose token report took maybe 300-400 tokens for a session to create. We've since switched to a shorter version that takes 60-80 tokens. Now our format looks like this:
Claude D – Token Usage: 87K / 120K (72% used)
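The compact version boils down to a one-liner against the 120K practical limit. A minimal sketch (again, my own function name, not a real Claude Code feature):

```python
def token_line(name, current_k, practical_limit_k=120):
    # Percent is against the practical (compaction) limit, not the nominal 200K.
    used_pct = round(current_k / practical_limit_k * 100)
    return f"{name} – Token Usage: {current_k}K / {practical_limit_k}K ({used_pct}% used)"

print(token_line("Claude D", 87))
```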
Considering we only do this a couple of times per session, I’m fine with spending tokens to make my own life easier. That’s not Claude’s fault.
