Friday, March 13, 2026
Adaptive Perspectives, 7-day Insights

One Million Tokens

Anthropic's Claude Opus 4.6 just went GA with a 1 million token context window. I watched it happen in my own terminal.

Image generated by OpenAI GPT Image 1.5

Anthropic announced Claude Opus 4.6 on February 5th. Among the headline features: a context window that could stretch to 1 million tokens — five times the previous 200,000-token limit. At launch, the extended context was in beta, carried a pricing premium, and required a special header to activate via the API.

Today, March 13th, Anthropic flipped the switch. The 1 million token context window is now generally available for both Opus 4.6 and Sonnet 4.6. No beta header. For API users, the 2x pricing surcharge that applied to requests over 200K tokens is gone. For Max subscribers like me, the context window just got five times bigger — though a session that fills it will obviously take a much larger bite out of your usage quota than a short one. For Max and Team subscribers using Claude Code, the upgrade is automatic.
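To make the pricing change concrete, here is a back-of-envelope sketch. The dollar figure is a placeholder I made up for illustration; only the 200K threshold and the 2x multiplier come from the announcement described above.

```python
# Back-of-envelope comparison of the old 2x long-context surcharge vs
# the new flat pricing. BASE_PER_MTOK is a hypothetical placeholder,
# not Anthropic's actual price list; only the 200K threshold and the
# 2x multiplier are from the announcement.
BASE_PER_MTOK = 15.00            # hypothetical $ per 1M input tokens
LONG_CONTEXT_THRESHOLD = 200_000

def input_cost(tokens: int, surcharged: bool) -> float:
    """Input cost of one request, with or without the old 2x premium
    that applied to requests exceeding 200K tokens."""
    rate = BASE_PER_MTOK / 1_000_000
    if surcharged and tokens > LONG_CONTEXT_THRESHOLD:
        rate *= 2
    return round(tokens * rate, 2)

print(input_cost(800_000, surcharged=True))   # 24.0 (old beta pricing)
print(input_cost(800_000, surcharged=False))  # 12.0 (after GA)
```

At this hypothetical rate, a single 800K-token request costs half of what it would have under the beta surcharge; below the 200K threshold nothing changes.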

Catching It in the Wild

People on Reddit and Threads have been posting screenshots over the past few days — their accounts suddenly showing the 1M context indicator where 200K used to be. It’s been rolling out gradually, and watching people discover it in real time has been its own kind of entertainment.

I launched quite a few Claude Code sessions today before it happened to me. The session that did it:

claude --model claude-opus-4-6 --effort max

That’s when my terminal showed Opus 4.6 (1M context) for the first time. Whether it was the specific flags or just timing, I can’t say — but there it was.

What a Million Tokens Actually Means

A token is roughly three-quarters of a word. So a million tokens is about 750,000 words — somewhere north of War and Peace and The Hobbit combined. In code terms, it’s an entire large codebase held in memory at once.
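The arithmetic behind that estimate is simple enough to sketch. The 0.75 words-per-token ratio is the rule of thumb above, not an exact tokenizer property:

```python
# Rough scale of a context window, using the ~0.75 words-per-token
# rule of thumb. Real tokenizers vary by language and content type.
WORDS_PER_TOKEN = 0.75

def tokens_to_words(tokens: int) -> int:
    """Estimate word count from a token count."""
    return int(tokens * WORDS_PER_TOKEN)

print(tokens_to_words(1_000_000))  # 750000
print(tokens_to_words(200_000))    # 150000
```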

With the old 200K window, long Claude Code sessions hit a wall regularly. The tool would trigger what Anthropic calls compaction — summarizing the conversation to free up space. If you’ve used Claude Code for any serious work, you know the feeling. You’re deep into a complex refactor, the context fills up, and suddenly the model needs a recap of what you’ve been doing for the last hour. Important details slip through. You re-explain things. Momentum breaks.

Compaction typically kicked in when the context hit 75-92% capacity. With 200K tokens, that meant roughly 150-184K tokens before the squeeze. With a million tokens, that threshold moves to somewhere around 750-920K. For most coding sessions — even long, ambitious ones — you may never hit it.
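Those trigger points follow directly from the percentages. A quick sketch — the 75-92% band is this article’s estimate, not a documented Claude Code setting:

```python
# Compaction trigger band implied by the 75-92% capacity figures
# discussed above. The percentages are an observed estimate, not a
# documented Claude Code configuration value.
def compaction_band(window: int, low: float = 0.75, high: float = 0.92) -> tuple[int, int]:
    """Return the token range in which compaction would kick in."""
    return int(window * low), int(window * high)

print(compaction_band(200_000))    # (150000, 184000)
print(compaction_band(1_000_000))  # (750000, 920000)
```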

The Practical Difference

This isn’t just a bigger number on a spec sheet. It changes how you work.

With 200K tokens, I’d sometimes compact two or three times in a single extended session. Each compaction was a small reset — the model retained a summary, but nuance got lost. Specific instructions from early in the conversation would vanish. The model would forget that I’d asked it to use a particular approach, or that we’d already tried and rejected an alternative.

With a million tokens, an entire afternoon of work can live in one continuous context. The model remembers what you said two hours ago because it’s still right there in the window, not compressed into a paragraph-long summary. For the kind of work I do — managing notes from back-to-back recurring meetings with long histories, building lengthy PowerShell scripts, writing and deploying internal apps — that continuity matters.

It also matters for architecture work. Loading an entire codebase into context and asking questions about how components interact has always been Claude Code’s strength. Quintupling the available space means larger projects fit without the model having to swap things in and out.

The Competitive Landscape

For context: Google’s Gemini 3 Pro offers a 10 million token window. OpenAI’s GPT-4.1 matches Claude at 1 million. The raw numbers don’t tell the whole story, though — what matters is how well the model actually uses that context. A model that can hold a million tokens but loses track of details at 500K isn’t meaningfully better than one with a smaller window that maintains precision throughout. Anthropic claims Opus 4.6 scores 76% on their multi-needle retrieval benchmark at the full million tokens. That’s the number that matters more than the window size itself.
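To illustrate what a multi-needle benchmark measures, here is a toy version: plant a few facts ("needles") in a long stretch of filler text, then score how many of them a model’s answer recovers. Everything here — the needles, the filler, the exact-substring scoring — is invented for illustration and is not Anthropic’s actual methodology.

```python
# Toy multi-needle retrieval harness: scatter facts through filler
# text, then score what fraction of them an answer reproduces.
# Simplified exact-substring scoring; not Anthropic's benchmark.
import random

def build_haystack(needles: list[str], filler_sentences: int, seed: int = 0) -> str:
    """Scatter needle sentences at random positions among filler."""
    rng = random.Random(seed)
    lines = ["The sky was a pleasant shade of blue that day."] * filler_sentences
    for needle in needles:
        lines.insert(rng.randrange(len(lines) + 1), needle)
    return " ".join(lines)

def retrieval_score(needles: list[str], model_answer: str) -> float:
    """Fraction of needles the answer reproduces verbatim."""
    found = sum(1 for n in needles if n in model_answer)
    return found / len(needles)

needles = [
    "The vault code is 4471.",
    "The meeting moved to Pier 12.",
    "The password hint is 'tangerine'.",
]
haystack = build_haystack(needles, filler_sentences=1000)
# A hypothetical answer that recalls two of the three planted facts:
answer = "The vault code is 4471. The meeting moved to Pier 12."
print(retrieval_score(needles, answer))  # 0.6666666666666666
```

A real long-context benchmark does this at enormous scale: the haystack is hundreds of thousands of tokens, and the score reflects whether the model can still pull out every planted fact near the window’s limit.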

The Cost Question

There’s a catch, and it’s worth being honest about. Opus 4.6 is hungry. Users on GitHub and Reddit have reported the model consuming 3-9x more tokens than its predecessor for equivalent tasks. Anthropic enabled adaptive thinking by default, which means the model spends more tokens reasoning before it responds. That’s part of why it’s better — but it also means Max plan subscribers have been burning through quotas faster than expected.

Eliminating the long-context pricing premium helps. But if you’re on a metered plan, a model that thinks harder and has five times more room to do it in will cost more. That’s the tradeoff.

What It Feels Like

The honest answer is: it feels like the guardrails moved back. Like working in a room that just got five times bigger. You stop thinking about conservation — stop wondering whether this next prompt is the one that triggers compaction — and start thinking about the actual problem.

That’s a small thing in any individual moment. But compounded across a full day of coding, it adds up to a meaningfully different experience. Less friction. Fewer resets. More of the conversation that actually matters, still available when you need it.

I don’t know if the specific flags I used had anything to do with when the 1M context appeared on my account. It might have been pure coincidence. But I’ll take it. A million tokens is a lot of room to think.