Thursday, April 16, 2026
🛡️
Adaptive Perspectives, 7-day Insights
AI

Claude Opus 4.7 Arrives. Mythos Still Waits in the Wings.

Anthropic released Claude Opus 4.7 today: better coding, sharper vision, a model that checks its own work. Its best model still isn't shipping.

Note: This post was written by Claude Opus 4.7. The following is a synthesis of Anthropic’s announcement and reporting from major news organizations.

Anthropic released Claude Opus 4.7 this morning, the third flagship Opus upgrade in four months. It is available now across Claude.ai, Claude Code, the Anthropic API, Amazon Bedrock, Google Cloud Vertex AI, and Microsoft Foundry, at the same price as Opus 4.6: $5 per million input tokens and $25 per million output. The API identifier is claude-opus-4-7.

Anthropic frames it as a direct upgrade to Opus 4.6, with the biggest gains in software engineering, vision, and long-horizon task reliability. The company also says plainly that Opus 4.7 is not the strongest model it has built. Claude Mythos Preview, announced a week ago but held back over cybersecurity concerns, still sits above Opus 4.7 on Anthropic’s internal capability benchmarks, and on its alignment evaluations.

The Numbers

On SWE-bench Verified, Opus 4.7 scores 87.6%, up from 80.8% for Opus 4.6, which was essentially tied with Gemini 3.1 Pro at 80.6%. On SWE-bench Pro, the harder variant, it reaches 64.3%, a meaningful jump from Opus 4.6’s 53.4% and ahead of GPT-5.4 at 57.7% and Gemini 3.1 Pro at 54.2%. Terminal-Bench 2.0 comes in at 69.4%.

The coding gains show up in third-party testing too. Cursor CEO Michael Truell reported that “on CursorBench, Opus 4.7 is a meaningful jump in capabilities, clearing 70% versus Opus 4.6 at 58%.” Rakuten’s internal SWE benchmark showed three times as many production tasks resolved. Devin CEO Scott Wu said Opus 4.7 “takes long-horizon autonomy to a new level” and “works coherently for hours” on agentic tasks previous models could not sustain.

Self-Verification and Vision

Two capabilities Anthropic highlights are worth attention. First, Opus 4.7 “devises ways to verify its own outputs before reporting back,” a step beyond chain-of-thought reasoning toward something closer to test-driven self-correction. The model is being trained not just to produce an answer, but to check the answer before handing it over.
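Anthropic has not published how the mechanism works, but the described behavior resembles a generate-then-verify loop. Here is a minimal illustrative sketch of that pattern; every name in it is hypothetical and it is not Anthropic's implementation:

```python
# Illustrative generate-then-verify loop: draft an answer, run a
# self-derived check on it, and only report back once the check passes.
# All names here are hypothetical, not Anthropic's actual mechanism.

def draft_sort(xs, attempt):
    # Stand-in "model": the first draft is deliberately buggy,
    # the retry is correct.
    if attempt == 0:
        return list(xs)  # buggy draft: forgets to sort
    return sorted(xs)

def verify(original, result):
    # Self-derived check: the output must equal the sorted input.
    return result == sorted(original)

def solve_with_verification(xs, max_attempts=3):
    for attempt in range(max_attempts):
        candidate = draft_sort(xs, attempt)
        if verify(xs, candidate):
            return candidate  # only hand over an answer that checked out
    raise RuntimeError("no candidate passed verification")

print(solve_with_verification([3, 1, 2]))  # [1, 2, 3]
```

The buggy first draft fails its own check and never reaches the caller; only the corrected retry is reported, which is the behavioral difference the announcement describes.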

Second, vision. Opus 4.7 now accepts images up to 2,576 pixels on the long edge, about 3.75 megapixels and more than three times the previous limit. On XBOW’s visual-acuity benchmark, scores jumped from 54.5% with Opus 4.6 to 98.5% with Opus 4.7. For computer-use agents reading dense screenshots, or tools extracting data from complex technical diagrams, that is not a marginal improvement.
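For pipelines that pre-scale screenshots before upload, the new ceiling reduces to a simple proportional resize. The 2,576-pixel long-edge limit comes from the announcement; the helper itself is just illustrative arithmetic:

```python
# Scale image dimensions so the long edge fits a pixel ceiling while
# preserving aspect ratio. The 2,576-pixel limit is Opus 4.7's new
# long-edge maximum per the announcement; the helper is illustrative.

LONG_EDGE_LIMIT = 2576

def fit_to_long_edge(width, height, limit=LONG_EDGE_LIMIT):
    long_edge = max(width, height)
    if long_edge <= limit:
        return width, height  # already within the limit
    scale = limit / long_edge
    return round(width * scale), round(height * scale)

# A 3840x2160 (4K) screenshot now needs only a mild downscale:
print(fit_to_long_edge(3840, 2160))  # (2576, 1449)
```

Under the old, roughly one-third-smaller limit, the same 4K capture would have lost far more detail, which is why dense-screenshot workloads see the benefit first.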

Updates to Claude Code

For Claude Code users, the launch ships with several meaningful changes.

A new /ultrareview slash command spawns a dedicated review session that reads through changes and flags bugs and design issues a careful human reviewer would catch. Pro and Max plans get three free ultrareviews to try it. Auto mode, a permissions option where Claude makes decisions on the user’s behalf during longer unattended runs, extends from Team and Enterprise to Max subscribers. And the default effort level jumps to xhigh, a new tier that allocates more reasoning time than high without going all the way to max.

The Mythos Shadow

The unusual thing about this release is how openly Anthropic compares Opus 4.7 against its own unreleased model. The announcement itself notes that Claude Mythos Preview “remains the best-aligned model we’ve trained,” and the charts showing Opus 4.7 ahead of Opus 4.6 also show Mythos ahead of Opus 4.7.

That is a departure from the usual launch script. Most model releases frame the new thing as the best thing. Anthropic is telling customers: this is the strongest model you can buy, but it is not the strongest model we have. Mythos, built by fine-tuning atop the Opus backbone for cybersecurity hardening, remains available only to Project Glasswing partners for defensive security work.

Tokenizer and Cost

One practical note for anyone upgrading: Opus 4.7 uses an updated tokenizer. The same input now maps to 1.0 to 1.35 times as many tokens, depending on content type. The model also thinks more at higher effort levels, particularly on later turns in agentic sessions, which produces more output tokens. Pricing did not change, but token counts did.

Anthropic claims the net effect is favorable on its internal coding evaluations (more work done per dollar even with the tokenizer change) but recommends measuring on real traffic before committing. For Max subscribers on metered plans, the same prompt will cost meaningfully more in tokens today than it did yesterday.
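As a rough worked example of the worst case, using the article's $5/$25 per-million pricing and the 1.35x upper bound of the tokenizer shift (the request sizes below are made up for illustration):

```python
# Back-of-envelope cost comparison across the tokenizer change, using the
# announced pricing ($5 per million input tokens, $25 per million output).
# The 1.35x factor is the upper bound Anthropic gives; the request sizes
# are hypothetical.

INPUT_PRICE = 5 / 1_000_000    # dollars per input token
OUTPUT_PRICE = 25 / 1_000_000  # dollars per output token

def request_cost(input_tokens, output_tokens):
    return input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE

# Hypothetical request that counted 20k input / 4k output under Opus 4.6:
old = request_cost(20_000, 4_000)
# Worst case under the new tokenizer: same text, 1.35x the tokens.
new = request_cost(int(20_000 * 1.35), int(4_000 * 1.35))

print(f"old: ${old:.2f}  new: ${new:.2f}  (+{new / old - 1:.0%})")
# old: $0.20  new: $0.27  (+35%)
```

Since per-token prices are unchanged, the per-request cost scales linearly with the token multiplier, so the worst case is exactly 35% more per identical prompt, before accounting for any extra thinking tokens at higher effort levels.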

What to Make of It

Two months after Opus 4.6 went GA with its 1M context window, and two months after Opus 4.5 before that, Opus 4.7 cements a roughly two-month cadence for Anthropic’s flagship. Each release has tightened the things Claude was already good at rather than changing what Claude is. This one is the most focused of the three: better coding, better vision, better self-checking. No dramatic new capability, just less friction on the work that was already working.

And a quiet reminder, inside the release notes themselves, that Anthropic has something stronger it has chosen not to ship.

Sources