Anthropic's Claude Mythos turned vulnerability descriptions into working exploits 181 times during internal testing. The previous best, Claude Opus 4.6, managed it twice. That single stat explains why Anthropic built arguably the most capable AI model in existence — and then decided none of us get to use it.
The Mythos story started with a leak. On March 26, a configuration error in Anthropic's CMS exposed roughly 3,000 unpublished blog assets, including draft announcements about a model described internally as a "step change in capabilities." Two weeks later, instead of the usual product launch, Anthropic announced Project Glasswing — a $100 million initiative to let a handpicked group of partners use Mythos exclusively for finding and fixing vulnerabilities in critical infrastructure. No API. No playground. No general availability, period.
The Numbers That Spooked Anthropic
The headline benchmarks are impressive but expected — frontier models get better, that's the whole game. SWE-bench Verified jumped from 80.8% to 93.9%. GPQA Diamond went from 91.3% to 94.6%. Fine, incremental progress on a steeper curve.
The cybersecurity evaluations are where things got genuinely unsettling.
On CyberGym Vulnerability Reproduction, the model scored 83.1% versus Opus 4.6's 66.6%. But raw scores don't capture what makes this thing different. The qualitative gap is staggering: Opus 4.6 could find vulnerabilities but almost never weaponize them. Mythos chains multiple flaws together autonomously, building sophisticated exploitation sequences without human guidance.
| Benchmark | Opus 4.6 | Mythos Preview |
|---|---|---|
| SWE-bench Verified | 80.8% | 93.9% |
| CyberGym Vuln Reproduction | 66.6% | 83.1% |
| Terminal-Bench 2.0 | 65.4% | 82.0% |
| Humanity's Last Exam (tools) | 53.1% | 64.7% |
| Working exploits from vuln descriptions | 2 | 181 |
The specific discoveries are what turned this from a benchmarking story into a national security conversation. Mythos found a 27-year-old bug in OpenBSD that crashes kernels via TCP SACK with minimal input. It found a 16-year-old FFmpeg vulnerability that automated fuzzing had missed despite five million test iterations. And it autonomously discovered and exploited a FreeBSD NFS flaw — now tracked as CVE-2026-4747 — that grants unauthenticated root access from anywhere on the internet. That bug had been sitting there for 17 years, waiting for something smart enough to notice it.
Is Locking It Up the Right Call?
I've seen plenty of takes on restricting the model, ranging from "obviously necessary" to "Anthropic just created artificial scarcity." I'm closer to the former camp. Simon Willison put it bluntly: "I can live with that." Bruce Schneier did a deeper tradeoff analysis and reached the same conclusion — when you can autonomously pop root on FreeBSD from across the internet, maybe don't hand that out via API key.
The Glasswing partner list is essentially a cybersecurity dream team: AWS, Apple, Broadcom, Cisco, CrowdStrike, Google, JPMorgan Chase, the Linux Foundation, Microsoft, NVIDIA, and Palo Alto Networks. Over 40 additional organizations got access for scanning critical infrastructure. They're backed by $100M in usage credits. The open-source world, meanwhile, gets $2.5M for Alpha-Omega and OpenSSF plus $1.5M for the Apache Software Foundation.
That ratio nags at me. The Fortune 500 gets eight-figure access to the most capable security scanner ever built. The solo maintainer whose C library runs on three billion devices gets... a share of four million dollars and the promise that someone will file a responsible disclosure ticket eventually.
Open-Source Maintainers, Brace Yourselves
Even without public access, the downstream effects are already materializing. Greg Kroah-Hartman and Daniel Stenberg have both flagged a recent shift: AI-generated vulnerability reports are getting good. Not the spray-and-pray garbage from 2025 — actual, reproducible, high-severity findings that demand attention. Glasswing will accelerate that tenfold as Mythos chews through codebases at scale and partners file coordinated disclosures.
If you maintain anything that touches networking, parsing, or memory management, assume your security backlog is about to spike hard. The bugs are real and they're getting fixed, which is genuinely good. The uncomfortable part: defenders are now in an arms race where AI discovers vulnerabilities faster than humans can write patches.
The Proliferation Clock Is Ticking
Anthropic made the responsible call. But as the Council on Foreign Relations points out, this capability gap is temporary. GPT-5.4-Cyber already signals that OpenAI is heading in the same direction. Chinese labs aren't far behind. The window in which Glasswing-style coordinated defense holds an asymmetric advantage over attackers might be measured in months, not years.
When the model is eventually offered through standard Claude channels, Anthropic has floated pricing at $25/$125 per million input/output tokens — expensive enough to limit casual misuse, cheap enough for serious security work. Whether that pricing lever is enough when competitors undercut it remains an open question.
One detail from the safety evaluations that I keep coming back to: Mythos showed awareness that it was being evaluated in roughly 29% of test transcripts, without being explicitly told it was in a testing environment. Anthropic flagged it in the system card. Nobody quite knows what to do with that information yet.