An AI Probably Approved Me to Research Vulnerabilities

Last week I was working on a post with Claude Opus 4.8 — the usual back-and-forth, me steering, the model drafting and checking facts. At one point it reached for a web search about the worm then tearing through Red Hat’s npm packages, a credential-stealer that fired the moment a developer ran npm install on an affected library. (npm is the registry most JavaScript software pulls its open-source building blocks from; one compromised package can cascade into thousands of apps downstream.) That’s the kind of thing I read about most weeks as part of keeping software patched. The search didn’t run. Claude came back and said it couldn’t continue: the request had tripped one of Anthropic’s cyber safeguards. If I needed that kind of access, I could apply to something called the Cyber Verification Program.

I want to be precise about what I was doing, because it matters. I wasn’t asking Claude to write an exploit or probe a system I don’t own — I was reading about an unfolding supply-chain attack at the level any trade-press article covers it. That’s the floor of security work, not the ceiling.

So I applied — partly to finish the search, partly to see what the gate looked like.

What the safeguard is guarding

It helps to know why the wall is there. Earlier this year Anthropic previewed a model, Mythos, that it said could find and then exploit zero-day vulnerabilities in every major operating system and web browser once a user pointed it at them — and the company decided the model was too dangerous to release publicly. That’s a different animal from a chatbot that knows what SQL injection is. A system that can chain unknown flaws into working exploits can’t be left on by default, because the tool has no way to read the intent of whoever is typing.

Anthropic’s answer was real-time classifiers, introduced with Opus 4.7 and still running in the 4.8 I was on — fine-tuned models that read each request and block the ones that look like high-risk or prohibited use before the main model answers. They split the territory in two. Prohibited use — mass data exfiltration, ransomware development — stays blocked for everyone, with no exception and no appeal. High-risk dual use — vulnerability exploitation, offensive tooling — is blocked by default but can be opened up for people who will say who they are and what they’re doing. The CVP is that second door.
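Read as a decision rule, that split is small enough to sketch. What follows is a toy model in Python, not Anthropic's implementation: it assumes a classifier has already labeled the request, and every name in it (Category, Gate, verified_orgs, the org IDs) is hypothetical and made up for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum


class Category(Enum):
    """Hypothetical labels a request classifier might assign."""
    BENIGN = "benign"
    HIGH_RISK_DUAL_USE = "high_risk_dual_use"  # e.g. vulnerability exploitation, offensive tooling
    PROHIBITED = "prohibited"                  # e.g. ransomware development, mass data exfiltration


@dataclass
class Gate:
    # Assumption: approvals are tracked per organization ID.
    verified_orgs: set = field(default_factory=set)

    def allows(self, org_id: str, category: Category) -> bool:
        # Prohibited use stays blocked for everyone, verified or not.
        if category is Category.PROHIBITED:
            return False
        # Dual use is blocked by default and opened only for verified orgs.
        if category is Category.HIGH_RISK_DUAL_USE:
            return org_id in self.verified_orgs
        # Everything else passes through.
        return True

    def revoke(self, org_id: str) -> None:
        # Approval is revocable; removing an unknown org is a no-op.
        self.verified_orgs.discard(org_id)


gate = Gate(verified_orgs={"org-verified"})
print(gate.allows("org-anyone", Category.HIGH_RISK_DUAL_USE))
print(gate.allows("org-verified", Category.HIGH_RISK_DUAL_USE))
print(gate.allows("org-verified", Category.PROHIBITED))
```

The order of the checks is the point: the prohibited test runs before the verification lookup, so no approval can reach that category, while the dual-use branch is the only one where who is asking changes the answer.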

I don’t have a problem with the wall. If a model can do what Mythos can, defaulting to blocked is the right call. The question isn’t whether to gate the capability. It’s how much the gate costs the people it shouldn’t be stopping.

The application took about as long as reading this

The process is light, and it’s free. You find your organization ID in your account settings, open the Cyber Use Case Form, and describe what you actually do — in my case, defensive reading and writing, not red-teaming live systems. An authorized admin on the account submits it. On an individual plan — Max 20x, in my case — that admin is you, so there’s no one to route the request through. Anthropic’s own documentation says to expect an email with a decision “within 2 business days.”

That two-day number turned out to be the interesting part.

Approved before I could wonder who approved it

The email landed a few minutes later.

Subject: You have been approved into the Cyber Verification Program!

Thank you for submitting your application for the CVP. Upon reviewing the details of your submission, we have adjusted the safeguards applied to your account to allow for the cyber use cases you described.

…Anthropic reserves the right to revoke or narrow adjustments to our safeguards if we determine that your activity falls outside the approved use cases.

— Anthropic’s Safeguards Team

The body confirmed the shape of it: dual-use work like vulnerability exploitation is unblocked, the malicious categories stay blocked regardless, and my approval is tied to one organization and revocable if my activity drifts. It thanked me for the work I do in defensive security — which is almost certainly boilerplate, the same line the program sends everyone it approves.

Here’s what I keep coming back to. My application wasn’t anonymous — I’d attached my LinkedIn, my résumé site, and this blog, the one whose research tripped the alert. The promised turnaround was up to two business days; mine came back faster than I could refill my coffee. No one opens three links, reads them, and weighs a stranger’s intent that fast. Something other than a person approved me — the tool that flagged my request and the reviewer that cleared it were, in all likelihood, the same kind of thing.

So, is it a big deal?

Probably necessary, and not that big a deal.

Necessary, because the capability behind the wall is real, and defaulting to blocked is the responsible posture for a tool that can’t verify who is on the other end. When the company that built the stronger model won’t ship it at all, gating the cyber features of the model it does ship isn’t theater.

Not a big deal, because the gate I walked through was a self-described use case cleared in minutes. A determined bad actor fills out that same form and lies. What the CVP buys isn’t a hard barrier — it’s accountability: my access is now named, scoped to one organization, logged, and revocable. That’s a sound trade — keep the safe default for the millions who never ask, and move the few who do into a monitored lane where misuse leaves a paper trail and an off switch.

The real cost lands on the legitimate middle — the people reading a CVE write-up who get stopped because the classifier can’t yet tell reporting from exploiting. I hit that, and the fix took five minutes. For a working defender that’s a speed bump, not a wall. I’d rather the tool err toward caution and make me prove I’m boring than let it freelance zero-days for anyone who asks.

The part that stays with me isn’t the block or the form. It’s that the most plausible account of last week is this: an AI decided I was allowed to do security research with an AI. We’re going to live with a lot of decisions made that fast, by that kind of reviewer — and most won’t come with an email.

Sources

Real-time cyber safeguards on Claude — Anthropic Help Center (CVP eligibility, application form, two-business-day target, dual-use vs. prohibited categories)
Introducing Claude Opus 4.7 — Anthropic (real-time classifiers; the invitation to the Cyber Verification Program)
Claude Mythos Preview — Anthropic red team (autonomous zero-day discovery and exploitation; not released publicly)
Preinstall to persistence: inside the Red Hat npm “Miasma” campaign — Microsoft Security (the npm worm I was researching when the block hit)