
Anthropic Claude finds dozens of Firefox vulnerabilities

Mozilla patches most issues in Firefox 148 as bug discovery outpaces patch capacity

Image Credits: Anthropic
Russell Brandom, techcrunch.com

Anthropic says its Claude Opus 4.6 model found 22 vulnerabilities in Firefox in two weeks, including 14 rated “high-severity,” in a security partnership with Mozilla, according to TechCrunch. Most fixes shipped in Firefox 148 in February, with a handful deferred to the next release.

The headline number flatters the story, but the operational shift is more important: discovery is getting cheaper and faster than remediation. Anthropic started in Firefox’s JavaScript engine and then widened the search across the codebase, deliberately choosing Firefox because it is both large and heavily scrutinised. In other words, the test case was not a neglected hobby project; it was one of the internet’s most attacked and most audited open-source applications. If an automated workflow can still surface dozens of issues there, the bottleneck moves downstream to triage, patch review, regression testing, and release management.

That asymmetry shows up inside Anthropic’s own results. Claude was far better at identifying potential vulnerabilities than producing working exploits. The team spent about $4,000 in API credits trying to generate proof-of-concept exploits and succeeded in only two cases, TechCrunch reports. For defenders, that sounds reassuring: the model can point to cracks without reliably weaponising them. For maintainers, it is a different kind of pressure. A tool that produces many plausible findings—some urgent, some wrong, some duplicates—can flood the same limited pool of humans who already do security triage on nights and weekends.
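The reported figures make the asymmetry easy to put in rough numbers. A minimal sketch; the success rate and per-exploit cost below are our inference from the article's figures, not numbers Anthropic published:

```python
# Back-of-envelope arithmetic from the figures reported by TechCrunch.
findings = 22          # vulnerabilities Claude identified
working_pocs = 2       # successful proof-of-concept exploits
exploit_spend = 4_000  # USD in API credits spent attempting PoCs

success_rate = working_pocs / findings       # exploit attempts that worked
cost_per_poc = exploit_spend / working_pocs  # spend per working exploit

print(f"{success_rate:.0%} of findings yielded a working exploit")  # 9%
print(f"~${cost_per_poc:,.0f} per working PoC")  # ~$2,000 per working PoC
```

On these numbers, pointing at a crack is cheap; proving it exploitable still costs real money per success.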

Open-source security has always lived with a mismatch between the number of people who can report bugs and the number who can responsibly fix them. Lowering the cost of bug discovery shifts that mismatch from occasional spikes to a steady stream. It also changes incentives. If a project becomes known as “AI-scan friendly,” it may attract more automated reports, more noise, and more demands for rapid disclosure timelines—without any corresponding increase in patch capacity. Conversely, organisations running such tools may be tempted to treat security as a procurement line item: buy more scanning, assume risk goes down.
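The mismatch compounds in a simple way. A toy backlog model with made-up rates (these are illustrative, not Mozilla's actual throughput) shows why a steady stream is worse than a spike:

```python
# Toy model: unpatched-findings backlog when automated reports arrive
# faster than a team can remediate. Rates here are invented for
# illustration, not drawn from Mozilla's actual process.

def backlog_after(weeks, reports_per_week, patches_per_week, start=0):
    """Track the backlog of open findings week by week."""
    backlog = start
    history = []
    for _ in range(weeks):
        backlog = max(0, backlog + reports_per_week - patches_per_week)
        history.append(backlog)
    return history

# Suppose scans file 11 findings a week (22 in two weeks, as in the
# Firefox run) while the team fully remediates 6 a week:
print(backlog_after(8, reports_per_week=11, patches_per_week=6))
# [5, 10, 15, 20, 25, 30, 35, 40]
```

With any sustained gap between arrival and remediation rates, the backlog grows linearly and never clears; the only fixes are more patch capacity or smarter triage of the incoming stream.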

The attacker side of the ledger does not need the same workflow to benefit. Even if Claude struggled to write exploits, the same class of models can still automate code comprehension, identify suspicious patterns, and narrow the search space for humans who do exploit development for a living. When discovery scales and exploitation partly scales, the limiting factor becomes how quickly software teams can ship safe fixes without breaking everything else.

Firefox 148 absorbed most of the patches. The next question for maintainers is whether the two-week scan becomes a monthly routine—and what happens when the reports arrive faster than the releases.