When the Frontier AI Models Become the Adversary

By Whitney Anderson

On April 7, Anthropic did something unusual for a company in the business of selling AI capabilities: it refused to sell one.

Claude Mythos Preview, the company's most advanced model, will not be made available to the general public. Instead, Anthropic restricted access to roughly forty organizations through a program called Project Glasswing, limited strictly to defensive cybersecurity work. The reason is straightforward and, if you work in financial services, should be deeply unsettling: Mythos is so capable at discovering and exploiting software vulnerabilities that Anthropic concluded releasing it broadly would do more harm than good.

In internal evaluations, Mythos found zero-day vulnerabilities across every major operating system and browser. It discovered a 27-year-old bug in OpenBSD that had survived decades of expert human auditing. Non-experts, people with no formal security training, used it to develop working exploits overnight. Engineers on Anthropic's red team tasked the model with finding vulnerabilities before leaving for the evening; by morning, they had functional exploit chains waiting in their terminals.

The chaining capability is what makes this relevant beyond the security community. In one documented case, Mythos identified four separate browser vulnerabilities, none individually critical, and autonomously combined them into a complete sandbox escape. Each vulnerability, in isolation, would have been classified as low- or moderate-severity. Together, sequenced precisely, they constituted a full system compromise.

Anthropic's own assessment was blunt: "The same improvements that make the model substantially more effective at patching vulnerabilities also make it substantially more effective at exploiting them."

The technology and cybersecurity press have covered this extensively. But the conversation has focused almost entirely on software security: zero-days, exploit chains, and the implications for critical infrastructure. That framing is incomplete.

Mythos demonstrated autonomous multi-step reasoning that chains individually minor weaknesses into catastrophic outcomes. That capability does not apply only to software. It applies, with equal force and arguably greater consequence, to the financial system, and almost no one is talking about it.

Why the Software Analogy Breaks Down

Cybersecurity and financial crime look similar from the outside, both involving adversaries probing complex systems to find weaknesses. However, they are not the same, and treating them as equivalent is part of why the financial system is underprepared.

The mental model most people use for cybersecurity vulnerabilities doesn't translate directly to financial crime. Software has bugs, unintended behaviors that can be discovered and exploited. Financial systems don't have bugs in the same sense. They have rules. Thresholds. Policies. Behavioral assumptions. And every one of them can be reverse-engineered by a sufficiently capable model.

Mythos didn't brute-force its way through code looking for buffer overflows. It reasoned about the system. It developed hypotheses about how components interacted. It probed boundaries, observed responses, and inferred the internal logic that produced those responses. Then it identified the gaps between what the system's designers intended and what the system actually permitted.

A bank's fraud detection infrastructure can be reasoned about in exactly the same way. The majority of systems in production at financial institutions today are still fundamentally rules-based and operate on fixed thresholds. Transactions above a certain dollar amount get flagged. Velocity rules trigger when too many transactions occur in a defined window. Geographic restrictions block transactions from sanctioned jurisdictions. Currency Transaction Reports are filed at $10,000. These rules are not secret. Many are publicly documented in regulations. Others can be inferred through systematic observation.
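
To make that concrete, such a screen often reduces to something like the following sketch. The CTR threshold is public regulation; every other value, field name, and rule name here is an invented illustration, not any institution's actual configuration:

```python
from datetime import timedelta

# Illustrative values only; real institutions tune their own thresholds.
CTR_THRESHOLD = 10_000          # Currency Transaction Report trigger (public)
WIRE_REVIEW_AMOUNT = 5_000      # hypothetical wire-review threshold
VELOCITY_LIMIT = 5              # hypothetical max transactions per window
VELOCITY_WINDOW = timedelta(hours=24)
BLOCKED_COUNTRIES = {"IR", "KP", "SY"}  # stand-in for a sanctions list

def screen_transaction(txn: dict, recent: list[dict]) -> list[str]:
    """Return the rules a transaction trips; an empty list means it clears.

    txn and recent entries carry 'amount', 'type', 'country', and a
    datetime 'timestamp'.
    """
    flags = []
    if txn["amount"] >= CTR_THRESHOLD:
        flags.append("CTR_FILING")
    if txn["type"] == "wire" and txn["amount"] >= WIRE_REVIEW_AMOUNT:
        flags.append("WIRE_REVIEW")
    window_start = txn["timestamp"] - VELOCITY_WINDOW
    if sum(1 for t in recent if t["timestamp"] >= window_start) >= VELOCITY_LIMIT:
        flags.append("VELOCITY")
    if txn["country"] in BLOCKED_COUNTRIES:
        flags.append("GEO_BLOCK")
    return flags
```

Every branch in that function is a fixed, observable decision boundary. Nothing changes until someone redeploys it, and that is exactly what makes it learnable from the outside.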

A Mythos-class model doesn't need to see the source code of a bank's fraud detection system. It needs to observe which transactions pass and which get flagged. A series of low-value test transactions, systematically varied in amount, timing, geography, and payment type, would reveal the exact contours of the detection logic, not in weeks or months, but in hours. The same probing-and-inference approach that lets the model discover a 27-year-old operating system bug also reveals that a particular institution flags wire transfers above $4,500 to new payees but ignores peer-to-peer transfers below $2,000 regardless of velocity.
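
A minimal sketch of that inference, assuming only that the attacker can submit transactions and observe a binary flagged/cleared outcome; `submit_probe` is a hypothetical stand-in for whatever channel the attacker controls:

```python
def infer_threshold(submit_probe, low=0.0, high=20_000.0, tolerance=1.0):
    """Binary-search a hidden flagging threshold from pass/flag outcomes alone.

    submit_probe(amount) -> True if a transaction of that amount gets flagged.
    Converges in O(log2(range / tolerance)) probes: roughly 15 transactions
    for this range and tolerance.
    """
    while high - low > tolerance:
        mid = (low + high) / 2
        if submit_probe(mid):
            high = mid   # flagged: threshold is at or below mid
        else:
            low = mid    # cleared: threshold is above mid
    return high          # tightest known upper bound on the threshold
```

Fifteen or so probes per rule dimension, repeated across amount, timing, geography, and payment type, is the arithmetic behind "hours, not weeks."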

The vulnerability is structural: the rules are deterministic, and a reasoning model can determine them.

The Chaining Problem

What Mythos introduced was something the financial system has no equivalent answer to: chaining.

Individual vulnerability discovery, while impressive, is a capability that existed in prior models. The breakthrough was autonomous sequencing: combining four weaknesses that were individually non-critical into a single exploit path that achieved an outcome none of them could achieve alone.

Financial crime has run this same playbook for years. Mythos helps illustrate why it matters, and why it's so hard to stop. 

Consider the following sequence of events:

A new account is opened at a consumer bank. The identity documents pass KYC verification. The applicant's selfie matches the photo ID. There is no adverse media, no watchlist hit, no derogatory credit history. The account is approved. 

A small direct deposit arrives in the account — $1,200, consistent with a biweekly paycheck. Then a few purchases: groceries, a streaming subscription, a gas station fill-up. Normal consumer activity. The transaction monitoring system begins building a behavioral baseline. The account looks healthy.

A few weeks later, the account holder begins making peer-to-peer payments to other individuals. The amounts are modest: $200 here, $350 there. Each payment is well below any reporting threshold. Each recipient is a real, verified account at another institution. Nothing about any individual payment raises a flag.

Over the next month, the pattern continues. Small deposits in, slightly smaller amounts out via P2P. The account maintains a modest balance. It looks like a working person sharing expenses with friends, paying a landlord, and splitting bills. Every transaction, individually, is boring.

Now multiply this by 300 accounts. All opened within the same three-week window. All following the same behavioral playbook, with enough variation to defeat pattern matching. All routing funds through a layered network of P2P transfers that, traced to their endpoints, converge on a handful of accounts that convert to cryptocurrency and exit the banking system entirely.

No single account triggered an alert. No single transaction was suspicious. The fraud exists only in the coordination, in the chain, and no rules-based system deployed at any of the involved institutions was designed to see it.

This is the financial crime equivalent of Mythos's sandbox escape: individually innocuous steps, precisely sequenced, producing an outcome that no individual control was designed to prevent.
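
The detection problem, correspondingly, only becomes tractable at the graph level. Here is a hedged sketch of what convergence detection over the P2P network might look like; the hop limit and origin count are illustrative parameters, not tuned values:

```python
from collections import defaultdict, deque

def find_convergence_points(transfers, max_hops=4, min_origins=50):
    """Trace P2P flows forward and report accounts where many origins converge.

    transfers: iterable of (sender, receiver) account pairs.
    Returns {endpoint: count of distinct origin accounts that reach it}.
    """
    edges = defaultdict(set)
    for sender, receiver in transfers:
        edges[sender].add(receiver)

    reached_by = defaultdict(set)    # account -> set of origins that reach it
    for origin in list(edges):
        frontier, seen = deque([(origin, 0)]), {origin}
        while frontier:
            node, depth = frontier.popleft()
            if depth == max_hops:
                continue
            for nxt in edges.get(node, ()):
                reached_by[nxt].add(origin)
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, depth + 1))

    return {acct: len(origins) for acct, origins in reached_by.items()
            if len(origins) >= min_origins}
```

Transaction by transaction, the operation is invisible. As a graph, three hundred origins funneling into a handful of endpoints is unmistakable.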

The Speed Problem

Anthropic's engineers tasked the model with finding vulnerabilities and went home for the night. By morning, they had working exploits. The model didn't need sleep, didn't need to context-switch, didn't need to wait for a colleague's input. It executed continuously and autonomously at machine speed.

At human speed, that operation, with 300 synthetic accounts, weeks of behavioral seasoning, and coordinated extraction, is already difficult to detect. At machine speed, an agentic system could manage thousands of accounts simultaneously. It could monitor each institution's responses to its probe transactions in real time. It could observe a blocked transaction on one account, hypothesize the detection rule that triggered it, adjust the parameters across all other accounts, and continue, all before a human analyst has finished reading the first alert in their queue. The attacker learns faster than the defender, and it operates at a speed that the defender's architecture, which still assumes human analysts as the bottleneck, cannot match.

Today's fraud detection systems are static between retraining cycles, which typically occur quarterly or, at best, monthly. A reasoning model adapts continuously. Every blocked transaction is a data point. Every cleared transaction is a confirmation. The adversary's model of your defenses gets more accurate with every interaction, while your model of the adversary's tactics is frozen in your last training run.
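
The asymmetry is easy to state in code. Here is a sketch of the adversary's side of that loop, under the assumption that each account observes its own cleared/blocked outcomes and reports them to a shared controller; all names and probe increments are hypothetical:

```python
class FleetController:
    """Shared, continuously updated estimate of one institution's detection rule.

    Every account's outcome tightens the bounds for the entire fleet. The
    defender's rule, by contrast, stays fixed until the next retraining cycle.
    """

    def __init__(self):
        self.safe_ceiling = 0.0           # highest amount observed to clear
        self.block_floor = float("inf")   # lowest amount observed to be blocked

    def record(self, amount: float, blocked: bool) -> None:
        """Fold one account's outcome into the fleet-wide estimate."""
        if blocked:
            self.block_floor = min(self.block_floor, amount)
        else:
            self.safe_ceiling = max(self.safe_ceiling, amount)

    def next_amount(self) -> float:
        """Pick the next probe: expand upward, bisect, or operate under the rule."""
        if self.block_floor == float("inf"):
            return self.safe_ceiling + 500          # nothing blocked yet: push up
        if self.block_floor - self.safe_ceiling < 50:
            return self.safe_ceiling                # rule resolved: stay below it
        return (self.safe_ceiling + self.block_floor) / 2
```

One blocked transaction anywhere in the fleet lowers `block_floor` for every account instantly. The defender's equivalent update lands at the next retraining cycle, months later.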

We Called This

None of this is a surprise to anyone who has been paying attention.

In September 2024, at the NYC RTP Conference, I laid out a projection that by the end of the decade, we would face a world with a billion autonomous AI agents operating in commercial systems, including financial systems. The audience was skeptical. A billion agents sounded like science fiction. With Mythos, it doesn’t anymore.

The capabilities that Anthropic felt compelled to restrict are not the product of specialized training for exploit development. The model was not fine-tuned on vulnerability databases or hacking toolkits. The capabilities emerged (Anthropic's word) from general improvements in reasoning and autonomy. The model improved at thinking, and as a result, it improved at identifying and exploiting weaknesses in complex systems.

The critical insight? Offensive capability rises in tandem with reasoning capability. Every major AI lab in the world is racing to build models that reason better about complex systems. That work produces better medical diagnostics, better legal research, and better financial modeling, but it also produces better adversaries. The same improvements power all of it.

The billion-agent world isn't a 2030 problem. The capability foundation is here today, restricted only by a single company's voluntary decision to withhold it from general release. That decision is admirable. It is also not a durable defense strategy. Anthropic is not the only organization building frontier models. And the next lab to achieve Mythos-class capabilities may not exercise the same restraint.

The Defender's Dilemma

In Anthropic's framing, there is a "defender's advantage" in cybersecurity: the same model that finds vulnerabilities can patch them. The security community can use Mythos to find and fix bugs before adversaries discover them on their own. It's a reasonable argument for software, where vulnerabilities are finite, patchable, and confirmable.

Financial crime has no equivalent defender's advantage, at least not one that exists in current institutional architectures.

Software vulnerabilities can be patched. A buffer overflow, once identified, can be fixed in code, deployed via an update, and permanently eliminated. Financial system "vulnerabilities" such as the rules, thresholds, and behavioral assumptions I described earlier are not bugs to be patched. They are fundamental features of how the system operates. You cannot "patch" the $10,000 CTR threshold. You cannot eliminate the need for velocity rules. You cannot make behavioral baselines unnecessary. These are structural elements of financial regulation and risk management. What you can do is stop relying on them as your primary detection layer.

A model that can reverse-engineer deterministic rules can always defeat deterministic rules. That is not a prediction; it is a property of deterministic systems. The only viable defense against a reasoning adversary is a reasoning defense: one that operates on entity-level intelligence rather than transaction-level rules, evaluates behavioral patterns rather than threshold breaches, and adapts in real time rather than in quarterly retraining cycles.

This means connecting the full entity graph: the relationship between a person, their accounts, their devices, their behavioral patterns, and their network of connections. It means evaluating risk continuously across every interaction, not at point-in-time checkpoints. It means building defensive systems that learn from every transaction at the same speed as the adversary, and doing all of this on a unified data and intelligence layer. 
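
Reduced to its data model, the shape of such a system might look like the sketch below. The fields are the ones named above; the signals and weights are invented for illustration, not a production scoring design:

```python
from dataclasses import dataclass, field

@dataclass
class Entity:
    """One node in the entity graph: a person plus everything linked to them."""
    entity_id: str
    accounts: set[str] = field(default_factory=set)
    devices: set[str] = field(default_factory=set)
    counterparties: set[str] = field(default_factory=set)  # entity ids they pay
    risk: float = 0.0

def rescore(entity: Entity, graph: dict[str, Entity]) -> float:
    """Re-evaluate risk on every interaction, not at point-in-time checkpoints."""
    score = entity.risk
    # Coordination signal: a device fingerprint shared with another entity.
    for other in graph.values():
        if other.entity_id != entity.entity_id and entity.devices & other.devices:
            score += 0.3
            break
    # Network signal: counterparties converging on already-risky entities.
    score += 0.1 * sum(1 for c in entity.counterparties
                       if c in graph and graph[c].risk > 0.7)
    entity.risk = min(score, 1.0)
    return entity.risk
```

The weights are arbitrary; the point is the unit of analysis. Risk attaches to the entity and its connections, so the 300-account pattern described earlier surfaces through shared devices and converging counterparties even while every individual transaction clears.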

The adversary sees your institution as a single target. If your defenses are fragmented across six vendor systems that sync once a day, you have given the attacker a coordination advantage.

The 18-Month Window

I don't know exactly when Mythos-class capabilities will be available to adversaries. Neither does Anthropic. What I do know is that the trajectory is clear and the timeline is short.

Anthropic restricted Mythos out of genuine concern for safety. But the capabilities Mythos demonstrated are emergent properties of scale and reasoning improvements that every major AI lab is pursuing independently. DeepSeek, Google, Meta, OpenAI, and a dozen well-funded startups are all climbing the same capability curve. The specific techniques differ. The destination is the same.

Every financial institution should assume that Mythos-class reasoning capabilities will be available to adversaries, not just defenders, within 12 to 18 months. That's not pessimism. That's the observed rate of capability diffusion in the foundation model ecosystem.

Anthropic just showed everyone what this looks like in practice: a model that chains minor weaknesses into catastrophic exploits, runs autonomously through the night, and by morning puts working, sophisticated attacks in the hands of non-experts. Whether your institution's defenses will be ready when that same capability is aimed at your fraud management tools, your KYC stack, and your transaction monitoring rules is the question worth sitting with.

The Imperative

Anthropic's announcement ended with a line that I want to borrow, because it applies to financial crime as precisely as it applies to cybersecurity: "The advantage will belong to the side that can get the most out of these tools."

For two decades, the financial services industry's defense architecture has been designed around a threat model of human adversaries operating at human speed. That threat model is obsolete. The adversary is becoming autonomous, adaptive, and capable of the kind of multi-step reasoning that chains innocuous behaviors into undetectable fraud at scale.

The institutions that recognize the shift and move from rules to behavioral intelligence, from point solutions to unified platforms, from static models to adaptive systems, from transaction-level detection to entity-level understanding, are well-positioned for what’s coming. The ones that file Mythos under ‘cybersecurity, not our problem,’ are making an expensive assumption about how much time they have. And by the time that assumption is tested, the cost of that lesson will be measured in billions.

What Anthropic documented was a reasoning engine, pointed at a complex system, inferring its rules and turning its own logic against it. Your fraud detection infrastructure has rules. It has thresholds. It has assumptions built in by people who never imagined a tool that could read them this fast. The model has become the adversary. The only variable is whether your defenses are built to meet it.
