What happened
The UK AI Security Institute disclosed that OpenAI's GPT-5.5 completed a simulated end-to-end corporate network intrusion during a controlled red-team evaluation, Decrypt reported Thursday. The model is the second to clear the benchmark; Anthropic's Claude Mythos was the first, in tests AISI ran earlier this year.

AISI's exercise tasks the model with chaining reconnaissance, initial access, privilege escalation, lateral movement, and data exfiltration inside a sandboxed environment that mirrors a real corporate stack. According to the Decrypt write-up, GPT-5.5 carried the chain through to the exfiltration stage without a human operator stepping in to unstick it at intermediate steps. AISI did not publish a full transcript, citing operational security.

The evaluation was disclosed under the voluntary pre-deployment testing framework that OpenAI and Anthropic both signed onto in 2024.
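AISI has not published its harness or transcripts, so any concrete rendering is guesswork. Still, the pass/fail criterion the article describes, every stage of the chain completing with no human unsticking it, is easy to make precise. Below is a minimal Python sketch of that scoring logic under stated assumptions: the Stage enum, StageResult record, and clears_benchmark function are hypothetical names for illustration, not AISI's actual tooling.

```python
# Illustrative sketch only: AISI's real harness is unpublished.
# Models the criterion described in the article: a run "clears the
# benchmark" only if every stage of the intrusion chain completes,
# in order, without a human operator stepping in.
from dataclasses import dataclass
from enum import Enum, auto

class Stage(Enum):  # the five stages named in the article
    RECONNAISSANCE = auto()
    INITIAL_ACCESS = auto()
    PRIVILEGE_ESCALATION = auto()
    LATERAL_MOVEMENT = auto()
    DATA_EXFILTRATION = auto()

@dataclass
class StageResult:  # hypothetical per-stage record
    stage: Stage
    completed: bool
    operator_intervened: bool  # any human "unsticking" fails the run

def clears_benchmark(results: list[StageResult]) -> bool:
    """True only if all five stages ran, in order, autonomously."""
    expected = list(Stage)
    return (
        [r.stage for r in results] == expected
        and all(r.completed and not r.operator_intervened for r in results)
    )

# A fully autonomous run passes; one assisted stage fails the chain.
autonomous = [StageResult(s, True, False) for s in Stage]
assisted = [
    StageResult(s, True, s is Stage.LATERAL_MOVEMENT)  # human nudge here
    for s in Stage
]
print(clears_benchmark(autonomous))  # True
print(clears_benchmark(assisted))    # False
```

The point of the sketch is the gating: a single intervention at any intermediate step, as in the assisted run, fails the whole chain, which is what makes GPT-5.5's unassisted completion the headline result.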
Why it matters
The threshold AISI is measuring is the one regulators have been circling for two years. A model that can run an intrusion end-to-end is a model that, in the wrong hands or with the wrong scaffolding, lowers the cost of a competent attacker from a salaried team to an API key. Until this year only one frontier system had crossed that line. Now there are two, six months apart. That is the trend line, and it is the one the UK AISI, the US AI Safety Institute, and the EU AI Office were set up to flag.

The crypto industry is not a footnote here. Exchanges, custodians, and bridge operators are exactly the kind of target the AISI sandbox is built to resemble: identity provider, internal admin tooling, hot wallet signing infrastructure, customer database. The 2022 Ronin bridge hack, the 2023 Mixin Network breach, and the 2024 DMM Bitcoin incident all paired social engineering with operational compromise, the kind of multi-step chain this benchmark scores on.
