NIST Proves It: AI Security Is Never "One and Done"

Can we ever make AI completely impervious to attackers? This week, NIST answered that question with mathematical certainty: no.

On June 9, NIST announced that senior scientist Apostol Vassilev published a peer-reviewed mathematical proof in IEEE Security and Privacy showing that no finite set of guardrails placed on an AI system is universally robust against adversarial prompts. The proof extends the logic of Kurt Gödel's famous 1931 incompleteness theorems — which showed there are limits to what can be proved within any system built on a finite number of rules — to the guardrails that govern AI behavior.

The implication is blunt. As Vassilev puts it: "You can never make a claim that you are robust against all adversarial prompt attacks. There will always be some prompt that can potentially evade and defeat any defensive infrastructure that you have built around your AI system."

Why guardrails can't be airtight

AI vendors build constraints to stop their models from generating prohibited content — malware, deepfakes, instructions for weapons. But attackers craft prompts that cause models to bypass their own refusal mechanisms ("jailbreaking"), leading to real-world risks like cyberattacks, data breaches, and highly personalized phishing.

The root problem, Vassilev notes, is that AI systems take human language as input. The richness and ambiguity of language make compliance-checking built on a finite set of rules infinitely ambiguous — the number of ways to hide harmful intent in plain sight is effectively limitless.

The good news: this is a solvable economics problem

The proof is not a counsel of despair. It provides no recipe for attackers, and it leaves plenty of room to harden AI systems until they are not easy to exploit. The goal, Vassilev argues, is to force attackers into hunting for the equivalent of zero-day exploits — historically so expensive that it often took nation-state resources — until "the cost of finding new exploits exceeds attackers' resources."

His prescription has three elements:

  1. Continuous red-teaming — constant work to uncover new adversarial prompts before real attackers do.
  2. Continuous updates — hardening AI guardrails against newly discovered adversarial prompts as they're found.
  3. Operational resilience — limiting impact and recovering quickly when, not if, an exploit occurs.

In other words: retire the "one and done" security model. Deploy-and-forget is now provably insufficient — the only defensible posture is continuous monitor-and-update.

Where agentic AI fits in

Patriot was recently recognized by Microsoft as an Engaged Partner for MDASH, Microsoft's Multi-Model Agentic Scanning Harness — an agentic AI security system that orchestrates more than 100 specialized AI agents to discover, validate, and prove the exploitability of software vulnerabilities. Microsoft frames MDASH as part of a broader shift "from reactive detection to proactive identification of exploitable risk" — which is precisely the transition NIST's proof says every organization deploying AI must now make. NIST proved the treadmill exists; agentic systems like MDASH are how defenders run on it faster than attackers.

Microsoft MDASH — Multi-Model Agentic Scanning Harness

Image: Microsoft Security Blog

And for organizations adopting Copilot, Claude, ChatGPT, and other generative AI tools, this needs to be included in all AI Governance practices: use tools like MDASH to scan your code, ideally incorporate it as part of your CI/CD software development lifecycle to review all pull requests.

Sources