Why AI Governance Fails at Runtime and What CISOs Must Do Now

Adam DiStefano

Most organizations that claim to govern their AI systems are doing something closer to documentation theater. They have policies. They have review boards. They may even have risk registers and model cards. And in most cases, none of those mechanisms are connected to what the AI system actually does at the moment it takes an action.

That is the enforcement gap. The distance between what your governance program says should happen and what actually happens at runtime when an agent invokes a tool, calls an API, accesses a data store, or chains a sequence of decisions across systems you may not fully control.

This post is written for CISOs, security architects, and governance leaders who are responsible for AI risk in their organizations. It explains why the current approach to AI governance is structurally inadequate for agentic systems, why detection and audit alone cannot close the gap, what the Agentic Controls Reference (ACR) Standard provides, and what practical steps you should be taking right now.

The Enforcement Gap and the Cost of Ignoring Runtime Control

The enforcement gap is the central problem in AI governance today. Most governance frameworks operate at the policy layer. They define acceptable use policies, establish review processes, require documentation before deployment, and create accountability structures on paper. All of that work has value. But it stops at the boundary where the system actually executes.

Once an agentic system is running, it is making decisions, invoking tools, accessing resources, and producing operational effects in real time. If your governance architecture has no mechanism to enforce constraints at the point of execution, then your governance is advisory. It is a set of recommendations that the system has no obligation to follow and no mechanism to enforce.

The cost of this gap is growing. Organizations deploying agentic AI into production workflows are discovering that policy-layer governance does not prevent the failures that matter most: unauthorized data access, excessive tool invocations, cascading errors across multi-agent workflows, and actions that violate stated intent. These are execution failures, and they require execution-layer controls.

The financial exposure is significant. A single uncontrolled agent action that accesses the wrong data, triggers the wrong API call, or produces an unauthorized output can create regulatory liability, customer harm, or operational disruption that no amount of post-incident review can undo. The enforcement gap is where liability lives.

Where Current Governance Breaks

To understand why runtime control matters, it helps to look at the specific failure modes that current governance approaches cannot prevent.

Failure Mode 1: Policy Exists, Enforcement Does Not

This is the most common failure mode. An organization has a well-written AI acceptable use policy. It defines what models can do, what data they can access, and what approvals are required. But there is no technical mechanism that translates those policy statements into runtime constraints. The policy says "agents must not access PII without approval." The agent accesses PII without approval. The policy was never connected to the execution layer.

Failure Mode 2: Review Happens Before Deployment, Then Stops

Many organizations run model risk assessments, red team exercises, or architecture reviews before deploying an AI system. This is valuable work, and it should continue. The problem is that it produces a point-in-time evaluation of a system that will change its behavior continuously after deployment. Agentic systems adapt their execution paths based on context, memory, and tool availability. A pre-deployment review cannot account for the runtime behaviors that emerge after the system is operational.

Failure Mode 3: Monitoring Without Authority to Act

Some organizations have invested in observability for their AI systems. They log model calls, track token usage, and monitor for anomalies. This is better than nothing, and it is still insufficient. Monitoring tells you what happened. It does not prevent what is about to happen. If your monitoring system detects that an agent is about to execute an unauthorized action, and you have no mechanism to halt that action before it completes, then your monitoring is a forensic tool. It is useful for incident investigation. It is not a control.

Failure Mode 4: Human Oversight That Cannot Scale

The phrase "human in the loop" appears in nearly every AI governance framework. It is rarely defined with enough specificity to be enforceable. Who is the human? What are they authorized to approve? What happens if they are unavailable? What is the latency budget for human review in a system that executes actions in milliseconds? Without clear answers to these questions, human oversight becomes a checkbox on a compliance form rather than an operational control.
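One way to make those questions enforceable is to encode them as parameters of an approval gate: who approves is an injected callback, the latency budget is explicit, and unavailability fails closed. This is a hedged sketch of one possible design, not a prescribed implementation.

```python
import time
from typing import Callable, Optional

def gated_execute(action: Callable[[], str],
                  request_approval: Callable[[], Optional[bool]],
                  latency_budget_s: float = 2.0) -> str:
    """Run `action` only if a human approves within the latency budget.

    Answers the unanswered questions operationally: who the human is
    (the injected `request_approval` callback), what happens when they
    are unavailable (fail closed), and what the latency budget is
    (an explicit, auditable parameter)."""
    deadline = time.monotonic() + latency_budget_s
    while time.monotonic() < deadline:
        decision = request_approval()  # None means no answer yet
        if decision is True:
            return action()
        if decision is False:
            return "denied"
        time.sleep(0.01)
    return "denied: approver unavailable (fail closed)"

approved = gated_execute(lambda: "done", lambda: True, latency_budget_s=0.5)
timed_out = gated_execute(lambda: "done", lambda: None, latency_budget_s=0.05)
```

Whether the right default is fail-closed or fail-open is itself a governance decision; the value of writing it as code is that the decision is made explicitly rather than discovered during an incident.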

Failure Mode 5: Multi-Agent Systems With No Authority Model

As organizations move from single-agent to multi-agent architectures, the governance problem compounds. Agent A delegates a task to Agent B. Agent B invokes a tool that Agent A was never authorized to use. Who authorized that action? What authority did Agent B have? Was the delegation valid? Without a formal authority model that governs how agents grant, receive, and constrain delegated authority, multi-agent systems create accountability gaps that grow with every new agent in the workflow.
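A minimal property any such authority model needs is attenuation: a delegated grant can never exceed the delegator's own authority. The sketch below illustrates that single invariant with set intersection; the function names and scope strings are hypothetical.

```python
def delegate(parent_scope: set[str], requested_scope: set[str]) -> set[str]:
    """Grant the delegate at most the intersection of what the delegator
    holds and what was requested: authority attenuates, never amplifies."""
    return parent_scope & requested_scope

def authorized(scope: set[str], tool: str) -> bool:
    return tool in scope

agent_a = {"read_crm", "send_email"}
# Agent A delegates to Agent B; the request for write_db is silently dropped
# because Agent A never held it and therefore cannot grant it.
agent_b = delegate(agent_a, {"send_email", "write_db"})
```

With this invariant enforced at every delegation hop, the scenario above (Agent B invoking a tool Agent A was never authorized to use) becomes impossible by construction rather than detectable after the fact.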

Why Detection and Audits Are Insufficient

There is a persistent belief in the security community that if you can detect a problem and audit the evidence, you have adequate governance. For traditional software systems, that assumption had some validity. For agentic AI systems, it breaks down in three fundamental ways.

The Speed Problem

Agentic systems operate at machine speed. An agent can plan, execute, and complete a multi-step workflow in seconds. By the time a detection system flags an anomaly and a human reviews the alert, the action has already been taken, the data has already been accessed, and the operational effect has already occurred. Detection after execution is incident response. It is not governance.

The Scale Problem

Organizations are deploying dozens or hundreds of agents across workflows. Each agent may execute thousands of actions per day. The volume of decisions, tool calls, and data accesses exceeds what any human review process can handle. You cannot govern agentic systems through manual review of execution logs. The scale of the problem demands automated, pre-execution control enforcement.

The Evidence Problem

Even when detection works, the evidence produced by most monitoring systems is insufficient for regulatory or legal purposes. Knowing that an agent made an API call is different from proving that the call was authorized, that it matched the declared intent, that the authority to make it was validly granted, and that the output was within permitted boundaries. Regulators, auditors, and courts will increasingly demand structured evidence of control enforcement, and "we detected the incident after it happened" will not satisfy that standard.

What ACR Is and Why It Matters

The Agentic Controls Reference (ACR) Standard was built to close the enforcement gap. It is a control standard designed specifically for agentic AI systems that defines the mandatory runtime conditions a system must satisfy to be considered governed.

ACR is organized around six control pillars. Each pillar addresses a distinct category of runtime control that must be present, enforceable, and auditable.

1. Human Authority

Defines who can authorize agent actions, when human approval is mandatory, what constitutes a valid override, and how the system must behave when human authority is absent. This is the operational definition of "human in the loop" that most governance frameworks leave undefined.

2. Intent Validation

Requires that every agent action be evaluated against a declared intent contract before execution. If the action does not match the declared purpose, it is blocked. This is how you prevent misalignment, hallucinated actions, and scope creep at the execution boundary.
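As a rough sketch of the mechanism, an intent contract can be a small structure the execution layer checks before every action. The contract shape below is an assumption for illustration, not the ACR schema.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class IntentContract:
    purpose: str
    allowed_tools: frozenset

def validate(contract: IntentContract, tool: str, stated_purpose: str) -> bool:
    """Pre-execution check: block any action whose tool or purpose
    falls outside the declared contract."""
    return tool in contract.allowed_tools and stated_purpose == contract.purpose

contract = IntentContract(
    purpose="summarize support tickets",
    allowed_tools=frozenset({"ticket_read", "llm_summarize"}),
)

in_scope = validate(contract, "ticket_read", "summarize support tickets")
scope_creep = validate(contract, "db_write", "summarize support tickets")
```

Real intent validation would reason about arguments and context, not just tool names, but the enforcement point is the same: the comparison happens before execution, and a mismatch blocks rather than logs.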

3. Containment

Enforces runtime boundaries on what an agent can access, invoke, and affect. Containment boundaries are graduated, meaning the system can tighten constraints dynamically based on risk signals. This prevents excessive agency and limits the blast radius of any single failure.
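"Graduated" can be pictured as a mapping from a live risk signal to progressively tighter runtime limits. The tiers and thresholds below are invented for illustration; a real deployment would derive them from the risk classification of the system.

```python
def containment_tier(risk_score: float) -> dict:
    """Map a risk signal (0.0 to 1.0) to runtime limits that tighten
    as risk rises, down to full quarantine."""
    if risk_score < 0.3:
        return {"max_tool_calls": 100, "network": "allowed", "writes": True}
    if risk_score < 0.7:
        return {"max_tool_calls": 20, "network": "internal_only", "writes": True}
    return {"max_tool_calls": 0, "network": "blocked", "writes": False}  # quarantine

normal = containment_tier(0.1)
elevated = containment_tier(0.5)
quarantined = containment_tier(0.9)
```

The design point is that tightening is automatic and pre-defined: the system does not wait for a human to decide what "reduce the blast radius" means while an incident is in progress.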

4. Evidence and Audit

Requires structured, tamper-evident records of every decision, action, authority issuance, and containment event. This is the pillar that makes governance provable. It produces the evidence that regulators, auditors, and legal teams will demand.

5. Drift Detection

Monitors for behavioral deviation from declared baselines over time. Drift detection catches the slow-burn failures that point-in-time assessments miss: the agent that gradually shifts behavior, the memory store that accumulates corrupted context, the system that moves outside its original operating parameters without triggering any single threshold.
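One simple way to quantify "behavioral deviation from a declared baseline" is to compare an agent's recent tool-usage mix against its baseline distribution. The total-variation distance and the alert threshold here are illustrative choices, not ACR requirements.

```python
def drift_score(baseline: dict, recent: dict) -> float:
    """Total variation distance between baseline and recent tool-usage
    frequencies: 0.0 means identical behavior, 1.0 fully disjoint."""
    tools = set(baseline) | set(recent)
    return sum(abs(baseline.get(t, 0.0) - recent.get(t, 0.0)) for t in tools) / 2

baseline = {"search": 0.6, "summarize": 0.4}
recent = {"search": 0.2, "summarize": 0.3, "db_write": 0.5}  # new behavior appears

score = drift_score(baseline, recent)  # 0.5: half the behavior mass has shifted
```

Crucially, no single `db_write` call needs to trip a threshold; the distribution-level comparison is what catches the slow-burn shift the post describes.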

6. Observability

Mandates end-to-end visibility into agent operations. Observability under ACR is the structured, exportable evidence that the entire control architecture is functioning as designed. It covers decisions made, tools invoked, authority exercised, containment actions triggered, and outcomes produced.

The key distinction is this: ACR does not describe best practices or recommendations. It defines mandatory control conditions. A system either satisfies those conditions at runtime or it does not. That binary evaluation is what separates a control standard from a guidance document.

Practical Steps to Deploy Runtime Control

If you are a CISO or security leader reading this, you are likely asking what you should do right now. Here are the concrete steps to begin closing the enforcement gap in your organization.

Step 1: Inventory Your Agentic Systems

You cannot govern what you have not identified. Build a complete inventory of every AI system in your organization that has the ability to take autonomous actions: invoke tools, access data stores, call APIs, communicate with other agents, or produce operational effects. Many organizations are surprised by the number of agentic systems already running in production when they conduct this inventory.

Step 2: Map the Enforcement Gap

For each agentic system in your inventory, document the gap between your governance policies and the runtime enforcement mechanisms actually in place. Ask specific questions: Does this system enforce authority boundaries at execution time? Is there pre-execution intent validation? Are containment boundaries technically enforced or only documented? Is there structured evidence of every decision and action? This mapping will show you exactly where your exposure is.

Step 3: Classify Risk by Action Class

Use the ACR action classification framework to categorize the actions your agents can take by risk level. Not every agent action requires the same level of control. Reading public data is different from writing to a production database. Sending an internal notification is different from initiating a financial transaction. Risk classification allows you to prioritize where runtime controls are most urgently needed.
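As a sketch under stated assumptions (the tier names and the mapping are illustrative; ACR's actual classification may differ), the examples in this step translate directly into a lookup that drives control decisions:

```python
from enum import Enum

class Risk(Enum):
    LOW = 1       # e.g. reading public data
    MEDIUM = 2    # e.g. sending an internal notification
    HIGH = 3      # e.g. writing to a production database
    CRITICAL = 4  # e.g. initiating a financial transaction

ACTION_RISK = {
    "read_public_data": Risk.LOW,
    "send_internal_notification": Risk.MEDIUM,
    "write_production_db": Risk.HIGH,
    "initiate_payment": Risk.CRITICAL,
}

def requires_human_approval(action: str) -> bool:
    """Illustrative prioritization rule: HIGH and above gets pre-execution
    review. Unknown actions default to CRITICAL, so the system fails closed."""
    return ACTION_RISK.get(action, Risk.CRITICAL).value >= Risk.HIGH.value
```

The fail-closed default for unclassified actions is the detail worth copying: an action your inventory missed should face the strictest controls, not none.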

Step 4: Implement Authority Boundaries First

Of the six ACR pillars, Human Authority is the most immediately impactful starting point. Define clear authority models for your highest-risk agentic systems. Specify who can authorize what actions, when human approval is required, and what happens when authority is absent. Connect those authority definitions to technical enforcement mechanisms in your control plane.

Step 5: Add Pre-Execution Intent Validation

For your highest-risk agents, implement intent validation at the execution boundary. Every action the agent attempts should be evaluated against its declared purpose before it executes. This single control prevents a significant percentage of the failure modes described earlier in this post.

Step 6: Build the Evidence Layer

Implement structured logging that captures authority grants, intent evaluations, containment decisions, and execution outcomes for every agent action. This is the foundation for audit readiness and regulatory compliance. The evidence layer should be append-only and tamper-evident for high-risk events.
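The tamper-evident property can be achieved with a simple hash chain: each record carries the hash of the one before it, so altering or removing any record breaks verification. This is a minimal sketch of the idea, assuming a Python runtime; a production evidence layer would add signing, durable storage, and external anchoring.

```python
import hashlib
import json

def append(log: list, event: dict) -> None:
    """Append an event, chaining it to the previous record's hash."""
    prev = log[-1]["hash"] if log else "genesis"
    body = json.dumps(event, sort_keys=True)
    digest = hashlib.sha256((prev + body).encode()).hexdigest()
    log.append({"event": event, "prev": prev, "hash": digest})

def verify(log: list) -> bool:
    """Recompute every hash in order; False means a record was
    altered, removed, or reordered."""
    prev = "genesis"
    for rec in log:
        body = json.dumps(rec["event"], sort_keys=True)
        if rec["prev"] != prev or \
           hashlib.sha256((prev + body).encode()).hexdigest() != rec["hash"]:
            return False
        prev = rec["hash"]
    return True

log = []
append(log, {"type": "authority_grant", "agent": "a1", "scope": "read"})
append(log, {"type": "intent_check", "agent": "a1", "result": "pass"})
intact = verify(log)
log[0]["event"]["scope"] = "write"  # simulate tampering
tampered = verify(log)
```

This is the difference between "we have logs" and "we can prove the logs are the logs": the chain makes after-the-fact editing detectable, which is what regulators and courts will ask for.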

Roadmap for the Board and CISO: Accountability and Next Steps

If you are presenting AI governance to your board, here is the framing that matters.

The Liability Question

Boards care about liability. The enforcement gap creates direct liability exposure. If an AI system takes an action that causes harm and your organization cannot demonstrate that runtime controls were in place and functioning, the liability position is weak. "We had a policy" is not a defense when the system operated outside that policy and there was no mechanism to prevent it. ACR provides the control framework that makes governance demonstrable and defensible.

The Regulatory Trajectory

The EU AI Act, NIST AI RMF, and emerging sector-specific regulations are moving toward requiring demonstrable control over AI systems. The regulatory trajectory is clear: organizations will need to prove that their AI systems operate within defined boundaries and that those boundaries are technically enforced. Getting ahead of this curve is a strategic advantage. Waiting until regulations mandate specific controls means implementing under pressure and timeline constraints.

The 90-Day Action Plan

For CISOs who need to show progress to their boards, here is a realistic 90-day plan:

Days 1 through 30: Discovery and Assessment. Complete the agentic system inventory. Map the enforcement gap for your top 10 highest-risk systems. Present findings to the security leadership team with specific risk quantification.

Days 31 through 60: Architecture and Prioritization. Classify agent actions by risk level using the ACR framework. Design authority models for your three highest-risk agentic systems. Identify the technical integration points where runtime controls will be enforced.

Days 61 through 90: Initial Implementation. Deploy authority boundaries and pre-execution intent validation for your highest-risk system. Implement structured evidence logging. Establish baseline drift detection. Present the initial control architecture and evidence to the board as a proof of concept.

The Long-Term Position

Organizations that implement runtime control for their agentic AI systems now will be in a fundamentally stronger position than those that wait. They will have enforceable governance that reduces liability. They will have structured evidence that satisfies regulators and auditors. They will have operational visibility that supports incident response and continuous improvement. And they will have a control architecture that scales as their use of agentic AI grows.

The enforcement gap is real, it is growing, and it will be the defining challenge of AI governance for the next several years. The question for CISOs is whether their organizations will close that gap proactively or be forced to close it after an incident makes the cost of inaction clear.

The ACR Standard provides the control framework. The practical steps outlined here provide the starting point. The rest is execution.
