The Limits of Human Oversight at Machine Speed

OPINION — Warfare has always operated at human speed, but we now have the capability to operate at machine speed. The risks are high, but so are the risks of failing to adapt. Our adversaries are moving toward machine speed faster than we are, and the gap is widening faster than our processes can evolve.

Many companies are developing AI tools that accelerate the decision cycle and shrink OODA (Observe, Orient, Decide, Act) loops, augmenting analysts so they can triage alerts, draft courses of action, and surface recommendations in a fraction of the time it used to take. The tools are good and getting better, and the companies building them are doing important work.


But there is a ceiling. So long as a human sits at the “decide” step, the cycle runs at human speed. Augmented human speed, but human speed nonetheless. The AI can compress the observe and orient steps to near-zero, but it cannot compress the human decision process. The human is, in this configuration, the limitation.

That limitation is not inherently a problem. For most of the decisions we care about, we want a human making them. Across most of the defense enterprise, in planning, intelligence analysis, logistics, personnel, and countless workflows where judgment, accountability, and context matter, humans add real value. The argument that follows is not a blanket case for autonomy. It is about a specific class of decisions, in a specific class of operational environments, where the speed differential between offense and defense is becoming the determining factor.

The problem is that our adversaries may not accept the same ceiling. If they are willing to close the loop entirely, letting the machine observe, orient, decide, and act without a human gate, then their cycle runs at machine speed and ours runs at augmented-human speed. Those are not comparable tempos. Orders of magnitude separate them, and the gap is growing.

This is the context for every conversation about keeping humans in the loop. In a contest where one side operates at machine speed and the other does not, a human review step can be both a safeguard and a structural disadvantage. The question is no longer whether we can afford to keep humans in the loop. The question is whether the humans we claim to have in the loop are actually doing anything, and whether their presence reflects meaningful oversight or has quietly become a fiction we maintain because the alternative is uncomfortable.

This is a hard conversation, and hardest on the kinetic side, where autonomous lethal decisions raise questions we are not ready to answer. It is more tractable in cyber. Not because the stakes are zero, but because cyber effects do not place lives directly at stake on the same scale as kinetic strikes. The competitive pressure is already forcing decisions in cyber that the kinetic debate has been able to defer. That is where this piece starts.

The Cyber Case

In cyber, the argument for accelerating decision cycles isn't philosophical. It's arithmetic.

The Zero Day Clock, an industry tracker maintained by a coalition of cybersecurity researchers, measures when the mean time from vulnerability disclosure to first observed exploit crosses key thresholds. The one-year threshold was reached around 2021. One month in 2025. One week and one day were both crossed in 2026. One hour is projected for later this year. One minute by 2028.

The interval between milestones is collapsing. It took roughly four years to go from year-scale to month-scale exploitation and about one year to go from month to week, while the week and day thresholds were both crossed in the same calendar year. Defenders who designed their patch cycles around the assumption of months are now operating against adversaries who weaponize disclosed vulnerabilities in hours.
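The arithmetic behind that claim can be laid out in a few lines. The sketch below is illustrative only: it uses the milestone years quoted above, treats the hour and minute thresholds as the projections they are, and makes two assumptions that are not in the tracker's description, namely that "later this year" means 2026 and that a month is 30 days. The names `MILESTONES` and `intervals` are hypothetical.

```python
# Milestone years as quoted in the article. "hour" and "minute" are projections.
# Assumptions (not from the source): "later this year" is read as 2026,
# and a month is approximated as 30 days.
MILESTONES = [
    ("year", 365 * 24, 2021),
    ("month", 30 * 24, 2025),
    ("week", 7 * 24, 2026),
    ("day", 24, 2026),
    ("hour", 1, 2026),         # projected
    ("minute", 1 / 60, 2028),  # projected
]


def intervals(milestones):
    """Return (from, to, calendar_years_elapsed, shrink_factor) per consecutive pair."""
    rows = []
    for (a_name, a_hours, a_year), (b_name, b_hours, b_year) in zip(
        milestones, milestones[1:]
    ):
        rows.append((a_name, b_name, b_year - a_year, a_hours / b_hours))
    return rows


for a, b, years, factor in intervals(MILESTONES):
    print(f"{a} -> {b}: {years} calendar year(s), window shrinks about {factor:.0f}x")
```

Read this way, each step shrinks the exploitation window by several times or more, while the calendar time between steps falls from four years to one and then to less than one.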

Cyber operators today use AI tools to work through alerts and incidents faster, and those tools genuinely help. For routine work, the current model of AI surfacing and human deciding is fine. But for a contested environment against a capable adversary moving at the speeds the data describes, the math becomes harder to defend.

Tools that scan codebases for vulnerabilities are not new. What is new is the next step: these tools are starting to generate patches and mitigations for the vulnerabilities they find. The AI identifies the problem, proposes a fix, and routes the recommendation to a human for review before implementation. That review takes time. Not much by human standards, but enormous by the standards of what is happening on the other side.

Anthropic's Mythos preview is one indication of where this is headed. According to Anthropic's published descriptions, Mythos can find zero-day vulnerabilities and exploit them with minimal or no human input, closing the entire kill chain across the MITRE ATT&CK matrix. It is not alone. Google's Big Sleep was reported in late 2024 to have found the first publicly disclosed AI-discovered zero-day in SQLite, before any human defender had spotted it. Anthropic's red team reported in early 2026 that Claude had identified over 500 high-severity vulnerabilities in widely used open-source software, many of which had survived decades of expert review.

As Sean Heelan put it: the limiting factor on a capable state's ability to generate exploits is no longer the number of skilled hackers it can recruit. It is token consumption.

Bruce Schneier, Heather Adkins, and Gadi Evron published a joint essay in 2025 warning that we are approaching a singularity moment for cyber attackers, the point at which AI systems can discover vulnerabilities, write exploits, and launch attacks faster than any human defender can respond. The attackers' AI singularity is well underway; the defenders' is significantly behind. Reasonable people can disagree about how far behind. Few disagree about the direction.

The crucial point is this: just a few years ago, having a human in the loop wasn't really a choice. The technology wasn't capable enough to close the kill chain. AI tools could surface candidates, but the actual decision-making and execution was done by humans because nothing else could. That is no longer true. The technology can now close the chain end-to-end, and in some narrow tasks it can do so better than the humans it is supplementing. Whether to let it is a real question now, not a technical limitation pretending to be a policy choice.

If an adversary's AI can identify a vulnerability and weaponize it in minutes while our response workflow routes the patch recommendation through a human for review, we are not in the same race. The human review step that felt prudent in 2020 is, in some operational contexts, the step that ensures we lose.

This is the easier version of the conversation. The capabilities are concrete, the failure mode is a compromised network rather than a destroyed building, and the competitive pressure is undeniable. And yet even in cyber, we are struggling to have it honestly. Some of that is appropriate caution; some is risk aversion; some is the difficulty of holding AI capability providers accountable in a field evolving faster than the frameworks for evaluating it.

The Kinetic Case

The kinetic version of this conversation is harder because the stakes are final and the cultural resistance is more deeply entrenched.

For most of the history of weapons, humans were the end operators. Small arms, artillery, and dumb bombs all relied on a human for aiming and firing. Laser-guided munitions shifted some of the guidance burden to the technology, but a JTAC on the ground still had to mark the target. GPS-guided munitions moved further; the operator inputs coordinates and the weapon does the rest, but humans still chose what to target. Through every generation, the kill chain was executed by humans because nothing else could.

We are now fielding systems that can handle targeting, firing, guidance, and delivery of effects without a human at any of those steps. The technology has caught up; in some narrow tasks, it has surpassed us. The cultural framing has not. We still talk about autonomous weapons as though the question is whether to cross a line. The line has been moving for forty years, and we have been crossing it incrementally the whole time. What is new is that the technology is now capable of completing the trajectory.

That does not mean we should rush to full autonomy in lethal decisions. It means the conversation we need to have is not "should we ever remove humans from the loop" but "at what point have we effectively done so already, and are we being honest about it?"

What Is the Human Actually Doing?

This is the question the rest of the debate hinges on.

When we say there is a human in the loop, what is the human actually doing? Are they independently verifying or re-doing the AI system's work? If so, it defeats much of the purpose of using the AI. If not, it defeats much, if not all, of the purpose of having the human there. If the answer depends on the situation (which it almost always will), how are we deciding which situations justify fully autonomous action?

These questions have real answers in some contexts. There are workflows where a human reviewer genuinely catches errors the AI missed, including obvious ones the AI is structurally bad at recognizing. This is the most important reason to keep a human reviewer today, though such errors are becoming fewer and farther between. Human verification can also serve a second purpose: providing the feedback signal that helps train and improve the model. In those contexts, the human in the loop is doing real work, and the right policy is to keep them there. The argument here is not that human oversight is always theater. It is that we need to be honest about the contexts in which it is theater and those in which it is not.

Consider AI-generated targeting. During an operation, an AI system ingests real-time intelligence feeds (signals, imagery, pattern-of-life data, network traffic) and produces a list of targets. A human is assigned to review the list before strikes are authorized. What does that review actually consist of?

The human does not have time to review all of the intelligence data the AI processed, and could not do it at the speed of the operation even if they had the analytical capacity. What they can do is a sanity check. They can ask whether the targets look roughly like the kind of targets they expect to see and flag obvious errors, the kind that come from the AI getting confused in ways a human would not. That catch is genuinely valuable. They can also provide a feedback signal that, over time, makes the system better. What they cannot do is verify that the AI's reasoning was correct. When speed matters, that limitation becomes a liability.

Reports of the Israeli military's use of the Lavender system during operations in Gaza illustrate what happens when this dynamic meets operational pressure. According to reporting by +972 Magazine and Local Call, lower-level operators faced extreme pressure to strike targets at a high pace and leaned on Lavender to generate target lists they could not meaningfully verify at the tempo demanded. Human review existed in name. In practice, the operators were approving AI-generated decisions they did not have the bandwidth to assess. What they were doing was signing off.

A non-AI parallel sharpens the point. Microsoft's "Digital Escort" program, reported by ProPublica in 2025, was designed to comply with Pentagon restrictions on foreign nationals accessing sensitive systems. Microsoft used lower-cost engineers in China to maintain government cloud systems and hired U.S.-based "digital escorts" to formally implement the code changes on the engineers' behalf. The escorts were less technically skilled than the engineers whose work they were approving and often did not understand what they were implementing. In practice, they rubber-stamped the work. The "American in the loop" was theater.

This is the pattern we should expect with AI systems operating at the edge of human capacity. If the AI is doing work the human could not do themselves, or at a speed they cannot match, the human's role collapses from verification to approval, and under operational pressure, to rubber-stamping. The loop is closed in name only.

When human oversight collapses to rubber-stamping, we end up with the worst of both options. We have slowed the system down, accepting the operational disadvantage of human-speed decision cycles, without preserving the safety benefit that human review was supposed to provide. The risk is still present; we have simply added latency. It is a self-imposed disadvantage with none of the benefits that justified it.

In some current deployments, we already have this dynamic and we are not acknowledging it. The human in the loop comforts us. It satisfies the policy requirement and provides someone to name as the accountable decision-maker after the fact. It does not meaningfully alter what the AI would have done on its own.

Accountability When the Human Can't Keep Up

The accountability question follows directly from the verification question, and it breaks a chain we have relied on for a century.

When a rifle round hits the wrong target, we do not blame the rifle manufacturer; we investigate the shooter. When a dumb bomb misses, we investigate the pilot and the targeting process. When a laser-guided bomb hits the wrong building, we investigate the JTAC, the target designation, and the command chain. When a GPS-guided munition hits a school, we investigate whether the coordinates were correct and whether the targeting cell followed proper procedure. Through every generation, accountability has run to the human operator or the humans in the decision chain above them.

This works because the human operator is meaningfully in control. They choose the target, input the data, pull the trigger. They have both the authority and the capacity to be responsible for the outcome.

Autonomous systems strain this chain. If the human in the loop is functionally rubber-stamping AI-generated decisions made at speeds and against data volumes they cannot independently evaluate, it is not coherent to hold them solely responsible. We can name them as accountable in an after-action review. We cannot credibly claim they were the decision-maker.

This shifts accountability upstream. If the human at the edge cannot meaningfully verify the decision, then responsibility lies more heavily with the people who decided what the system would be allowed to do: the developers, the testers, the commanders who set the authorities, the policymakers who approved the capability for deployment. The operator at the terminal is executing a decision that has, in important respects, already been made.

Developing autonomous control layers and targeting systems is not like developing a rifle. A rifle manufacturer ships a tool and trusts the operator to use it responsibly. An AI targeting system manufacturer is shipping something closer to a decision-maker, a system that will, in practice if not in policy, determine outcomes that human operators cannot meaningfully override. That shift in function requires a shift in how we think about responsibility. The builder does not get to hand off the system and walk away.

This is not an argument against building these capabilities. The companies and labs developing autonomous defense systems are doing essential work, and the United States and its allies need them to keep doing it. It is an argument for building them with full awareness of what is being built and how it is being used. These labs are not just providing tools. They are making strategic and ethical decisions that will shape how force is used. The more honest we are about this, the better the systems will be.

Trust, and the Honest Conversation

We arrive at a gap that defines the current moment. We cannot keep humans meaningfully in the loop at machine speed in every context. We do not yet trust the systems enough to take them out. Both propositions are true.

The temptation is to resolve the gap by picking one side: full autonomy in the name of competitive necessity, or full human control in the name of moral responsibility. Neither is serious. Full autonomy without adequate trust risks catastrophic errors we cannot unwind. Full human control against an adversary at machine speed guarantees we lose before we can control anything.

So why are we struggling to have this conversation honestly? Several reasons, none unreasonable on their own. Senior decision-makers do not yet have the basis to trust autonomous systems with consequential decisions, because the evidence base hasn't been built. Risk aversion in defense institutions is a feature, not a bug; it has prevented many bad outcomes, even if it now imposes costs. We don't have mature frameworks for holding AI capability providers accountable. And autonomous lethal force, even when bounded and tested, raises moral questions that the Department is right to take seriously.

None of this is a reason to avoid the conversation, but it is a reason to have it more carefully. That requires building the evidence base for trust. Trust is the product of testing, adversarial red-teaming, operational evaluation under realistic conditions, and accumulated evidence that the system behaves as intended across the range of situations it will face. We do not have this evidence for most of the autonomous capabilities being fielded or contemplated. Building it is not optional, and it cannot be skipped because the adversary is moving fast.

It also requires being honest about which loops have humans in name only. If the human reviewer cannot meaningfully verify the AI's decision, claiming they are in the loop is a fiction. The right response is to either make the human's role genuine, by slowing the system or narrowing its scope so review is possible, or to acknowledge that the decision is effectively autonomous and design the controls and accountability structures accordingly.

And it requires distinguishing between cases. Autonomous patching of a vulnerability in an isolated system is a different decision from autonomous targeting for lethal strikes. We need frameworks that distinguish between reversible and irreversible actions, between contained and uncontained effects, between narrow and broad consequences. A blanket "human in the loop" policy treats all these cases as identical. They are not.
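To make the distinction concrete, here is a minimal sketch of what a case-by-case rule might look like. It is not a proposed policy. The three attributes come from the paragraph above, but the oversight tiers, the rule that maps attributes to tiers, and every name in the code (`Action`, `Oversight`, `required_oversight`) are hypothetical choices made purely for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Oversight(Enum):
    # Hypothetical tiers, invented for this illustration.
    AUTONOMOUS = "act autonomously, audit afterward"
    HUMAN_REVIEW = "human reviews before the system acts"
    HUMAN_DECISION = "system recommends, human decides"


@dataclass(frozen=True)
class Action:
    name: str
    reversible: bool  # can the effect be undone?
    contained: bool   # is the effect limited to a known system or area?
    narrow: bool      # are the consequences limited in breadth?


def required_oversight(action: Action) -> Oversight:
    """Map an action to an oversight tier under one hypothetical rule set."""
    if not action.reversible:
        # Illustrative assumption: irreversible effects always get a human decision.
        return Oversight.HUMAN_DECISION
    if action.contained and action.narrow:
        return Oversight.AUTONOMOUS
    return Oversight.HUMAN_REVIEW


patch = Action("patch an isolated system", reversible=True, contained=True, narrow=True)
strike = Action("lethal strike targeting", reversible=False, contained=False, narrow=False)
print(patch.name, "->", required_oversight(patch).value)
print(strike.name, "->", required_oversight(strike).value)
```

The particular rule matters less than the fact that it is written down: once the attributes and tiers are explicit, it becomes possible to argue about where each line belongs, which a blanket policy never allows.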

The decision about whether to remove humans from certain loops has, in some narrow domains, already been made by the math. Our choice is whether to acknowledge that and build the systems and accountability structures that make it responsible, or to maintain a comforting fiction until something forces a reckoning we are not prepared for.

The adversaries are not waiting for us to decide.

The Cipher Brief is committed to publishing a range of perspectives on national security issues submitted by deeply experienced national security professionals. Opinions expressed are those of the author and do not represent the views or opinions of The Cipher Brief.
