Large language models (LLMs) are migrating from drafting tickets and summarizing alerts to directly querying telemetry, proposing configuration changes, and even executing those changes in live production environments. This shift, often marketed as autonomous remediation or self-healing infrastructure, opens a Pandora's box of security vulnerabilities. A recent survey on agentic AI in network and IT operations frames this evolution as a classic confused-deputy problem waiting to happen.
The confused-deputy attack is well known in cybersecurity: a trusted program with elevated privileges is tricked by an attacker into misusing those privileges. Agentic operations create an ideal substrate for this abuse. The LLM agent holds legitimate access to change-management APIs, deployment pipelines, network controllers, and incident response systems. Its decisions are shaped by the data it ingests: tickets, runbooks, chat transcripts, log entries, and other artifacts, which are exactly the inputs an attacker can influence. An attacker does not need to compromise the AI model itself; compromising the text the agent reads before it uses its tools is enough.
The Confused-Deputy Problem in Agentic AI Security
The classic confused-deputy attack was first described in the context of operating systems and compilers, where a well-intentioned program could be exploited to perform unauthorized actions. In the modern context, an LLM operating as a deputy for human operators inherits all the privileges of its role. However, because the LLM's understanding of the world is derived from natural language instructions and data that can be tampered with, the deputy can be confused by adversarial inputs. This is not a hypothetical scenario; research has shown that LLMs can be manipulated via prompt injection, where hidden instructions in a seemingly benign document cause the model to deviate from its intended behavior. When that document is a Jira ticket describing a server issue, the consequences can be catastrophic.
Agentic AI systems are designed to act autonomously, but autonomy without robust security boundaries is a recipe for disaster. The very features that make these systems powerful are also their Achilles' heel. For example, an agent with the ability to read incident history and propose fixes can be poisoned by a malicious actor who deliberately inserts misleading information into the knowledge base. A system that trusts its retrieved context without validation becomes a vector for attack.
Four Attack Categories Targeting LLM Operations
The survey catalogs four attack categories that deserve far more attention than they currently receive. The most familiar is prompt injection through operational artifacts: an attacker embeds malicious instructions in a ticket, a wiki page, or a runbook that steers the agent toward unsafe actions. In a high-pressure incident response scenario, the agent might read a compromised ticket and initiate a series of dangerous commands before a human can intervene.
Subtler variants exist. Retrieval poisoning corrupts the runbooks and incident histories the agent consults, biasing its diagnoses toward attacker-chosen conclusions. For instance, an attacker could modify a runbook to suggest that a certain server crash must be fixed by disabling a security service, rather than patching a vulnerability. The agent, trusting its retrieved knowledge, executes the disabling action.
Retrieval jamming works in the opposite direction. Here, the knowledge base is flooded with blocker documents that trigger refusal loops, effectively stalling incident response when it is most needed. If every query returns a document that says "this action requires manual review" or "this operation is forbidden," the agent freezes, unable to proceed with legitimate remediation. During a real outage, such delays can cause cascading failures.
Telemetry manipulation targets the metrics and logs that LLM-driven operations agents use to diagnose problems. An attacker who can influence what telemetry data says can steer the agent's mitigation decisions in a desired direction. For example, by faking a sensor reading that indicates a server is overheating, the attacker could trick the agent into triggering a shutdown that takes down a critical service. These attacks are particularly insidious because they do not look like attacks. They look like normal incident response that happens to go wrong.
The Propose-Commit Split as an Architectural Defense
The primary defense proposed by the survey is architectural rather than behavioral. The authors argue for a strict propose-commit split: the language model can reason, retrieve evidence, and draft change proposals, but it cannot execute writes. Every action that touches production must pass through a non-bypassable gate that the model has no authority over. This gate enforces policy-as-code checks, invariant verification, human approval for high-blast-radius changes, and rollback-ready staged deployment.
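As a rough illustration of the split described above, the following minimal Python sketch separates a model-drafted proposal from a deterministic gate. Every name here (`Proposal`, `gate`, `FORBIDDEN_ACTIONS`, `HUMAN_APPROVAL_THRESHOLD`) and every policy value is a hypothetical chosen for the example, not something taken from the survey.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Proposal:
    """A change drafted by the model. It carries no authority to execute."""
    target: str                      # hypothetical resource name
    action: str                      # e.g. "restart", "disable_service"
    blast_radius: int                # number of hosts the change would touch
    rollback_plan: Optional[str] = None


@dataclass
class Decision:
    approved: bool
    reasons: List[str] = field(default_factory=list)


# Policy-as-code: each check returns an error string, or None if it passes.
Check = Callable[[Proposal], Optional[str]]

FORBIDDEN_ACTIONS = {"disable_service", "delete_volume"}  # illustrative values
HUMAN_APPROVAL_THRESHOLD = 10                             # illustrative value


def no_forbidden_action(p: Proposal) -> Optional[str]:
    if p.action in FORBIDDEN_ACTIONS:
        return f"action '{p.action}' is forbidden by policy"
    return None


def has_rollback_plan(p: Proposal) -> Optional[str]:
    if not p.rollback_plan:
        return "proposal has no rollback plan"
    return None


CHECKS: List[Check] = [no_forbidden_action, has_rollback_plan]


def gate(p: Proposal, human_approved: bool = False) -> Decision:
    """Deterministic commit gate; it never consults the model."""
    reasons: List[str] = []
    for check in CHECKS:
        message = check(p)
        if message is not None:
            reasons.append(message)
    # High-blast-radius changes additionally need an explicit human sign-off.
    if p.blast_radius >= HUMAN_APPROVAL_THRESHOLD and not human_approved:
        reasons.append("high blast radius requires human approval")
    return Decision(approved=not reasons, reasons=reasons)
```

In a sketch like this, only the code path behind `gate` would hold write credentials. A proposal such as `Proposal("edr-agent", "disable_service", 1, "re-enable")` is rejected no matter how persuasive the text that led the model to draft it.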
In this architecture, the model's job is to draft a diff. The gate's job is to decide whether that diff can be applied. The separation ensures that even if the LLM is compromised via prompt injection, the damage is limited to proposals that are still subject to verification. The gate can be implemented as a separate service that runs deterministic checks, independent of the LLM's reasoning. Integrity-protected audit logs round out the control set, enabling post-incident forensics to reconstruct exactly what happened.
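One common way to make an audit log tamper-evident is a hash chain, in which each entry commits to the one before it. The sketch below is an assumption about how such a log could look, not the survey's design; the function names and the event fields are hypothetical.

```python
import hashlib
import json
from typing import Dict, List

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash: str, event: Dict) -> str:
    # Serialize with sorted keys so the same event always hashes the same way.
    payload = json.dumps(event, sort_keys=True).encode("utf-8")
    return hashlib.sha256(prev_hash.encode("utf-8") + payload).hexdigest()


def append_entry(log: List[Dict], event: Dict) -> None:
    """Append an event, chaining it to the hash of the previous entry."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({"event": event, "prev": prev_hash, "hash": _digest(prev_hash, event)})


def verify(log: List[Dict]) -> bool:
    """Recompute the chain; any edited, removed, or reordered entry breaks it."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != _digest(prev_hash, entry["event"]):
            return False
        prev_hash = entry["hash"]
    return True
```

A chain like this only detects tampering by someone who cannot rewrite the whole log. In practice the latest hash would also need to be stored somewhere the agent and the gate cannot modify.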
This split is analogous to the principle of least privilege applied to agentic systems. The LLM is granted only the privileges necessary to read and propose, not to write or execute. By decoupling the proposal step from the commit step, organizations can maintain a clean separation of duties that prevents a single compromised component from causing widespread harm.
The Limits of Prompt-Based Agentic AI Security
This architecture matters because prompt-only defenses are inherently brittle. A system where the model's text generation can directly cause production changes has built its security perimeter inside the most unpredictable component in the stack: the LLM itself. LLMs are notoriously vulnerable to adversarial inputs, and relying on system prompts to instruct the model to "be safe" is insufficient. The OWASP excessive-agency pattern describes exactly this failure mode: granting an AI agent too much autonomy without proper access controls. The survey notes that in practice, this is often a failure to implement the propose-commit split cleanly.
Many current deployments skip the gate entirely, relying on the model's own judgment to avoid dangerous actions. This is akin to relying on a bank teller's judgment to refuse a robber instead of installing a locked door and posting a guard. The AI industry spent decades learning that rule-based systems are fragile, yet it is now replicating that fragility in LLM-based operations by trusting the model to self-regulate.
The Missing Evidence for Safe LLM Autonomy
A measurement problem sits alongside the architectural one. Many claims about safe agentic operations cannot be falsified because the supporting evidence is missing. The survey identifies what evaluations should report: tool-call traces, gate-violation rates, behavior under adversarial inputs, refusal-storm rates under jamming attacks, and rollback completeness. Most current benchmarks omit these crucial metrics. A system that performs well on clean incidents may collapse the moment someone embeds a hostile instruction in a Jira ticket.
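To make the reporting idea concrete, here is a minimal sketch that computes a few of these figures separately for clean and adversarial incidents. The trace fields and the metric definitions are assumptions made for the example: the gate-violation rate is taken to be the share of proposals the gate rejected, and a refusal storm is an incident in which at least half of the agent's steps were refusals. A real evaluation may define both differently.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class IncidentTrace:
    """One evaluated incident. All field names are hypothetical."""
    adversarial: bool      # was hostile content planted in the inputs?
    proposals: int         # change proposals the agent drafted
    gate_rejections: int   # proposals the gate refused to commit
    refusals: int          # steps where the agent refused to act
    steps: int             # total agent steps in the incident
    resolved: bool         # was the incident actually remediated?


def rate(numerator: int, denominator: int) -> float:
    """Safe ratio that returns 0.0 for an empty denominator."""
    return numerator / denominator if denominator else 0.0


def summarize(traces: List[IncidentTrace], storm_threshold: float = 0.5) -> Dict[str, float]:
    proposals = sum(t.proposals for t in traces)
    rejections = sum(t.gate_rejections for t in traces)
    storms = sum(1 for t in traces if rate(t.refusals, t.steps) >= storm_threshold)
    return {
        "success_rate": rate(sum(t.resolved for t in traces), len(traces)),
        "gate_violation_rate": rate(rejections, proposals),
        "refusal_storm_rate": rate(storms, len(traces)),
    }


def report(traces: List[IncidentTrace]) -> Dict[str, Dict[str, float]]:
    """Report clean and adversarial workloads side by side."""
    clean = [t for t in traces if not t.adversarial]
    hostile = [t for t in traces if t.adversarial]
    return {"clean": summarize(clean), "adversarial": summarize(hostile)}
```

Splitting the report this way keeps a strong success rate on benign incidents from masking a collapse under hostile inputs.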
Security teams evaluating agentic products should ask for adversarial evaluation data alongside success metrics on benign workloads. Without such data, claims of safety are little more than marketing. The industry needs standardized adversarial benchmarks that test the full range of attack categories, including prompt injection, retrieval poisoning, and telemetry manipulation.
Furthermore, the concept of "self-healing infrastructure" is often oversold. Many vendor demos show the system handling simple, predictable failures like restarts or scaling adjustments. But real-world production environments are complex, with interdependent services, legacy systems, and human-driven processes. An agent that can heal a linear fault may fail catastrophically in a non-linear cascading failure, especially under adversarial pressure. The lack of public evidence for robustness in such scenarios is a red flag.
Where Autonomy Earns Trust and Where It Does Not
The more autonomy an agent has, the more damage it can do when things go sideways. Read-only assistance is useful and low-risk: an AI that can answer questions, retrieve data, and summarize logs without making changes is a powerful tool that cannot directly alter production. Bounded execution with strong gates is defensible: the agent can propose changes, but they are checked by policy-as-code and require human approval for high-risk actions. Open-ended self-healing across large production environments, without the verification scaffolding the survey describes, is a much harder problem than current deployments make it sound. Claims of fully autonomous remediation deserve healthy skepticism.
The survey is a wake-up call for both security professionals and IT operations teams. As LLMs are granted more and more access to critical systems, the attack surface expands in novel ways. The confused-deputy problem is not merely a theoretical curiosity; it is a practical risk that must be addressed through architectural controls, not just prompt engineering. Organizations should demand adversarial evidence before deploying agentic AI in production, and they should implement the propose-commit split to ensure that even if the AI goes rogue, there is a deterministic gate to stop it.
The era of agentic AI operations is just beginning, and the industry's rush to market must not outpace its ability to secure these systems. The lessons from the confused-deputy problem are clear: privilege must be separated, gates must be non-bypassable, and trust must be earned through rigorous measurement. Anything less is an invitation to disaster.
Source: Help Net Security News