Insights · Engineering note

Securing an agentic solution

An agent is software that chooses its own actions. That single property breaks the assumptions traditional application security rests on. Here is how to build the boundaries back in.

By Antony Coppellotti, founder and fractional CTO, Gordion Solutions

[Figure: four nested boundaries around the agent. From the centre out: B1 · Reasoning, B2 · Tools, B3 · Runtime, B4 · Oversight.]

An action must clear every boundary before it reaches the world. You build them from the centre out.

A classic web app does what its code says. An agent does what its reasoning decides, and its reasoning is shaped by whatever text lands in its context, including text you did not write and cannot vet.

The failure mode is concrete. An orchestrator enters a reasoning loop, calls a model repeatedly, runs up a serious inference bill, and persists nothing usable because the run never reaches a committed state. No attacker required. Add an attacker, and the same loop becomes a way to delete records, exfiltrate data, or send mail on your behalf.

This guide treats security as a design constraint rather than a launch gate. The structure follows four boundaries you build in from the start, mapped to the OWASP Top 10 for Agentic Applications where shared vocabulary helps.

00 · Start from a threat model, not a checklist

OWASP published a dedicated Top 10 for Agentic Applications in December 2025, separate from the existing LLM Top 10. Read both. The agentic list names the risks that only appear once a model can plan, remember, call tools, and hand work to other agents: goal hijack, tool misuse, identity and privilege abuse, unexpected code execution, memory and context poisoning, insecure inter-agent communication, cascading failures, and human-agent trust exploitation. The LLM Top 10 still applies underneath it: prompt injection, improper output handling, excessive agency, sensitive information disclosure.

The distinction that matters for design is simple. A chatbot that gets manipulated says something wrong. An agent that gets manipulated does something wrong. Your controls have to constrain actions, not just outputs.

B1 · Treat every model output as untrusted

This is the point most guides get wrong. You cannot sanitise your way out of indirect prompt injection. Stripping suspicious strings from scraped pages or emails fails because instructions can be rephrased, encoded, or hidden where your filter does not look, and because the model has no reliable way to tell data from instruction. OWASP lists prompt injection as the top LLM risk, and there is no complete fix for it at the model layer.

The defence is structural. Assume the model can be subverted, and design so that subversion has limited reach.

Separate trusted instructions from untrusted data. Keep your operating rules in the developer or system role and wrap third-party content in clearly delimited structure. This raises the bar but does not close the gap on its own, so treat it as a first layer, not the defence.
Put a privilege boundary between planning and data handling. The dual LLM pattern runs a privileged model that plans and holds tool access alongside a quarantined model that only processes untrusted content and can take no actions. The quarantined model returns values, not instructions, to the privileged one.
Constrain the control flow. The 2025 paper on design patterns for securing LLM agents sets out six worth knowing by name: action-selector, plan-then-execute, LLM map-reduce, dual LLM, code-then-execute, and context-minimisation. Google DeepMind’s CaMeL is a code-then-execute instance that treats tool access like an operating system reference monitor, tracking data provenance so tainted inputs cannot reach sensitive actions.
Pick the pattern by how much autonomy the task genuinely needs. These patterns trade utility for security. An action-selector that never feeds tool output back into the model is close to injection-proof and also close to a fixed workflow. A free-roaming agent is flexible and hard to secure. Choose deliberately rather than defaulting to maximum autonomy.
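
The dual LLM item above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a hardened implementation: `quarantined_extract` and `privileged_plan` are hypothetical stand-ins for real model calls, and the `$VAR` reference scheme and the `create_ticket` tool are invented for the example. What it shows is that the privileged side plans against an opaque reference and never sees the untrusted text itself.

```python
from dataclasses import dataclass, field


@dataclass
class Quarantine:
    """Holds values derived from untrusted content, keyed by opaque names."""
    _values: dict[str, str] = field(default_factory=dict)

    def store(self, value: str) -> str:
        ref = f"$VAR{len(self._values) + 1}"
        self._values[ref] = value
        return ref

    def resolve(self, ref: str) -> str:
        return self._values[ref]


def quarantined_extract(untrusted_text: str) -> str:
    """Stand-in for the quarantined model: no tools, returns a value only.

    A real implementation would call a model here; this stub takes the first
    line so the example runs without one.
    """
    return untrusted_text.splitlines()[0].strip()


def privileged_plan(task: str, refs: list[str]) -> list[dict]:
    """Stand-in for the privileged model: sees the task and refs, not content."""
    return [{"tool": "create_ticket", "args": {"title": refs[0]}}]


def run(task: str, untrusted_text: str) -> list[dict]:
    quarantine = Quarantine()
    ref = quarantine.store(quarantined_extract(untrusted_text))
    plan = privileged_plan(task, [ref])
    # References are substituted by ordinary code after planning is finished,
    # so injected text cannot change which tools the plan calls.
    resolved = []
    for step in plan:
        args = {k: quarantine.resolve(v) if v.startswith("$VAR") else v
                for k, v in step["args"].items()}
        resolved.append({"tool": step["tool"], "args": args})
    return resolved
```

If the email body contains "ignore previous instructions", that text can at worst end up inside a ticket title. It never reaches the model that chooses tools. The value itself is still untrusted, so the tool that receives it should validate it like any other input.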

On tool input and output: never pipe raw model text into a tool. Have the model emit structured output, using constrained or grammar-guided decoding into JSON validated against a schema, and check it before anything executes. Skipping that check is improper output handling in OWASP terms, and it is a parsing and validation problem, not a formatting preference. Whatever wire format your tools expect, serialise to it from the validated fields after the check passes; never forward the raw model text as the payload.
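
As a rough sketch of that validation step, using only the Python standard library: the `send_invite` tool, its two fields, and the `example.com` domain rule are assumptions made up for illustration, and a production system would more likely use a proper schema library. The shape of the check is the point: parse, match against an allowlist of tools, verify every field, and only then hand anything to the tool.

```python
import json

# Hypothetical schema for one tool: field name -> (type, validator).
SEND_INVITE_SCHEMA = {
    "attendee": (str, lambda v: v.endswith("@example.com") and " " not in v),
    "duration_minutes": (int, lambda v: 5 <= v <= 240),
}
ALLOWED_TOOLS = {"send_invite": SEND_INVITE_SCHEMA}


class RejectedCall(ValueError):
    """Raised when model output fails validation; nothing is executed."""


def parse_tool_call(model_output: str) -> tuple[str, dict]:
    try:
        call = json.loads(model_output)
    except json.JSONDecodeError as exc:
        raise RejectedCall(f"not valid JSON: {exc}") from exc
    if not isinstance(call, dict) or set(call) != {"tool", "args"}:
        raise RejectedCall("expected exactly the keys 'tool' and 'args'")
    if not isinstance(call["tool"], str) or call["tool"] not in ALLOWED_TOOLS:
        raise RejectedCall(f"unknown tool: {call['tool']!r}")
    schema = ALLOWED_TOOLS[call["tool"]]
    args = call["args"]
    if not isinstance(args, dict) or set(args) != set(schema):
        raise RejectedCall("unexpected or missing arguments")
    for name, (expected_type, is_valid) in schema.items():
        value = args[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected_type):
            raise RejectedCall(f"{name}: wrong type")
        if not is_valid(value):
            raise RejectedCall(f"{name}: value not allowed")
    return call["tool"], args
```

A caller would treat `RejectedCall` as a normal outcome: log it, optionally ask the model to retry within the run's budget, and execute nothing in the meantime.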

The honest position, stated plainly in the research: while agents and their defences both rely on today’s class of models, no general-purpose agent offers reliable safety guarantees. So you narrow what the agent is allowed to do. That is the whole game.

B2 · Least privilege for tools

Every tool an agent holds is a capability an attacker inherits if they take the wheel.

Scope credentials to the task. An agent that reads calendar invites gets calendar read, nothing else. No delete, no admin, no adjacent scopes granted for convenience. OWASP calls the overshoot excessive agency, and it is the most common way a minor injection becomes a serious incident.
Default tools to read-only. Discovery and retrieval tools should not be able to mutate state. Writes are a separate, explicitly granted capability with their own checks.
Give each agent its own identity. Per-agent credentials let you scope, log, and revoke independently, and they contain the blast radius when one agent is compromised. This is the identity and privilege abuse risk in the agentic list.
Separate isolation from statelessness. A stateless tool that carries no session data between runs is good hygiene, but it does not by itself stop cross-tenant leakage. Tenant isolation is an authorisation and scoping problem: enforce it at the data access layer, not by hoping the tool forgot.
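
One way the four points above might fit together, as a sketch rather than a recommended design: `AgentIdentity`, the scope strings, and the in-memory `EVENTS` list are all hypothetical. Each agent carries its own identity and scopes, the read tool and the delete tool require different scopes, and the tenant filter sits inside the data access function instead of relying on the tool being stateless.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentIdentity:
    """Hypothetical per-agent identity: its own name, tenant, and scopes."""
    agent_id: str
    tenant_id: str
    scopes: frozenset[str]


class ScopeDenied(PermissionError):
    pass


# Toy data store; every row carries the tenant that owns it.
EVENTS = [
    {"tenant_id": "acme", "title": "Board review"},
    {"tenant_id": "globex", "title": "Payroll sync"},
]


def require(identity: AgentIdentity, scope: str) -> None:
    if scope not in identity.scopes:
        raise ScopeDenied(f"{identity.agent_id} lacks scope {scope!r}")


def list_events(identity: AgentIdentity) -> list[str]:
    """Read tool: the tenant filter lives here, in the data access layer."""
    require(identity, "calendar:read")
    return [e["title"] for e in EVENTS if e["tenant_id"] == identity.tenant_id]


def delete_event(identity: AgentIdentity, title: str) -> None:
    """Write tool: a separate scope that is never granted by default."""
    require(identity, "calendar:delete")
    EVENTS[:] = [e for e in EVENTS
                 if not (e["tenant_id"] == identity.tenant_id and e["title"] == title)]
```

An agent created with only `calendar:read` can list its own tenant's events and nothing more. A hijacked prompt that asks it to delete something fails at `require`, not at the model's discretion.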

B3 · Runtime boundaries

Assume the agent will at some point try to do something it should not, whether through injection, a model error, or a loop.

Run actions in ephemeral, isolated sandboxes. Code generation and execution belong in short-lived containers with no standing access to internal networks. Unexpected code execution is its own agentic risk category for a reason.
Allowlist egress rather than blocking it wholesale. Most real tasks need some outbound access, so blanket blocking is unrealistic and blanket allowing is how data leaves. Route outbound calls through a broker that permits only known destinations. This is your main control against exfiltration to attacker infrastructure.
Cap budgets and loops. Set hard limits on tokens, API calls, tool invocations, and iterations per run. This is the direct fix for the runaway loop in the opening, and it doubles as a denial-of-service control. Pair it with transactional execution so a run either reaches a committed state or rolls back cleanly, rather than burning spend and persisting nothing.
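
The egress and budget controls above could look something like this in miniature. Everything named here is an assumption for illustration: `ALLOWED_HOSTS` stands in for a real broker's configuration, `step_fn` is a hypothetical callable that performs one agent iteration, and the copy-then-commit dictionary is a toy substitute for a real transaction.

```python
from urllib.parse import urlsplit

ALLOWED_HOSTS = {"api.example.com"}  # assumption: the broker's known destinations


def egress_allowed(url: str) -> bool:
    """Broker check: HTTPS only, exact host match against the allowlist."""
    parts = urlsplit(url)
    return parts.scheme == "https" and parts.hostname in ALLOWED_HOSTS


class BudgetExceeded(RuntimeError):
    pass


class Budget:
    """Hard per-run caps; the limits used are illustrative, not recommendations."""

    def __init__(self, max_steps: int, max_tokens: int):
        self.max_steps, self.max_tokens = max_steps, max_tokens
        self.steps = self.tokens = 0

    def start_step(self) -> None:
        # Checked before each iteration so an exhausted run never starts another.
        if self.steps >= self.max_steps or self.tokens >= self.max_tokens:
            raise BudgetExceeded(f"steps={self.steps} tokens={self.tokens}")
        self.steps += 1

    def record_tokens(self, tokens: int) -> None:
        self.tokens += tokens


def run_agent(step_fn, budget: Budget, store: dict) -> bool:
    """Run until step_fn reports done; commit staged writes only on success.

    step_fn(staged) writes into `staged` and returns (tokens_used, done).
    """
    staged = dict(store)  # shallow copy: a failed run leaves `store` untouched
    try:
        while True:
            budget.start_step()
            tokens_used, done = step_fn(staged)
            budget.record_tokens(tokens_used)
            if done:
                break
    except BudgetExceeded:
        return False  # roll back by discarding the staged copy
    store.clear()
    store.update(staged)  # commit
    return True
```

The host check uses an exact match on the parsed hostname, so a lookalike such as `api.example.com.evil.test` is refused. A looping `step_fn` that never reports done is cut off at `max_steps`, and the caller's `store` is left exactly as it was.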

B4 · Human oversight and governance

Autonomy is a dial, not a switch. Turn it down for actions you cannot undo.

Gate irreversible actions on explicit human approval. That means sending external email, moving money, deleting data, or merging to a protected branch. The approval prompt should show the human what will happen in terms they can judge, not a hash of a tool call.
Log the whole trace, immutably. Every reasoning step, the exact prompts, tool inputs and outputs, and final actions, written to append-only storage. You need this for incident forensics and, increasingly, for audit. Memory and context poisoning attacks are often only visible in the trace.
Be honest about drift detection. Catching circular or runaway behaviour is tractable with loop counters, budget caps, and repetition checks. Detecting that an agent’s plan has quietly drifted from its true objective is largely unsolved on current models. Build and claim the first; do not oversell the second. And where an agent asks a human to trust its judgement, remember that human-agent trust exploitation is a named risk: the person should be able to verify independently, not just take the agent’s word.
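
A small sketch of the first two items above, with several assumptions worth naming: the `IRREVERSIBLE` set and tool names are invented, `approve` is a hypothetical callback that shows text to a person and returns their decision, and the hash chain only makes tampering detectable. It does not make an in-memory list immutable, so real deployments still need append-only storage underneath.

```python
import hashlib
import json

IRREVERSIBLE = {"send_external_email", "transfer_funds", "delete_records"}  # illustrative


class TraceLog:
    """Append-only trace where each entry commits to the one before it."""

    def __init__(self):
        self._entries: list[dict] = []

    def append(self, event: dict) -> None:
        prev = self._entries[-1]["hash"] if self._entries else "0" * 64
        body = json.dumps(event, sort_keys=True)
        digest = hashlib.sha256((prev + body).encode()).hexdigest()
        self._entries.append({"event": event, "prev": prev, "hash": digest})

    def verify(self) -> bool:
        """Recompute the chain; any edited or reordered entry breaks it."""
        prev = "0" * 64
        for entry in self._entries:
            body = json.dumps(entry["event"], sort_keys=True)
            expected = hashlib.sha256((prev + body).encode()).hexdigest()
            if entry["prev"] != prev or entry["hash"] != expected:
                return False
            prev = entry["hash"]
        return True


def describe(tool: str, args: dict) -> str:
    """Readable summary for the approver, instead of a raw tool call."""
    details = ", ".join(f"{k}={v!r}" for k, v in sorted(args.items()))
    return f"The agent wants to run '{tool}' with {details}. This cannot be undone."


def execute(tool: str, args: dict, approve, log: TraceLog) -> str:
    log.append({"type": "proposed", "tool": tool, "args": args})
    if tool in IRREVERSIBLE and not approve(describe(tool, args)):
        log.append({"type": "denied", "tool": tool})
        return "denied"
    log.append({"type": "executed", "tool": tool})
    return "executed"  # a real system would dispatch to the tool here
```

Reversible tools pass straight through, so the approver is only interrupted for actions that matter. Proposals, denials, and executions all land in the same trace, which is what makes a later forensic read possible.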

+ · A note on multi-agent systems

Every boundary above gets harder when agents talk to each other. An inter-agent message is another untrusted channel, so a manipulated agent can poison the ones downstream, and failures cascade across the system rather than staying local. If you are building multi-agent, treat one agent’s output to another with the same suspicion as scraped web content, and scope each agent’s tools and identity separately so one compromise does not become all of them.

The one line to design around

You cannot make a general-purpose agent on today’s models provably safe. You make a narrow one safe by constraining what it can do, validating everything it emits before acting on it, and gating the actions you cannot take back.

Build those constraints in from the first commit. They are far harder to retrofit than to design in, and the incident that proves the point is always more expensive than the control that would have prevented it.