Insights · Engineering note

Securing an agentic solution

An agent is software that chooses its own actions. That single property breaks the assumptions traditional application security rests on. Here is how to build the boundaries back in.

An action must clear every boundary before it reaches the world. You build them from the centre out.

A classic web app does what its code says. An agent does what its reasoning decides, and its reasoning is shaped by whatever text lands in its context, including text you did not write and cannot vet.

The failure mode is concrete. An orchestrator enters a reasoning loop, calls a model repeatedly, runs up a serious inference bill, and persists nothing usable because the run never reaches a committed state. No attacker required. Add an attacker, and the same loop becomes a way to delete records, exfiltrate data, or send mail on your behalf.

This guide treats security as a design constraint rather than a launch gate. The structure follows four boundaries you build in from the start, mapped to the OWASP Top 10 for Agentic Applications where shared vocabulary helps.

00 Start from a threat model, not a checklist

OWASP published a dedicated Top 10 for Agentic Applications in December 2025, separate from the existing LLM Top 10. Read both. The agentic list names the risks that only appear once a model can plan, remember, call tools, and hand work to other agents: goal hijack, tool misuse, identity and privilege abuse, unexpected code execution, memory and context poisoning, insecure inter-agent communication, cascading failures, and human-agent trust exploitation. The LLM Top 10 still applies underneath it: prompt injection, improper output handling, excessive agency, sensitive information disclosure.

The distinction that matters for design is simple. A chatbot that gets manipulated says something wrong. An agent that gets manipulated does something wrong. Your controls have to constrain actions, not just outputs.

B1 Treat every model output as untrusted

This is the point most guides get wrong. You cannot sanitise your way out of indirect prompt injection. Stripping suspicious strings from scraped pages or emails fails because instructions can be rephrased, encoded, or hidden where your filter does not look, and because the model has no reliable way to tell data from instruction. OWASP ranks prompt injection first in its LLM Top 10, and there is no complete fix at the model layer.

The defence is structural. Assume the model can be subverted, and design so that subversion has limited reach.

On tool input and output: never pipe raw model text into a tool. Have the model emit structured output, using constrained or grammar-guided decoding into JSON validated against a schema, and check it before anything executes. Skipping that check is what OWASP calls improper output handling, and it is a parsing and validation problem, not a formatting preference. Serialise to whatever wire format your tools expect only after validation: the wire format is something your code produces, not something the model emits.
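A minimal sketch of that validation step, using only the Python standard library. The tool names, the argument schemas, and the `{"tool": ..., "args": ...}` envelope are assumptions made for illustration; a real system would more likely use a full JSON Schema validator, but the shape of the check is the same: parse, compare against an explicit schema, and refuse anything that does not match.

```python
import json
from dataclasses import dataclass

# Hypothetical tools and argument types, for illustration only.
TOOL_SCHEMAS = {
    "lookup_order": {"order_id": int},
    "send_email": {"to": str, "subject": str, "body": str},
}


class InvalidToolCall(ValueError):
    """Raised when model output does not match a known tool schema."""


@dataclass(frozen=True)
class ToolCall:
    name: str
    args: dict


def parse_tool_call(raw: str) -> ToolCall:
    """Parse and validate model output; raise instead of guessing."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise InvalidToolCall("output is not valid JSON") from exc
    if not isinstance(data, dict) or set(data) != {"tool", "args"}:
        raise InvalidToolCall("expected exactly the keys 'tool' and 'args'")
    name, args = data["tool"], data["args"]
    if not isinstance(name, str) or name not in TOOL_SCHEMAS:
        raise InvalidToolCall("unknown tool")
    schema = TOOL_SCHEMAS[name]
    if not isinstance(args, dict) or set(args) != set(schema):
        raise InvalidToolCall("missing or unexpected arguments")
    for key, expected in schema.items():
        value = args[key]
        # bool is a subclass of int, so reject it explicitly
        # (none of the tools in this sketch takes a bool).
        if isinstance(value, bool) or not isinstance(value, expected):
            raise InvalidToolCall(f"argument {key!r} has the wrong type")
    return ToolCall(name, dict(args))
```

Only a `ToolCall` returned by `parse_tool_call` should ever reach a tool. Free text, unknown tools, extra arguments, and wrongly typed values all raise `InvalidToolCall` before anything executes.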

The honest position, stated plainly in the research: while agents and their defences both rely on today’s class of models, no general-purpose agent offers reliable safety guarantees. So you narrow what the agent is allowed to do. That is the whole game.

B2 Least privilege for tools

Every tool an agent holds is a capability an attacker inherits if they take the wheel.
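One way to make that concrete is to hand each agent a scoped view of the tool registry rather than the registry itself. The sketch below assumes a hypothetical in-process registry with two illustrative tools, `read_order` and `delete_order`; the names and the `ScopedTools` class are not from any particular framework.

```python
from typing import Callable, Dict, FrozenSet


class ToolNotPermitted(PermissionError):
    """Raised when an agent calls a tool it was never granted."""


class ScopedTools:
    """Expose only the tools explicitly granted to one agent."""

    def __init__(self, registry: Dict[str, Callable[..., object]],
                 granted: FrozenSet[str]) -> None:
        unknown = granted - set(registry)
        if unknown:
            raise KeyError(f"granted tools not in registry: {sorted(unknown)}")
        # Copy only the granted entries, so the agent holds no reference
        # to anything outside its grant.
        self._tools = {name: registry[name] for name in granted}

    def call(self, name: str, **kwargs: object) -> object:
        tool = self._tools.get(name)
        if tool is None:
            raise ToolNotPermitted(f"tool {name!r} is not granted to this agent")
        return tool(**kwargs)


# Hypothetical tools, for illustration only.
def read_order(order_id: int) -> dict:
    return {"order_id": order_id, "status": "shipped"}


def delete_order(order_id: int) -> str:
    return f"deleted {order_id}"


REGISTRY = {"read_order": read_order, "delete_order": delete_order}

# A support agent that only needs to answer status questions.
support_tools = ScopedTools(REGISTRY, frozenset({"read_order"}))
```

With this grant, `support_tools.call("read_order", order_id=7)` works and `support_tools.call("delete_order", order_id=7)` raises `ToolNotPermitted`, whatever the model was persuaded to ask for.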

B3 Runtime boundaries

Assume the agent will at some point try to do something it should not, whether through injection, a model error, or a loop.
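For the loop case, a hard budget enforced outside the model is the simplest runtime boundary. This is a sketch under assumptions: `step` stands in for one iteration of your orchestrator and is assumed to return whether the run is finished and what the iteration cost, and the limits are placeholders you would set per workload.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


class BudgetExceeded(RuntimeError):
    """Raised when a run hits its step or cost limit."""


@dataclass
class RunBudget:
    max_steps: int
    max_cost_usd: float
    steps: int = 0
    cost_usd: float = 0.0

    def check(self) -> None:
        """Refuse to start another step once either limit is reached."""
        if self.steps >= self.max_steps:
            raise BudgetExceeded(f"step limit of {self.max_steps} reached")
        if self.cost_usd >= self.max_cost_usd:
            raise BudgetExceeded(f"cost limit of {self.max_cost_usd} USD reached")

    def record(self, cost_usd: float) -> None:
        """Account for one completed step."""
        self.steps += 1
        self.cost_usd += cost_usd


def run_agent(step: Callable[[], Tuple[bool, float]], budget: RunBudget) -> int:
    """Run step() until it reports done; return the number of steps taken.

    step is a hypothetical callable returning (done, cost_in_usd).
    """
    while True:
        budget.check()  # enforced by the orchestrator, not by the model
        done, cost = step()
        budget.record(cost)
        if done:
            return budget.steps
```

The point of the design is that the limit lives in code the model cannot talk its way past. A run that never reaches a committed state ends in `BudgetExceeded` after a bounded spend instead of continuing until someone notices the bill.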

B4 Human oversight and governance

Autonomy is a dial, not a switch. Turn it down for actions you cannot undo.
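A minimal sketch of that dial as an approval gate. The action names in `IRREVERSIBLE` are hypothetical, and `approve` is an assumed callback that would, in practice, route the request to a person (a ticket, a chat prompt, a review queue) and return their decision.

```python
from typing import Callable

# Hypothetical action names; classify your own actions by whether they can be undone.
IRREVERSIBLE = frozenset({"delete_record", "send_email", "transfer_funds"})


class ApprovalDenied(PermissionError):
    """Raised when a human declines an irreversible action."""


def execute(action: str,
            run: Callable[[], object],
            approve: Callable[[str], bool]) -> object:
    """Run reversible actions directly; ask a human before irreversible ones."""
    if action in IRREVERSIBLE and not approve(action):
        raise ApprovalDenied(f"{action!r} was not approved")
    return run()
```

Reversible actions pass straight through, so the agent keeps its speed where mistakes are cheap. An irreversible action that is not approved never runs, because `run` is only called after the gate.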

+ A note on multi-agent systems

Every boundary above gets harder when agents talk to each other. An inter-agent message is another untrusted channel, so a manipulated agent can poison the ones downstream, and failures cascade across the system rather than staying local. If you are building multi-agent, treat one agent’s output to another with the same suspicion as scraped web content, and scope each agent’s tools and identity separately so one compromise does not become all of them.
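Separate identities can be sketched with per-agent message signing from the Python standard library. The agent names and keys below are hypothetical, and real keys would come from a secret store rather than source code. Note what this does and does not buy you: a valid signature shows which agent sent a message, so a compromised agent cannot impersonate its peers, but the payload is still untrusted content and needs the same validation as any other model output.

```python
import hashlib
import hmac

# Hypothetical per-agent keys, hard-coded only for illustration.
AGENT_KEYS = {
    "researcher": b"example-key-researcher",
    "writer": b"example-key-writer",
}


def sign(sender: str, payload: str) -> str:
    """Sign a payload with the sender's own key."""
    key = AGENT_KEYS[sender]
    return hmac.new(key, payload.encode("utf-8"), hashlib.sha256).hexdigest()


def verify(sender: str, payload: str, signature: str) -> bool:
    """Check that the payload was signed with the claimed sender's key."""
    key = AGENT_KEYS.get(sender)
    if key is None:
        return False
    expected = hmac.new(key, payload.encode("utf-8"), hashlib.sha256).hexdigest()
    # Compare as bytes in constant time; encoding first also tolerates
    # non-ASCII signatures supplied by an untrusted peer.
    return hmac.compare_digest(expected.encode("utf-8"), signature.encode("utf-8"))
```

Because each agent holds only its own key, stealing the researcher's key lets an attacker speak as the researcher and nobody else.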

The one line to design around

You cannot make a general-purpose agent on today’s models provably safe. You make a narrow one safe by constraining what it can do, validating everything it emits before acting on it, and gating the actions you cannot take back.

Build those constraints in from the first commit. They are far harder to retrofit than to design in, and the incident that proves the point is always more expensive than the control that would have prevented it.

Further reading

  1. OWASP Gen AI Security Project, Top 10 for Agentic Applications (2026).
  2. OWASP Gen AI Security Project, Top 10 for LLM Applications (2025).
  3. Beurer-Kellner et al., Design Patterns for Securing LLM Agents against Prompt Injections (2025).
  4. Google DeepMind, Defeating Prompt Injections by Design (CaMeL, 2025).
  5. Willison, The Dual LLM pattern for building AI assistants that can resist prompt injection (2023).

Gordion Solutions provides fractional CTO and engineering for agentic AI, capital markets infrastructure, and safety-critical systems.