Unlike a simple chatbot, an agent retrieves documents, calls tools, remembers conversations, and hands work to other agents — and every one of those abilities is also an attack surface.
The six agent surfaces
- Prompt — direct injection and jailbreaks that turn input into commands.
- Retrieval (RAG) — a poisoned document becomes a trusted instruction (see RAG poisoning).
- Tool / MCP calls — malicious or toxic-combination tool use (see MCP security).
- Session / memory — multi-turn manipulation and memory poisoning that persists across sessions.
- Agent-to-agent — unverified trust and lost lineage across hand-offs.
- Response / egress — PII leakage and data exfiltration in the output.
The core insight: to a model, every channel is an instruction. A document, a tool result, or a line of memory can all read as a command — so inspecting only the user's prompt protects a fraction of the agent.
Why it's a distinct discipline
Traditional application security assumes deterministic, slow-changing systems. Agents are non-deterministic and change behavior when their prompt, tools, or context change — often without a deploy. Agentic AI security therefore emphasizes runtime inspection across all surfaces, adaptive controls, and continuous monitoring rather than one-time review.
How it's done
Effective agentic AI security combines layered, defense-in-depth inspection on every surface, least-privilege tool access with egress allowlists, data-loss prevention (PII tokenization), a tamper-evident audit trail, and benchmarking against recognized taxonomies like the OWASP LLM Top 10, MITRE ATLAS, and NIST AI RMF. Running it self-hosted keeps the agent's data inside your own perimeter. This is the model TrustGate is built on — see the agentic attack surface.
FAQ
How is agentic AI security different from LLM security? LLM security often focuses on the prompt and response of a single model. Agentic AI security covers the additional surfaces an autonomous agent uses — retrieval, tools/MCP, session memory, and agent-to-agent — where most real attacks now occur.
What frameworks apply? The OWASP LLM Top 10, MITRE ATLAS, and the NIST AI Risk Management Framework are the common reference taxonomies for agent threats.