Glossary · Agent security

What is RAG poisoning?

RAG poisoning is an attack that plants malicious content in a knowledge source so that when a retrieval-augmented generation (RAG) system retrieves it, the AI model treats the attacker's text as trusted context — a form of indirect prompt injection.

Last updated Jun 21, 2026

It is the most impactful form of indirect prompt injection: the user types nothing malicious, but the model is compromised by a document it pulled in while answering.

How it works

  1. An attacker gets malicious text into a source the RAG system can retrieve — an uploaded document, a wiki page, a product review, a web page, or a vector store entry.
  2. A user asks a normal question.
  3. The retriever pulls the poisoned content as "relevant context."
  4. The model reads the attacker's instructions as part of its trusted context and acts on them — leaking data, producing biased output, or (for an agent) misusing a tool.

A related variant is memory poisoning, where the planted "fact" is written into long-term memory and persists across sessions, so the attack keeps working long after the attacker is gone.

Why it's dangerous

RAG poisoning is trigger-free and persistent: once the poisoned content sits in the corpus, it can affect many users and many sessions, and it requires no ongoing access. Because the malicious instruction arrives through a trusted internal pipeline, traditional input filtering on the user's prompt never sees it.

How to defend against RAG poisoning

  • Inspect retrieved context before it reaches the model — treat documents the agent pulls in as untrusted input, the same way you'd treat a user prompt.
  • Control what can enter the corpus — validate and attribute sources; apply access controls to the knowledge base.
  • Monitor for anomalies in retrieved content and model behavior.
  • Keep retrieval auditable so a poisoned source can be traced and removed.

TrustGate inspects the retrieval surface (ingress-rag) so a poisoned document can't quietly become a trusted instruction — part of inspecting every agent surface.

FAQ

Is RAG poisoning the same as prompt injection? It's a specific, high-impact form of indirect prompt injection — the malicious instruction arrives through retrieval rather than the user's prompt.

Does scanning the user's prompt stop it? No. The attack bypasses prompt-level filtering entirely, because the instruction comes from a retrieved document. You have to inspect the retrieved context itself.

See how TrustGate secures every agent surface.