It is the most impactful form of indirect prompt injection: the user types nothing malicious, but the model is compromised by a document it pulled in while answering.
How it works
- An attacker gets malicious text into a source the RAG system can retrieve — an uploaded document, a wiki page, a product review, a web page, or a vector store entry.
- A user asks a normal question.
- The retriever pulls the poisoned content as "relevant context."
- The model reads the attacker's instructions as part of its trusted context and acts on them — leaking data, producing biased output, or (for an agent) misusing a tool.
A related variant is memory poisoning, where the planted "fact" is written into long-term memory and persists across sessions, so the attack keeps working long after the attacker is gone.
Why it's dangerous
RAG poisoning is trigger-free and persistent: once the poisoned content sits in the corpus, it can affect many users and many sessions, and it requires no ongoing access. Because the malicious instruction arrives through a trusted internal pipeline, traditional input filtering on the user's prompt never sees it.
How to defend against RAG poisoning
- Inspect retrieved context before it reaches the model — treat documents the agent pulls in as untrusted input, the same way you'd treat a user prompt.
- Control what can enter the corpus — validate and attribute sources; apply access controls to the knowledge base.
- Monitor for anomalies in retrieved content and model behavior.
- Keep retrieval auditable so a poisoned source can be traced and removed.
TrustGate inspects the retrieval surface (ingress-rag) so a poisoned document can't quietly become a trusted instruction — part of inspecting every agent surface.
FAQ
Is RAG poisoning the same as prompt injection? It's a specific, high-impact form of indirect prompt injection — the malicious instruction arrives through retrieval rather than the user's prompt.
Does scanning the user's prompt stop it? No. The attack bypasses prompt-level filtering entirely, because the instruction comes from a retrieved document. You have to inspect the retrieved context itself.