What is RAG poisoning? | TrustGate AI Glossary

Q: Is RAG poisoning the same as prompt injection?

It's a specific, high-impact form of indirect prompt injection — the malicious instruction arrives through retrieval rather than the user's prompt.

It is the most impactful form of indirect prompt injection: the user types nothing malicious, but the model is compromised by a document it pulled in while answering.

How it works

An attacker gets malicious text into a source the RAG system can retrieve — an uploaded document, a wiki page, a product review, a web page, or a vector store entry.
A user asks a normal question.
The retriever pulls the poisoned content as "relevant context."
The model reads the attacker's instructions as part of its trusted context and acts on them — leaking data, producing biased output, or (for an agent) misusing a tool.

A related variant is memory poisoning, where the planted "fact" is written into long-term memory and persists across sessions, so the attack keeps working long after the attacker is gone.

Why it's dangerous

RAG poisoning is trigger-free and persistent: once the poisoned content sits in the corpus, it can affect many users and many sessions, and it requires no ongoing access. Because the malicious instruction arrives through a trusted internal pipeline, traditional input filtering on the user's prompt never sees it.

How to defend against RAG poisoning

Inspect retrieved context before it reaches the model — treat documents the agent pulls in as untrusted input, the same way you'd treat a user prompt.
Control what can enter the corpus — validate and attribute sources; apply access controls to the knowledge base.
Monitor for anomalies in retrieved content and model behavior.
Keep retrieval auditable so a poisoned source can be traced and removed.

TrustGate inspects the retrieval surface (ingress-rag) so a poisoned document can't quietly become a trusted instruction — part of inspecting every agent surface.

FAQ

Is RAG poisoning the same as prompt injection? It's a specific, high-impact form of indirect prompt injection — the malicious instruction arrives through retrieval rather than the user's prompt.

Does scanning the user's prompt stop it? No. The attack bypasses prompt-level filtering entirely, because the instruction comes from a retrieved document. You have to inspect the retrieved context itself.

See how TrustGate secures every agent surface.

Book a demo Get the threat-coverage report

How it works

Why it's dangerous

How to defend against RAG poisoning

FAQ

Related terms

See how TrustGate secures every agent surface.