Glossary

Retrieval-Augmented Generation (RAG)

Retrieval-augmented generation (RAG) is an architecture that grounds a language model's responses in a specific knowledge base by retrieving relevant passages at inference time and conditioning the response on them.

In a RAG pipeline, when a user asks a question, the system first searches a vector index (and optionally keyword indices) for passages relevant to the question, then passes those passages to the language model alongside the original query. The model generates its response conditioned on those retrieved passages, citing them where appropriate.
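The retrieve-then-generate flow above can be sketched as follows. This is a toy illustration, not a production recipe: the `embed` function here is just a bag-of-words term-frequency vector (a real system would use a trained embedding model), and the final model call is replaced by printing the assembled prompt.

```python
from collections import Counter
from math import sqrt

def embed(text: str) -> Counter:
    # Toy "embedding": a bag-of-words term-frequency vector built by
    # whitespace splitting. Stands in for a real embedding model.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse term-frequency vectors.
    dot = sum(a[t] * b[t] for t in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, passages: list[str], k: int = 2) -> list[str]:
    # Rank every passage against the query and keep the top k.
    q = embed(query)
    ranked = sorted(passages, key=lambda p: cosine(q, embed(p)), reverse=True)
    return ranked[:k]

def build_prompt(query: str, passages: list[str]) -> str:
    # Inject the retrieved passages into the prompt, numbered so the
    # model can cite them, and condition the answer on them.
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return f"Answer using only the sources below.\n\n{context}\n\nQuestion: {query}"

passages = [
    "The refund policy allows returns within 30 days of purchase.",
    "Shipping is free for orders over $50.",
    "Refunds are issued to the original payment method.",
]
top = retrieve("How do refunds work?", passages)
print(build_prompt("How do refunds work?", top))
```

The structure is the same in production; what changes is each component's quality: a learned embedding model instead of word counts, an approximate-nearest-neighbor index instead of a linear scan, and an actual model call at the end.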

RAG is the standard architecture for enterprise AI agents because it solves two problems at once: it keeps responses grounded in the organization's actual content (cutting hallucinations) and it lets the knowledge base be updated without retraining the model.

A production-grade RAG implementation involves deliberate choices about chunking strategy, embedding model, hybrid retrieval (vector + keyword), re-ranking, and citation handling. The details matter: naive RAG often underperforms well-tuned keyword search.
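One common way to implement the hybrid-retrieval step is reciprocal rank fusion (RRF), which merges a vector ranking and a keyword ranking without needing to calibrate their raw scores against each other. A minimal sketch, assuming the two ranked lists have already been produced by the respective indices (the document IDs here are placeholders):

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    # Each document earns 1 / (k + rank) from every ranking it appears in;
    # summed scores produce the fused order. k = 60 is the smoothing
    # constant proposed in the original RRF paper (Cormack et al., 2009).
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

vector_hits = ["doc_a", "doc_b", "doc_c"]   # ranked by embedding similarity
keyword_hits = ["doc_c", "doc_a", "doc_d"]  # ranked by BM25 / keyword match
fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
print(fused)
```

Documents that rank well in both lists (here `doc_a`) rise to the top, which is exactly the behavior that lets hybrid retrieval beat either index alone; a cross-encoder re-ranker is then typically applied to the fused top-N.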

See also
  • AI Agent: An AI agent is a software system that uses a large language model to perceive its environment, reason about tasks, and take actions in external systems on behalf of a user.
  • Tool Use (Function Calling): Tool use — also called function calling — is the capability of a language model to emit structured calls to external tools, enabling an agent to take real actions in connected systems.
  • Agent Constitution: An agent constitution is the written policy that defines what an AI agent is authorized to do, what it must refuse, how it escalates, and how it speaks — enforced at runtime by a policy layer.