Why an agent needs a constitution
AI agents operating in production interact with real customers, real money, and real legal surface. The difference between an agent you can trust in production and one you can only demo comes down to whether its behaviour is bounded by policy or bounded only by the model's good judgment on a given day.
An agent constitution is the explicit, human-readable statement of those bounds. It says what the agent is authorized to do, what it is prohibited from doing, how it should handle ambiguity, and when it should hand off to a human. It is the artefact procurement, legal, and compliance teams can review — because "trust the model" is not a posture they can audit.
Structure of a good constitution
A usable constitution has five sections: scope, prohibitions, escalation, tone, and change-control. Each section is short, specific, and written in the language of the business, not the model.
Scope: what workflows the agent owns end-to-end, what it contributes to, and what it has no business touching. A support agent's scope might be "order status, returns within policy, account inquiries" — not "any question the customer asks."
Prohibitions: behaviours the agent must refuse even if asked. No discussing pricing not in the published rate card. No commitments on delivery dates beyond what the system returns. No opinions on competitor products. Be specific — prohibitions only work if the agent can recognize violations.
Escalation: when the agent hands off to a human, and who. Low-confidence response? Escalate. Distress signal in the conversation? Escalate. VIP customer? Skip the tier-1 queue. This is where most constitutions are thin — and where the deployment lives or dies.
Tone: how the agent speaks. Which greetings, which closings, which level of formality, emoji or not, first name or last, regional conventions. This is not trivial — it's where brand consistency lives.
Change-control: who can amend the constitution, what review is required, and where versions live. The constitution is source-controlled. Every change is reviewed. Every deployed version is traceable to a commit.
Enforcement, not prompting
The most important design choice is that the constitution is enforced at runtime by a policy layer, not relied on via a system prompt. Prompts drift, get overridden, and degrade in edge cases. A policy layer evaluates every response and every tool call against the constitution before either happens, and refuses violations deterministically.
That policy layer is not a second LLM doing a vibe check. It's a combination of deterministic rules (allowed actions, allowed data, allowed recipients) and a narrow, targeted LLM evaluation for the sub-judgment calls that genuinely need reasoning. The mix is deployment-specific, and the specifics are documented in the constitution.
The practical result: the constitution is testable. You can enumerate prohibited behaviours, run them as test cases, and verify refusal. You can measure policy-block rate in production. You can red-team with confidence that improvements land in the policy layer and persist, rather than in prompt tweaks that revert on the next model update.
Testing and evolution
Constitutions are tested with two tools: adversarial test cases and production review. Adversarial cases are enumerated prohibited behaviours, jailbreak attempts, and edge cases. Every deployment ships with a test suite, and every constitution amendment updates it. Production review is a sampled subset of real conversations, reviewed by humans against the constitution. The goal is to catch drift — cases where the agent is technically within policy but in spirit drifting out of it.
Amendments are triggered by three things: new scope added to the deployment, incidents (every incident is reviewed, and policy gaps become amendments), and the quarterly review. The quarterly review is structural — a chance to look at the whole constitution, not just the cases that came up.
One rule: constitutions shrink as often as they grow. Removing prohibitions the agent no longer needs is part of the work. A constitution that only accretes becomes unreadable and un-enforceable.