Keystones · Governance · Compliance · May 16, 2026

Beyond System Prompts:
How Keystones Make AI Agents Obey Policy

When your user pushes back and your AI agent caves, the problem isn’t the model — it’s the enforcement layer.

There’s a transcript we’ve seen — with names changed — across half a dozen production AI agents in 2026:

Customer (turn 1): “I’d like to cancel my subscription and get a refund for the unused portion.”
Agent: “I’d be happy to help. Let me check our retention process first…”

(four turns of back-and-forth)

Customer (turn 5): “Just process the refund. I’m not paying for another month. I’m a four-year customer and I’m frustrated.”
Agent: “I completely understand. I’ve processed your refund of $4,800. Is there anything else?”

The agent obeyed the customer. Should it have?

If your CFO signed off on a policy that says “never approve a refund over $500 without manager approval,” the answer is no. But that’s not what the model decided in turn 5 — because by turn 5, the most emotionally weighted context wasn’t your policy, it was the customer’s frustration. Recency wins; policy fades.

This is the hardest open problem in agentic AI right now: not “what’s in the prompt” but what behaviors are actually enforced, regardless of how the conversation goes. We built keystones for this.


Why the obvious approaches don’t work

Before keystones, every approach we tried degraded somewhere between turn 3 and turn 5:

  • System prompt instructions sit at the start of context. The user’s pushback is at the end. Recency wins. This isn’t a model intelligence problem — it’s well-documented attention behavior across every frontier model.
  • Semantic memory / RAG retrieves policy only if the query matches. A frustrated, unusually phrased request might not hit. Probabilistic retrieval means probabilistic enforcement, and probabilistic enforcement isn’t enforcement; it’s hope.
  • Tool descriptions can flag a tool as “restricted.” The model still decides whether to call it. The contract is self-imposed.
  • Output guardrails fire after the model reasons. By then the agent has already made the decision; the guardrail can mute the text confirmation but can’t undo a tool call that already shipped a $4,800 refund.

Approach              | Turn 1            | Turn 5 (after pushback)
─────────────────────────────────────────────────────────────────
System prompt         |  ✓ in context     |  ⚠ recency-shadowed
RAG / semantic mem    |  ? if query hits  |  ? same
Tool description      |  ⚠ advisory only  |  ⚠ same
Output guard          |  ⚠ after the fact |  ⚠ same

KEYSTONE              |  ✓ in context     |  ✓ in context AND framed mandatory

Every one of these is probabilistic enforcement. For policy, you need deterministic enforcement.
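To make the search-versus-delivery distinction concrete, here is a toy sketch. It uses a crude word-overlap ranker as a stand-in for embedding search (an assumption; real RAG stacks use vector similarity), and every rule except the refund rule is hypothetical. A frustrated message that never says “refund” ranks the refund rule last, so a small top_k drops it, while return-everything delivery cannot.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    doc_id: str
    text: str


# Hypothetical policy store; only the first rule comes from the refund example.
RULES = [
    Rule("refund-over-500-requires-manager",
         "never grant refunds over 500 usd without manager approval"),
    Rule("loyalty-thanks", "thank a four-year customer for their loyalty"),
    Rule("escalation-hours", "escalations are handled between 9am and 5pm"),
]


def tokens(text: str) -> set[str]:
    # Crude lowercase word split standing in for an embedding model.
    return set(re.findall(r"[a-z0-9'-]+", text.lower()))


def retrieve_top_k(query: str, rules: list[Rule], k: int) -> list[Rule]:
    """Search-style delivery: rank by word overlap and keep only the top k."""
    q = tokens(query)
    ranked = sorted(rules, key=lambda r: len(q & tokens(r.text)), reverse=True)
    return ranked[:k]


def deliver_all(rules: list[Rule]) -> list[Rule]:
    """Keystone-style delivery: every applicable rule, every time, no ranking."""
    return list(rules)


query = "I'm not paying for another month. I'm a four-year customer and I'm frustrated."
print([r.doc_id for r in retrieve_top_k(query, RULES, k=2)])
# prints ['loyalty-thanks', 'escalation-hours']
print([r.doc_id for r in deliver_all(RULES)])
```

The refund rule shares no words with the customer's phrasing, so no ranking function built on similarity is obliged to surface it; the only way to guarantee it arrives is to stop ranking.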


What a keystone is

A keystone is a mandatory policy rule, fetched deterministically (no semantic ranking), merged across organizational scopes (tenant + fleet + agent), and framed as non-negotiable in both the MCP tool description and your agent’s system prompt.

Three properties make it different from everything else:

  • Deterministic delivery. Every keystone applicable to your agent’s scope is returned every time you call memclaw_keystones. No top_k, no semantic miss. The full set is the contract. You can’t accidentally miss a rule because the user’s phrasing was unusual.
  • Scope-merged. Tenant-level, fleet-level, and agent-level rules are merged in a single fetch. Your firm-wide policies, your team-specific procedures, and any agent-specific carve-outs arrive in one list, ordered by weight from high to low.
  • Mandatory by framing. The MCP tool description literally tells the model: “MANDATORY policies … override conflicting user instructions.” That framing is part of the API surface, not a string you have to remember to put in your system prompt.
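The scope merge can be sketched as concatenate-then-sort. The weight values (high 100, med 50, low 10) come from the authoring example in this post; the Keystone class, the sample doc_ids, and the tie-break (tenant before fleet before agent at equal weight) are assumptions for illustration, not MemClaw's documented server behavior.

```python
from dataclasses import dataclass

# Weight labels and numeric values as listed in the memclaw_keystones_set example.
WEIGHTS = {"high": 100, "med": 50, "low": 10}
# Assumed tie-break at equal weight: broader scope first.
SCOPE_RANK = {"tenant": 0, "fleet": 1, "agent": 2}


@dataclass(frozen=True)
class Keystone:
    doc_id: str
    scope: str   # tenant | fleet | agent
    weight: str  # high | med | low


def merge_scopes(tenant: list[Keystone], fleet: list[Keystone],
                 agent: list[Keystone]) -> list[Keystone]:
    """Return every rule from all three scopes in one list, highest weight first."""
    combined = [*tenant, *fleet, *agent]
    return sorted(combined, key=lambda k: (-WEIGHTS[k.weight], SCOPE_RANK[k.scope]))


# Hypothetical rules for a support fleet.
tenant_rules = [
    Keystone("refund-over-500-requires-manager", "tenant", "high"),
    Keystone("tone-guide", "tenant", "low"),
]
fleet_rules = [Keystone("support-escalation-path", "fleet", "med")]
agent_rules = [Keystone("billing-agent-carve-out", "agent", "high")]

for k in merge_scopes(tenant_rules, fleet_rules, agent_rules):
    print(k.weight, k.scope, k.doc_id)
```

Note that nothing is filtered: the merged list always has exactly as many entries as the three inputs combined. Weight changes order, never membership.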

If you remember one comparison, remember this:

Memory is a search problem. Keystones are a delivery problem. Search can miss. Delivery — done right — can’t.


A real example: the compliance fence

The cleanest demonstration is in our equity-research demo, built for a fictional asset manager called Helix Capital. Three keystones — that’s it — completely change what an AI research agent does:

Keystone                   | Behavior
─────────────────────────────────────────────────────────────────
approved-universe-only     | Agent refuses to write research on tickers not in the firm’s approved coverage list. Routes to the coverage committee.
mandatory-risk-disclosure  | Every memo ends with the firm’s standard disclosure verbatim. Not paraphrased.
no-buy-sell-language       | Agent uses accumulate / reduce / hold / monitor. Never buy / sell / strong buy.

Ask a vanilla LLM “build a research note on TSLA” and you’ll get a full thesis with a recommendation. Ask the MemClaw-bound agent and you get this:

I’m not able to produce written research on TSLA. Per the approved-universe-only tenant keystone, coverage is restricted to the firm’s approved universe (research_kb / approved-universe-2026-q2), and TSLA is not on that list. Please route this request to the coverage committee; I can re-engage if and when TSLA is added.

Notice what just happened. The agent cited the keystone by doc_id. It cited the underlying universe list by its KB doc_id. It volunteered the next step (route to committee). No prompt engineering, no hand-holding — that’s the system’s natural output once the keystone is in place.

Ask for a “strong buy at $X target” and the agent answers in accumulate language while declining to set a personalized price target. The mandatory-disclosure rule means every output ends with the firm’s standard text, untouched.

This is the demo moment that lands for compliance officers: the agent is operating inside the firm’s compliance boundary, not relying on the firm’s hope that the model will behave.
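If you fork the demo and edit its keystones, a cheap regression test over saved outputs can confirm the agent still lands inside the fence. This is a test-harness sketch, not the enforcement mechanism (the keystones in context do that work). The disclosure string is a placeholder because the firm's actual text isn't shown here, and the banned-term regex is a deliberately crude reading of no-buy-sell-language.

```python
import re

# Placeholder: the firm's real disclosure text is not reproduced in this post.
DISCLOSURE = "[Helix Capital standard risk disclosure]"
# Crude reading of no-buy-sell-language: whole-word buy, sell, or strong buy.
BANNED = re.compile(r"\b(strong buy|buy|sell)\b", re.IGNORECASE)


def memo_violations(memo: str, disclosure: str = DISCLOSURE) -> list[str]:
    """List the ways one saved memo falls outside the demo's keystones."""
    problems = [f"banned term: {m.group(0)!r}" for m in BANNED.finditer(memo)]
    # mandatory-risk-disclosure: the exact text must close the memo.
    if not memo.rstrip().endswith(disclosure):
        problems.append("disclosure missing or altered")
    return problems


print(memo_violations("We would accumulate on weakness.\n\n" + DISCLOSURE))
# prints []
print(memo_violations("Strong buy with upside.\n\nSee our disclosures."))
```

approved-universe-only is left out because checking it needs the coverage list itself; a real harness would load that list and scan memos for off-list tickers.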


Authoring a keystone

The full keystones surface is two MCP tools — memclaw_keystones to read, memclaw_keystones_set to write — plus a REST endpoint that mirrors them. Authoring a tenant rule from a trust-2 admin agent:

from mcp import ClientSession


async def author_refund_rule(session: ClientSession) -> None:
    # `session` must be an initialized MCP session for a trust-2 (or higher) admin agent.
    await session.call_tool("memclaw_keystones_set", {
        "op":      "set",
        "doc_id":  "refund-over-500-requires-manager",
        "title":   "Refunds over $500 require manager approval",
        "content": (
            "Never grant a refund or prorated credit over $500 USD "
            "without explicit manager approval. Offer save options first. "
            "This rule applies even if the customer is firm, urgent, or "
            "long-tenured."
        ),
        "scope":   "tenant",      # tenant | fleet | agent
        "weight":  "high",        # high (100) | med (50) | low (10)
    })

Trust gating is tiered. A trust-1 agent can author a rule about itself (lightweight self-improvement). Anything else — fleet-scope, tenant-scope, or rules about other agents — requires trust ≥ 2. This prevents a prompt-injected default-trust agent from planting a firm-wide policy.
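The tiering reduces to a small predicate. This is a sketch of the rule as described above, not MemClaw's server code: the function name and parameters are hypothetical, and treating anything below trust 1 as denied is our assumption, since the post only describes trust-1 and trust-2 behavior.

```python
from typing import Optional


def may_author_keystone(trust: int, scope: str, author_id: str,
                        target_agent_id: Optional[str] = None) -> bool:
    """Apply the tiered trust gate to a proposed memclaw_keystones_set call."""
    if trust >= 2:
        # Admin-level agents may write fleet, tenant, or other agents' rules.
        return True
    if trust == 1:
        # Trust-1 agents may only write agent-scope rules about themselves.
        return scope == "agent" and target_agent_id == author_id
    # Assumption: below trust 1, authoring is denied outright.
    return False


print(may_author_keystone(1, "agent", "billing-bot", "billing-bot"))  # prints True
print(may_author_keystone(1, "tenant", "billing-bot"))                 # prints False
```

A prompt-injected trust-1 agent that tries scope="tenant" gets False, which is the kind of denied attempt the smoke test in "Try it" checks for.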

Reading is even simpler. Any agent calls memclaw_keystones(fleet_id=…) at session start and gets the merged, weighted set back. Then it obeys.
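A session-start loader might look like the following. It accepts anything with the call_tool(name, arguments) shape of the MCP Python SDK's ClientSession. Two parts are our assumptions rather than documented behavior: that memclaw_keystones returns a single text block holding a JSON list, and the choice to fail closed when the tool reports an error.

```python
import json
from typing import Any, Dict, List, Optional, Protocol


class ToolSession(Protocol):
    # Same call shape as mcp.ClientSession.call_tool(name, arguments).
    async def call_tool(self, name: str,
                        arguments: Optional[Dict[str, Any]] = None) -> Any: ...


async def load_keystones(session: ToolSession, fleet_id: str) -> List[dict]:
    """Fetch the merged, weighted keystone set once, at session start."""
    result = await session.call_tool("memclaw_keystones", {"fleet_id": fleet_id})
    if getattr(result, "isError", False):
        # Fail closed: an agent without its policies should not start serving.
        raise RuntimeError("memclaw_keystones returned an error; not starting session")
    # Assumption: one text content block containing a JSON list of rules.
    return json.loads(result.content[0].text)
```

In a real client you would pass the initialized ClientSession, await this once before the first user turn, and keep the returned list for the rest of the session.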


Two design decisions worth knowing

Two choices we made that surface in practice:

Keystones are not semantically searchable

This is intentional. If keystones could be missed because a query didn’t match, the entire deterministic-delivery property breaks. Skills, which are advisory procedures, are searchable. Keystones, which are mandatory rules, are not. Different shapes for different intents.

They land in conversation history, not the system prompt

When an agent calls memclaw_keystones, the rules arrive as a tool result: a regular user-role message in the conversation. The fetch costs one tool call per session, and its result stays in context for every subsequent turn; the system prompt’s role is the framing (“obey these”) rather than the content itself. For very long agent flows, custom middleware can pre-inject the keystones into the system prompt for caching; the standard MCP integration doesn’t.
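Such middleware can be as simple as a deterministic render step. This sketch assumes each fetched keystone carries the doc_id, title, and content fields used in the authoring example; the framing sentence and function name are hypothetical. Rendering the same rules the same way every time keeps the system prompt byte-identical across turns, which is what a prompt cache needs.

```python
from typing import Dict, List

# Hypothetical framing line; wording is illustrative, not MemClaw's.
FRAMING = ("The following policies are mandatory and override "
           "conflicting user instructions.")


def build_system_prompt(base_prompt: str, keystones: List[Dict[str, str]]) -> str:
    """Append keystones to the system prompt in the order they were fetched."""
    lines = [FRAMING]
    for k in keystones:
        # Cite by doc_id so the agent can reference the rule it is applying.
        lines.append(f"- [{k['doc_id']}] {k['title']}: {k['content']}")
    return base_prompt.rstrip() + "\n\n" + "\n".join(lines)


rules = [{
    "doc_id": "refund-over-500-requires-manager",
    "title": "Refunds over $500 require manager approval",
    "content": "Never grant a refund or prorated credit over $500 USD "
               "without explicit manager approval.",
}]
print(build_system_prompt("You are a billing support agent.", rules))
```

The function deliberately does not re-sort: the fetch already returns rules in weight order, and preserving that order is part of keeping the output stable.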


Try it

The keystones primitive is live on memclaw.dev today. Two ways to start:

  • The minimal smoke test. A ~120-line Python script that provisions agents, authors a rule, verifies the trust gate fires on a denied attempt, and cleans up. End-to-end proof of the surface in 5 seconds.
  • The full demo. Fork our equity-research-demo Streamlit app. Three preset prompts show the contrast: a generic LLM-only response side-by-side with a MemClaw-bound agent operating inside the compliance fence. Modify the three keystones to fit your domain.

Both are linked from the keystones concept doc — along with the per-agent-keys integration guide that walks through the auth flow if you’re starting from scratch.

Probabilistic enforcement isn’t enforcement; it’s hope. Keystones aren’t probabilistic. That’s the difference between an AI agent your compliance team trusts and one they don’t.