Coding AgentsCode ReviewGovernance

The Code Reviewer That Remembers Why

June 16, 2026 · Caura.AI

Most code-review bots see your repository fresh every single time. They apply generic best practices, re-flag the convention you settled three sprints ago, and — the expensive one — can’t tell a deliberate decision from a mistake. A fresh-eyes reviewer is useful exactly once. After that, the value is in remembering.

We’ve been building a PR-review agent for our Claude Code fleet that does remember — both the rules it has to enforce and the reasoning behind the code it’s reading. Same models, same diff. The difference is what it walks in knowing.

A new pull request arrives at the review agent. The agent draws on memclaw, which holds the team's operating rules (never auto-merge, required sign-off, branch naming) and the design decisions behind the code, and reviews the PR against them — producing a consistent verdict that applies the same standards every PR, upholds settled decisions, and batches severity-aware fixes.
Fig 1Every review opens by recalling two things from shared memory: the rules the team enforces, and the decisions behind the code. The verdict is the same standard, applied every time.

It already knows your rules

At the start of every review, the agent recalls the team’s operating rules and enforces them without being re-briefed:

  • Never auto-merge — it surfaces a PR as “ready,” a human clicks merge.
  • The branch-naming scheme CI enforces (^CAURA-[0-9]+(-[a-z0-9]+)+$).
  • DCO sign-off on every commit (git commit -s).
  • Run /simplify before commit; one commit per PR (amend + force-with-lease).
  • Migrations only via the storage-service lifespan; all database access through the storage API.
  • Top-level imports only.

These aren’t pasted into a prompt each session. They live as memclaw rule memories, so the agent loads them at the start of every run and obeys them. They’re strong enough that they override a conflicting request: ask it to merge and it won’t — it reports that the PR is ready and leaves the click to a human, because the rule says so.

It remembers why you made the call

This is the part a fresh-eyes reviewer can’t do. During a review, a suggestion came in to label an audit event as pii_mask — a redaction — even though no redaction had actually happened (that path runs through the LLM and has no span offsets to redact). On the surface it looks like a reasonable label.

The agent rejected it, because it remembered the principle behind the code: audit verbs must describe what physically occurred, never what was merely intended. Recording a mask that never ran would make the audit trail lie. Instead it kept the verb truthful and recorded the configured intent in a separate detail field — and then applied that same reasoning consistently across later reviews. A reviewer that remembers why you made a call is a fundamentally different teammate than one meeting your repo fresh, every single time.

Same standards, every PR

Because the rules and decisions come from memory rather than from whoever happened to brief the agent that day, every PR gets the same treatment: standards applied uniformly, fixes batched rather than dribbled out comment by comment, severity-aware handling, addressed comments minimized. It’s the canonical review cycle — the same one, every time — and outcomes feed back, so the memory sharpens with use.

Hope vs. a standard that holds

“Please follow our conventions” in a system prompt is probabilistic. It works until the model is nudged, the context fills up, or the request is phrased persuasively enough — and then the standard quietly bends. An enforcement-plus-memory layer is different: the rule is recalled deterministically and held even when an agent is asked to break it. That’s the same idea behind keystones — policy that doesn’t depend on the model staying in a good mood.

And it’s the same loop our error-investigation agent runs from the other side: trace a failure once, write the lesson down, never re-debug it. We told that story in Passes Locally, Fails in CI. Recall, act, write the outcome back — shared across sessions and agents, and superseded when it goes stale.

How much of your team’s reasoning does your reviewer actually remember?

Try it

OSS: github.com/caura-ai/caura-memclaw. Apache 2.0, docker compose up, REST and MCP out of the box.

Managed: memclaw.net. Same engine, governance wired in, we run it.