What keeps a fleet's shared memory from rotting at scale — automatic contradiction detection and supersession, an 8-state lifecycle, the crystallizer hygiene scan, and memclaw_insights.

Part 1 gave the fleet shared memory; Part 2 gave us eyes on it; Part 3 governed it; Part 4 made it learn from outcomes.

Every one of those parts adds memories. Decisions get revised. Facts go stale. Two agents write the same thing three different ways. A store that only grows is a store that slowly rots — and recall quality rots with it. MemClaw's answer is three mechanisms that run mostly on their own: contradiction detection (which flips a stale memory's status and links the one that replaced it), an explicit status lifecycle, and a crystallizer that scans the corpus for hygiene problems and merges near-duplicates into canonical facts. Then memclaw_insights lets an agent reflect over the whole store and write down what it notices.

Commands use the default base URL http://localhost:8000 and the X-API-Key: standalone header from Part 1. Everything below was run against the live fleet from Parts 1–4; the IDs are real.

Step 1 — Contradiction detection & supersession

Say the fleet holds a single-value fact about the payments API's rate limit. To keep the example self-contained, write the baseline fact first (if you ran Part 1 you already have an equivalent decision; this just makes the pair explicit):

curl -X POST http://localhost:8000/api/v1/memories \
  -H "X-API-Key: standalone" -H "X-Agent-ID: backend-dev" -H "Content-Type: application/json" \
  -d '{"tenant_id":"default","agent_id":"backend-dev","fleet_id":"dev-fleet","scope":"fleet",
       "content":"The payments API rate limit is 100 requests per minute per API key."}'
#   → {"id":"<OLD-id>","status":"confirmed",...}

Now a newer fact contradicts it. Phrasing matters here. Write the new memory as a direct, present-tense restatement of the same fact — "the rate limit is 300…" — not as an event like "raised to 300, replacing the previous limit, after the rollout." The judge reads "raised to / replacing" as a temporal update (both values were true in sequence, so not a contradiction) and leaves the old one untouched; a direct restatement is an incompatible claim about the same subject, which is a contradiction:

curl -X POST http://localhost:8000/api/v1/memories \
  -H "X-API-Key: standalone" -H "X-Agent-ID: backend-dev" -H "Content-Type: application/json" \
  -d '{"tenant_id":"default","agent_id":"backend-dev","fleet_id":"dev-fleet","scope":"fleet",
       "content":"The payments API rate limit is 300 requests per minute per API key."}'
#   → {"id":"<NEW-id>","status":"confirmed","supersedes_id":null}

The write returns immediately with supersedes_id: null — that's expected. Contradiction detection runs after the commit, as a background task, so the write path stays fast. A few seconds later, re-fetch both memories:

#   OLD  <OLD-id>  status=conflicted   ← was "confirmed"
#   NEW  <NEW-id>  status=confirmed     supersedes_id=<OLD-id>

Two things happened on their own:

The older memory was flipped to conflicted — it's no longer presented as current truth.
The newer memory now points back at it via supersedes_id, forming a supersession chain you can walk. The fleet keeps the history (the old decision isn't deleted — it's demoted), but recall surfaces the live one.
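Because detection runs after the commit, a script that depends on the result has to poll for it. Here is a minimal Python sketch of that wait loop. It assumes a hypothetical `fetch` callable that takes a memory ID and returns the memory as a dict with the `id`, `status` and `supersedes_id` fields shown above; the read endpoint isn't shown in this part, so the HTTP call is left to the caller.

```python
import time
from typing import Callable

# Statuses that contradiction detection assigns to the older memory.
DEMOTED = {"conflicted", "outdated"}


def is_superseded(old: dict, new: dict) -> bool:
    """True once the old memory is demoted and the new one links back to it."""
    link = new.get("supersedes_id")
    return old.get("status") in DEMOTED and link is not None and link == old.get("id")


def wait_for_supersession(
    fetch: Callable[[str], dict],  # hypothetical: memory id -> memory dict
    old_id: str,
    new_id: str,
    timeout_s: float = 30.0,
    interval_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until the background task has linked the pair, or give up."""
    deadline = time.monotonic() + timeout_s
    while True:
        if is_superseded(fetch(old_id), fetch(new_id)):
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)
```

A `False` return only means the link did not appear within the timeout. It can also mean the judge decided the two memories don't conflict, which is the case the phrasing advice above is about.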

MemClaw decides this two ways. For single-value facts with a known subject/predicate (an entity's "rate limit," a service's "owner"), it compares RDF triples structurally — same subject + same single-value predicate + different object → conflict, and the older memory is marked outdated. For everything else it uses the semantic path: it pulls candidates above a cosine-similarity floor (0.70) and asks an LLM judge whether they genuinely conflict; a confirmed semantic conflict marks the older memory conflicted (which is the path our prose-y decision took). Either way the new memory gets the supersedes_id link.
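The two paths can be sketched in a few lines of Python. This is an illustration of the rules as described, not the engine's code: the `Triple` type, the contents of `SINGLE_VALUE_PREDICATES`, and the helper names are hypothetical, and treating the floor as inclusive (`>=`) is an assumption. The LLM judge is left out; the second function only picks the candidates that would be sent to it.

```python
import math
from typing import Dict, List, NamedTuple, Sequence

COSINE_FLOOR = 0.70  # similarity floor quoted above


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Hypothetical: the real list of single-value predicates is not given here.
SINGLE_VALUE_PREDICATES = {"rate_limit", "owner"}


def structural_conflict(old: Triple, new: Triple) -> bool:
    """Same subject + same single-value predicate + different object."""
    return (
        old.subject == new.subject
        and old.predicate == new.predicate
        and new.predicate in SINGLE_VALUE_PREDICATES
        and old.obj != new.obj
    )


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def semantic_candidates(new_vec: Sequence[float], stored: Dict[str, Sequence[float]]) -> List[str]:
    """IDs whose embedding clears the floor; these would go to the LLM judge."""
    return [mid for mid, vec in stored.items() if cosine(new_vec, vec) >= COSINE_FLOOR]
```

The structural check is cheap and deterministic, which is why it makes sense to try it first and fall back to embeddings plus a judge only for free-form text.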

Step 2 — The status lifecycle

conflicted and outdated aren't ad-hoc strings — they're two of eight statuses every memory moves through:

Status	Meaning
`active`	Normal, recallable.
`pending`	Written but not yet confirmed.
`confirmed`	Verified / reinforced — the high-confidence tier.
`cancelled`	Explicitly retracted by an agent.
`outdated`	Superseded via the RDF (single-value) contradiction path.
`conflicted`	Superseded via the semantic contradiction path.
`archived`	Retired by lifecycle automation or the crystallizer.
`deleted`	Soft-deleted (the dedup gate still filters these out).

Most transitions are automatic: contradiction detection sets outdated/conflicted, and lifecycle automation archives stale, low-weight memories. When an agent needs to move one by hand (retract a cancelled task, confirm a pending fact), it calls the MCP memclaw_manage tool with op=transition (memory_id + target status). The point is that "stale" isn't a vibe; it's a state, and the state is what recall and the hygiene tools key off.
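If you script against the lifecycle, it helps to pin the eight statuses down as a type rather than pass strings around. Below is a small Python sketch. The enum values come straight from the table; the argument key names in `transition_args` (`memory_id`, `status`) are assumptions, since this part only says the op takes a memory ID and a target status.

```python
from enum import Enum


class Status(str, Enum):
    ACTIVE = "active"
    PENDING = "pending"
    CONFIRMED = "confirmed"
    CANCELLED = "cancelled"
    OUTDATED = "outdated"
    CONFLICTED = "conflicted"
    ARCHIVED = "archived"
    DELETED = "deleted"


# Which contradiction path produces which status, per the table above.
SUPERSEDED_BY_PATH = {"rdf": Status.OUTDATED, "semantic": Status.CONFLICTED}


def transition_args(memory_id: str, target: str) -> dict:
    """Build arguments for memclaw_manage with op=transition.

    The key names "memory_id" and "status" are assumed, not confirmed.
    """
    status = Status(target)  # raises ValueError for anything outside the eight
    return {"op": "transition", "memory_id": memory_id, "status": status.value}
```

Validating the target before the call means a typo such as "canceled" fails locally instead of turning into a rejected tool call.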

Step 3 — The crystallizer: a hygiene scan for the whole corpus

Contradiction detection is per-write. The crystallizer is the periodic sweep that looks at everything at once. Trigger a run for the tenant:

curl -X POST http://localhost:8000/api/v1/crystallize \
  -H "X-API-Key: standalone" -H "Content-Type: application/json" \
  -d '{"tenant_id":"default"}'
#   → {"report_id":"5be5a05a-...","status":"running"}

It runs in the background and writes a report. Fetch the latest:

curl "http://localhost:8000/api/v1/crystallize/latest?tenant_id=default" -H "X-API-Key: standalone"

{
  "status": "completed", "duration_ms": 170,
  "summary": { "overall_score": 99, "critical": 0, "warning": 0, "info": 1 },
  "health": {
    "total_memories": 18, "embedding_coverage_pct": 100.0,
    "entity_coverage_pct": 94.4, "pii_count": 2, "contradiction_count": 1,
    "status_distribution": { "active": 8, "pending": 1, "confirmed": 8, "conflicted": 1 }
  },
  "hygiene": {
    "near_duplicates": { "count": 0 }, "stale_memories": { "count": 0 },
    "orphaned_entities": { "count": 0 }, "missing_embeddings": { "count": 0 },
    "broken_entity_links": { "count": 0 }, "expired_still_active": { "count": 0 }
  },
  "issues": [
    { "severity": "info", "code": "CONTRADICTIONS_PRESENT",
      "title": "Contradicted or outdated memories present", "count": 1 }
  ]
}

One report, three lenses: health (coverage, PII count, the live status histogram — note our conflicted: 1 from Step 1), hygiene (seven structural checks: near-duplicates, stale, orphaned entities, missing embeddings, broken links, expired-still-active, short content), and a graded issues list with affected IDs. The contradiction we created surfaces as the single info issue, and the corpus scores 99/100.
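To use the report in a script or an alert, it is usually enough to boil it down to a handful of fields. Here is a short Python sketch of that triage step, run against a trimmed copy of the report above. The `triage` function and the shape of its return value are illustrative, not part of MemClaw.

```python
import json

# Trimmed copy of the report shown above.
SAMPLE = json.loads("""
{
  "summary": {"overall_score": 99, "critical": 0, "warning": 0, "info": 1},
  "health": {"status_distribution": {"active": 8, "pending": 1, "confirmed": 8, "conflicted": 1}},
  "hygiene": {"near_duplicates": {"count": 0}, "stale_memories": {"count": 0}},
  "issues": [{"severity": "info", "code": "CONTRADICTIONS_PRESENT", "count": 1}]
}
""")


def triage(report: dict) -> dict:
    """Collapse a crystallizer report into score, demoted count, failing checks, issues."""
    summary = report.get("summary", {})
    distribution = report.get("health", {}).get("status_distribution", {})
    by_severity = {}
    for issue in report.get("issues", []):
        by_severity.setdefault(issue["severity"], []).append(issue["code"])
    return {
        "score": summary.get("overall_score"),
        # Memories demoted by either contradiction path.
        "demoted": distribution.get("conflicted", 0) + distribution.get("outdated", 0),
        # Hygiene checks that found at least one problem.
        "failing_checks": sorted(
            name
            for name, check in report.get("hygiene", {}).items()
            if check.get("count", 0) > 0
        ),
        "issues": by_severity,
    }


print(triage(SAMPLE))
# {'score': 99, 'demoted': 1, 'failing_checks': [], 'issues': {'info': ['CONTRADICTIONS_PRESENT']}}
```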

Near-duplicate merging. When the scan finds memories that say the same thing, it reports them as pairs above a strict 0.95 cosine threshold and groups them into clusters (≥3 members). Seed three near-identical facts, re-scan, and the report shows them:

"near_duplicates": { "count": 2, "pairs": [
  { "similarity": 0.9925, "id1": "626319b9", "id2": "c4dca263" },
  { "similarity": 0.9565, "id1": "626319b9", "id2": "f6413f5c" }
]},
"crystallization": { "enabled": true, "clusters_found": 1, "memories_crystallized": 0 }

When crystallization is enabled, the engine hands each cluster to an LLM that distills it into one canonical memory (agent_id: "crystallizer", with metadata.crystallized_from listing the sources for provenance) and archives the originals. So three ways of saying "CI caches node_modules" collapse to one fact, and the history is still traceable. The scan is the durable, always-safe part — it tells you exactly what's wrong and never destroys anything; the merge is the enabled cleanup that acts on it.
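How two pairs become one cluster is easiest to see in code. The Python sketch below groups pairs into connected components with a small union-find and keeps only components of three or more memories. Treating clusters as connected components is an assumption about the engine; the threshold and minimum size are the ones quoted above.

```python
from collections import defaultdict
from typing import Dict, List

PAIR_THRESHOLD = 0.95  # strict cosine threshold quoted above
MIN_CLUSTER_SIZE = 3   # clusters need at least three members


def cluster_pairs(pairs: List[dict]) -> List[List[str]]:
    """Group near-duplicate pairs into components of >= MIN_CLUSTER_SIZE memories."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for pair in pairs:
        if pair["similarity"] >= PAIR_THRESHOLD:
            parent[find(pair["id1"])] = find(pair["id2"])  # union the two sets

    groups = defaultdict(set)
    for node in list(parent):
        groups[find(node)].add(node)
    return sorted(sorted(g) for g in groups.values() if len(g) >= MIN_CLUSTER_SIZE)


pairs = [
    {"similarity": 0.9925, "id1": "626319b9", "id2": "c4dca263"},
    {"similarity": 0.9565, "id1": "626319b9", "id2": "f6413f5c"},
]
print(cluster_pairs(pairs))
# [['626319b9', 'c4dca263', 'f6413f5c']]
```

With the two pairs from the report this yields a single three-member cluster, which lines up with clusters_found: 1. A lone pair never forms a cluster under this rule, so it is reported but not merged.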

Incremental by design. Near-duplicate detection marks each memory dedup-checked once it's scanned, so a second /crystallize won't re-flag the same pairs — near_duplicates drops back to 0. Run the scan right after seeding new memories to see them, and read each report as "new findings since the last sweep," not a full re-scan.
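The incremental behaviour can be pictured as a set of already-checked IDs that filters what gets reported. This Python sketch is a guess at the mechanism based only on the description above. The function name, the tuple format for pairs, and the rule "report a pair if at least one member is unchecked" are all assumptions.

```python
from typing import Iterable, List, Set, Tuple


def incremental_scan(
    memory_ids: Iterable[str],
    similar_pairs: Iterable[Tuple[str, str]],
    checked: Set[str],
) -> List[Tuple[str, str]]:
    """Return pairs that involve an unchecked memory, then mark everything checked.

    memory_ids:    every memory visited by this sweep.
    similar_pairs: (id1, id2) tuples already above the similarity threshold.
    checked:       IDs marked dedup-checked by earlier sweeps; mutated in place.
    """
    fresh = [(a, b) for a, b in similar_pairs if a not in checked or b not in checked]
    checked.update(memory_ids)
    return fresh
```

Under this model the first sweep over three near-identical memories reports their pairs, the second reports nothing, and a fourth memory seeded later shows up only in the pairs that include it.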

Step 4 — `memclaw_insights`: let the fleet reflect

The crystallizer finds structural problems. memclaw_insights finds semantic ones — it points an LLM at a slice of memory and asks "what should I notice here?" There are six focus modes:

Focus	What it surfaces
`contradictions`	Conflicting / superseded memories and RDF divergence.
`failures`	Low-weight memories that were still recalled (things that burned the fleet).
`stale`	Memories never recalled, or decayed and untouched.
`divergence`	One entity described differently by multiple agents (fleet/all scope only).
`patterns`	Themes across recent active memory.
`discover`	Embedding clusters — latent topics you didn't ask about.

Point it at our fresh conflict:

curl -X POST http://localhost:8000/api/v1/insights/generate \
  -H "X-API-Key: standalone" -H "X-Agent-ID: backend-dev" -H "Content-Type: application/json" \
  -d '{"tenant_id":"default","agent_id":"backend-dev","focus":"contradictions","scope":"fleet","fleet_id":"dev-fleet"}'

{
  "summary": "The two memories directly conflict on the Payments API rate limit per API key: one claims 100 req/min, the other 300 req/min...",
  "findings": [{
    "type": "contradictions", "confidence": 0.93,
    "title": "Payments API rate limit: 100 vs 300 req/min per key",
    "recommendation": "Mark Memory 2 as superseded/obsolete... ensure the limiter config reflects 300.",
    "related_memory_ids": ["21959ba7", "bbd5ddfe"]
  }]
}

It found exactly the right pair, at 0.93 confidence, with a concrete recommendation. And the payoff: the finding is itself written back as an insight-type memory (65f41ab9 · "Resolve Payments API rate-limit conflict (100 vs 300 req/min)"). Reflection compounds — the next agent that recalls "rate limit" gets the conflict and the note that someone already flagged it.

A few notes from running it:

agent_id goes in the body. The insights route resolves the caller from the request body; without it you'll hit "Agent 'mcp-agent' is not registered."
Trust gates scope. The agent scope needs trust ≥ 1, and fleet/all need trust ≥ 2; divergence only runs at fleet/all (it's inherently cross-agent). Standalone's admin key bypasses the gate locally.
Honest empties. focus=stale on our young corpus returned "No relevant memories found" — it doesn't invent findings to look busy.
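Those rules are simple enough to check on the client before sending a request. A minimal Python sketch follows. The server remains the authority; this only mirrors the notes above. The function name is hypothetical, and the body field names are copied from the curl example.

```python
from typing import Optional

# Minimum trust level per scope, as listed in the notes above.
MIN_TRUST = {"agent": 1, "fleet": 2, "all": 2}
FOCUS_MODES = {"contradictions", "failures", "stale", "divergence", "patterns", "discover"}


def insights_body(
    agent_id: str,
    focus: str,
    scope: str,
    trust: int,
    tenant_id: str = "default",
    fleet_id: Optional[str] = None,
) -> dict:
    """Validate a request against the notes above, then build the JSON body."""
    if focus not in FOCUS_MODES:
        raise ValueError(f"unknown focus: {focus!r}")
    if scope not in MIN_TRUST:
        raise ValueError(f"unknown scope: {scope!r}")
    if trust < MIN_TRUST[scope]:
        raise PermissionError(f"scope {scope!r} needs trust >= {MIN_TRUST[scope]}")
    if focus == "divergence" and scope == "agent":
        raise ValueError("divergence only runs at fleet/all scope")
    # agent_id goes in the body, not only in the X-Agent-ID header.
    body = {"tenant_id": tenant_id, "agent_id": agent_id, "focus": focus, "scope": scope}
    if fleet_id is not None:
        body["fleet_id"] = fleet_id
    return body
```

Calling `insights_body("backend-dev", "contradictions", "fleet", trust=2, fleet_id="dev-fleet")` reproduces the body used in the curl command earlier in this step.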

Step 5 — Make hygiene a background habit

None of this should be a thing a human remembers to do. Two moves close the loop:

Schedule the crystallizer. A nightly POST /api/v1/crystallize per tenant turns the hygiene report into a trend you can watch in the dashboard from Part 2 — overall_score over time is a single number for "is our memory healthy?"
Give one agent a reflection habit. A line in a CLAUDE.md, the same way Part 4 added outcome reporting:

- Periodically run memclaw_insights (focus=contradictions, then stale) and act on
  high-confidence findings; the findings persist as insight memories for the fleet.
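For the scheduling half, a small script that any scheduler can call is enough. Here is a Python sketch using only the standard library. The URL, header and body match the curl command from Step 3; the function names and the injectable `send` parameter (there so the loop can be exercised without a server) are illustrative choices.

```python
import json
import urllib.request
from typing import Callable, Dict, Iterable

BASE_URL = "http://localhost:8000"  # default from Part 1
API_KEY = "standalone"


def crystallize_request(tenant_id: str) -> urllib.request.Request:
    """Build (but do not send) the POST that triggers a crystallizer run."""
    return urllib.request.Request(
        f"{BASE_URL}/api/v1/crystallize",
        data=json.dumps({"tenant_id": tenant_id}).encode("utf-8"),
        headers={"X-API-Key": API_KEY, "Content-Type": "application/json"},
        method="POST",
    )


def run_nightly(tenants: Iterable[str], send: Callable = urllib.request.urlopen) -> Dict[str, dict]:
    """Trigger one run per tenant and return the parsed responses by tenant."""
    results = {}
    for tenant in tenants:
        with send(crystallize_request(tenant), timeout=30) as resp:
            results[tenant] = json.loads(resp.read())
    return results
```

Call `run_nightly(["default"])` from cron, a systemd timer, or whatever job runner the fleet already has. Each response carries a report_id, so the job can log it and the report can be read later from /api/v1/crystallize/latest.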

Contradiction detection fires on every write, the crystallizer sweeps on a timer, and insights let an agent reflect on demand. Together they're the reason a fleet's memory can grow to 20,000 entries and still answer like it has 200 — current facts on top, stale ones demoted but traceable, duplicates collapsed, conflicts flagged and written down. A store that only grows rots. A store that grooms itself compounds.

Next — Part 6: The knowledge graph: entities, relations, and how the graph quietly lifts every recall — paying off the graph view we built back in Part 2.

caura-memclaw · Apache 2.0. Contradiction detection, the status lifecycle, the crystallizer (/crystallize) and memclaw_insights are all in the open-source engine. memclaw.net

A fleet that learns is an asset. A fleet that keeps itself clean is one you can still trust at scale.

Memory hygiene at scale: contradictions, supersession & the crystallizer
