
The Karpathy Loop: memory that learns from outcomes

Make shared memory improve with use — report outcomes, reinforce what works, auto-generate rules from failures, and tune retrieval per agent.

Part 1 gave the fleet shared memory; Part 2 gave us eyes on it; Part 3 governed it. All three treat memory as a store — something you write to and read from.

This part makes it a system that learns. After an agent acts on a recalled memory, it reports the outcome — success, failure, or partial. Successful memories gain weight (and rank higher next time); failures lose weight and auto-generate a preventive rule so the fleet doesn't repeat the mistake. The project calls this the Karpathy Loop: write → recall → act → report → better recall.

Commands use the defaults from Part 1: the base URL http://localhost:8000 and the header X-API-Key: standalone. memclaw_evolve and memclaw_tune are the agent-native MCP tools; the equivalent REST calls (/evolve/report, /agents/{id}/tune) are used here because they make the behavior visible.


Step 1 — Report an outcome

When an agent finishes acting on memories it recalled, it calls memclaw_evolve with the outcome_type (success | failure | partial), a natural-language description, and the related_ids — the memories that influenced the action.

curl -X POST http://localhost:8000/api/v1/evolve/report \
  -H "X-API-Key: standalone" -H "Content-Type: application/json" \
  -d '{"tenant_id":"default","agent_id":"backend-dev","fleet_id":"dev-fleet",
       "scope":"fleet","outcome_type":"success",
       "related_ids":["<memory-id>"],
       "outcome":"Shipped the payments rate limiter; held up under load tests with no false throttling."}'

Every report does three things: adjusts the weight of each related memory, records an outcome memory (so the report itself is auditable), and — on failure — may emit a rule.
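
The same report can be assembled from Python. This is a minimal sketch, assuming only the endpoint and fields shown in the curl call above; `build_report` and `send_report` are hypothetical helper names, not part of MemClaw, and `send_report` needs a running server.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"   # default from Part 1
API_KEY = "standalone"               # standalone key from Part 1
VALID_OUTCOMES = {"success", "failure", "partial"}


def build_report(agent_id, fleet_id, outcome_type, outcome, related_ids,
                 tenant_id="default", scope="fleet"):
    """Assemble the JSON body for POST /api/v1/evolve/report (hypothetical helper)."""
    if outcome_type not in VALID_OUTCOMES:
        raise ValueError(f"outcome_type must be one of {sorted(VALID_OUTCOMES)}")
    return {
        "tenant_id": tenant_id,
        "agent_id": agent_id,
        "fleet_id": fleet_id,
        "scope": scope,
        "outcome_type": outcome_type,
        "related_ids": list(related_ids),
        "outcome": outcome,
    }


def send_report(payload):
    """POST the report and return the decoded JSON response."""
    req = urllib.request.Request(
        f"{BASE_URL}/api/v1/evolve/report",
        data=json.dumps(payload).encode("utf-8"),
        headers={"X-API-Key": API_KEY, "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Validating outcome_type before sending keeps a typo such as "sucess" from reaching the server; the three accepted values are the ones listed in Step 1.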


Step 2 — Success reinforces

A success nudges each related memory's weight up by +0.1 (capped at 1.0):

#   → "weight_adjustments":[{"memory_id":"...","old_weight":0.9,"new_weight":1.0,"delta":0.1}]
#     "rules_generated":[]

No rule on success — correct; you only want preventive rules from things that went wrong. The reinforced memory now ranks higher in future recalls, so what worked surfaces first.
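
To see why a higher weight changes ordering, here is a toy ranking sketch. It assumes the final score is simply similarity multiplied by weight; that formula is an illustration only, since the engine's real scoring is not described here, and `Candidate` and `rank` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    memory_id: str
    similarity: float  # query-to-memory similarity, 0..1
    weight: float      # outcome-adjusted weight


def rank(candidates):
    """Order candidates by similarity * weight, highest first (assumed scoring)."""
    return sorted(candidates, key=lambda c: c.similarity * c.weight, reverse=True)


# Memory "a" before and after the success report above (weight 0.9 -> 1.0).
before = rank([Candidate("a", 0.80, 0.9), Candidate("b", 0.75, 1.0)])
after = rank([Candidate("a", 0.80, 1.0), Candidate("b", 0.75, 1.0)])
```

Under this assumed formula, "a" scores 0.72 before the report and sits behind "b" (0.75); after reinforcement it scores 0.80 and moves to the front.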


Step 3 — Failure penalizes and writes a rule

This is the heart of the loop. Report a failure against the same decision:

curl -X POST http://localhost:8000/api/v1/evolve/report \
  -H "X-API-Key: standalone" -H "Content-Type: application/json" \
  -d '{"tenant_id":"default","agent_id":"backend-dev","fleet_id":"dev-fleet",
       "scope":"fleet","outcome_type":"failure","related_ids":["<decision-id>"],
       "outcome":"Shipped the 100 rpm sliding-window limiter to prod; it caused false throttling for legitimate batch API clients during nightly jobs and we rolled it back. The per-key window is too blunt for bursty batch traffic."}'

Two things happen, and both are visible in the response:

{
  "outcome_type": "failure",
  "weight_adjustments": [
    { "memory_id": "...", "old_weight": 1.0, "new_weight": 0.85, "delta": -0.15 }
  ],
  "rules_generated": [
    { "rule_memory_id": "...", "confidence": 0.78,
      "condition": "IF a per-key rate limiter is a fixed-size sliding-window applied to bursty traffic (nightly batch jobs, backfills) where legitimate clients exceed the average rate in short intervals",
      "action": "THEN use a burst-tolerant strategy (token-bucket / leaky-bucket, or a sustained-rate + burst-allowance tier); load-test with realistic batch patterns and roll out as a canary before full prod." } ]
}
  1. The weight drops by −0.15 — note the asymmetry: failures (−0.15) cost more than successes earn (+0.1), so a memory that burns the fleet decays faster than one good outcome can prop it up. (Partial outcomes nudge +0.03; weights floor at 0.05 and cap at 1.0.)
  2. A preventive rule memory is generated, provided its confidence clears a threshold: a structured IF … THEN … that the fleet will now recall. It's a normal scope_team memory of type rule, titled e.g. "Burst-tolerant rate limiting for bursty per-key traffic."

So the next time any agent recalls "rate limiting," it gets both the original decision (now down-weighted) and the rule that says don't do the blunt version for bursty traffic. The fleet learned from one failure, once.
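
The weight arithmetic from Steps 2 and 3 can be written down in a few lines. This is a sketch that uses the deltas, floor, and cap quoted above; the function name `adjust_weight` and the rounding to four decimals are assumptions made for the example.

```python
# Deltas, floor, and cap as stated in this tutorial.
DELTAS = {"success": 0.10, "failure": -0.15, "partial": 0.03}
WEIGHT_FLOOR, WEIGHT_CAP = 0.05, 1.0


def adjust_weight(old_weight, outcome_type):
    """Apply one outcome to a weight, clamped to [WEIGHT_FLOOR, WEIGHT_CAP]."""
    delta = DELTAS[outcome_type]
    new_weight = min(WEIGHT_CAP, max(WEIGHT_FLOOR, old_weight + delta))
    return round(new_weight, 4)  # rounding only hides float noise


# One failure followed by one success does not restore the original weight.
w = adjust_weight(1.0, "failure")   # 0.85
w = adjust_weight(w, "success")     # 0.95
```

The last two lines show the asymmetry in numbers: a memory at 1.0 that fails once and then succeeds once ends at 0.95, so it takes more than one good outcome to undo a bad one.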


Step 4 — Per-agent retrieval tuning

Different agents want different recall. A reviewer wants precision (few, high-confidence hits); a docs writer wants breadth (more results, more graph context). memclaw_tune lets each agent shape its own retrieval profile — top_k, min_similarity, fts_weight (semantic↔keyword), graph_max_hops, freshness decay, and recall-boost knobs.

# code-reviewer dials in precision
curl -X PATCH "http://localhost:8000/api/v1/agents/code-reviewer/tune?tenant_id=default" \
  -H "X-API-Key: standalone" -H "X-Agent-ID: code-reviewer" -H "Content-Type: application/json" \
  -d '{"min_similarity":0.55,"top_k":5}'
#   → search_profile: {"top_k":5,"min_similarity":0.55}

# docs-writer dials in breadth
curl -X PATCH "http://localhost:8000/api/v1/agents/docs-writer/tune?tenant_id=default" \
  -H "X-API-Key: standalone" -H "X-Agent-ID: docs-writer" -H "Content-Type: application/json" \
  -d '{"graph_max_hops":2,"top_k":15}'
#   → search_profile: {"top_k":15,"graph_max_hops":2}

Each agent's profile is applied to its recalls, so search quality compounds per agent, per feedback signal. A few notes:

  • An agent tunes its own profile: X-Agent-ID identifies the caller, and tuning a peer is blocked (except for admin/operator keys). In standalone mode the standalone key is admin-equivalent, so that self-only guard is bypassed locally; it takes effect once each agent has its own credential behind the gateway.
  • memclaw_tune is also the agent-native MCP tool; GET /agents/{id}/tune reads the current profile.
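
To picture what the two profiles above do to the same candidate set, here is a simplified filter. It is a sketch under stated assumptions: only top_k and min_similarity are modeled, graph_max_hops and the other knobs are ignored, the default values are invented for the example, and `apply_profile` is a hypothetical name rather than an engine function.

```python
def apply_profile(scored, profile):
    """Keep hits at or above min_similarity, best first, truncated to top_k.

    scored: list of (memory_id, similarity) pairs.
    """
    min_sim = profile.get("min_similarity", 0.0)  # assumed default
    top_k = profile.get("top_k", 10)              # assumed default
    kept = [hit for hit in scored if hit[1] >= min_sim]
    kept.sort(key=lambda hit: hit[1], reverse=True)
    return kept[:top_k]


scored = [("m1", 0.82), ("m2", 0.60), ("m3", 0.48), ("m4", 0.31)]
reviewer = {"top_k": 5, "min_similarity": 0.55}
writer = {"top_k": 15, "graph_max_hops": 2}

reviewer_hits = apply_profile(scored, reviewer)
writer_hits = apply_profile(scored, writer)
```

With these made-up similarities the reviewer profile keeps only m1 and m2, while the docs-writer profile returns all four candidates.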

Step 5 — Close the loop

Make reporting a habit, not an afterthought. One line in each agent's CLAUDE.md from Part 1:

- After acting on recalled memories, report the outcome with memclaw_evolve
  (success / failure / partial) and the memory IDs that influenced the action.
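
For agents driven from code rather than a prompt file, the same habit can be enforced with a small helper that remembers which memories were recalled during a task. This is a hypothetical sketch: `OutcomeTracker` is not a MemClaw class, and it assumes every recalled memory counts as having influenced the action.

```python
class OutcomeTracker:
    """Collect recalled memory IDs during a task, then build one outcome report."""

    def __init__(self, agent_id, fleet_id, tenant_id="default"):
        self.agent_id = agent_id
        self.fleet_id = fleet_id
        self.tenant_id = tenant_id
        self.recalled_ids = []

    def note_recall(self, memory_ids):
        """Record IDs returned by a recall, skipping duplicates."""
        for memory_id in memory_ids:
            if memory_id not in self.recalled_ids:
                self.recalled_ids.append(memory_id)

    def report(self, outcome_type, outcome):
        """Return the report body and reset the tracker for the next task."""
        payload = {
            "tenant_id": self.tenant_id,
            "agent_id": self.agent_id,
            "fleet_id": self.fleet_id,
            "scope": "fleet",
            "outcome_type": outcome_type,
            "related_ids": list(self.recalled_ids),
            "outcome": outcome,
        }
        self.recalled_ids = []
        return payload


tracker = OutcomeTracker("backend-dev", "dev-fleet")
tracker.note_recall(["mem-1", "mem-2"])
tracker.note_recall(["mem-2", "mem-3"])
body = tracker.report("partial", "Limiter shipped behind a flag; batch clients still untested.")
```

The returned body has the same fields as the curl payloads in Steps 1 and 3, so it can be sent to /api/v1/evolve/report unchanged.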

Now the flywheel turns on its own:

write → recall → act → report → better recall

Every interaction leaves the memory a little smarter than it found it. Successes float to the top; failures become guardrails; each agent's retrieval sharpens to its job. A store you only read and write is static. A store that learns from outcomes compounds — which is the whole reason to give a fleet shared memory in the first place.

Next — Part 5: Memory hygiene at scale: what keeps 20,000 accumulating memories from becoming a swamp — contradiction detection, supersession, and the crystallizer.


caura-memclaw · Apache 2.0: the Karpathy Loop (evolve) and per-agent tuning (tune) are in the open-source engine.

A fleet that shares is useful. A fleet that learns is an asset.