The Karpathy Loop: memory that learns from outcomes
Make shared memory improve with use — report outcomes, reinforce what works, auto-generate rules from failures, and tune retrieval per agent.
Part 1 gave the fleet shared memory; Part 2 gave us eyes on it; Part 3 governed it. All three treat memory as a store — something you write to and read from.
This part makes it a system that learns. After an agent acts on a recalled memory, it reports the outcome: success, failure, or partial. Successful memories gain weight (and rank higher next time); failures lose weight and can auto-generate a preventive rule so the fleet doesn't repeat the mistake. The project calls this the Karpathy Loop: write → recall → act → report → better recall.
Commands use the default http://localhost:8000 + X-API-Key: standalone from Part 1. memclaw_evolve and memclaw_tune are the agent-native MCP tools; the REST calls (/evolve/report, /agents/{id}/tune) make the behavior visible.
Step 1 — Report an outcome
When an agent finishes acting on memories it recalled, it calls memclaw_evolve with the outcome_type (success | failure | partial), a natural-language description, and the related_ids — the memories that influenced the action.
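A client that reports outcomes over REST only has to assemble those fields into a JSON body. Below is a minimal sketch using only the Python standard library. `build_evolve_report` is a hypothetical helper, the field names mirror the curl call in this step, and the outcome_type check is a client-side convenience rather than documented server validation. It builds the request without sending it:

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/api/v1"  # standalone default from Part 1
API_KEY = "standalone"
VALID_OUTCOMES = {"success", "failure", "partial"}


def build_evolve_report(agent_id, fleet_id, outcome_type, outcome, related_ids,
                        tenant_id="default", scope="fleet"):
    """Build (but do not send) a POST /evolve/report request.

    Hypothetical helper: field names copy the curl example, nothing more.
    """
    if outcome_type not in VALID_OUTCOMES:
        raise ValueError(f"outcome_type must be one of {sorted(VALID_OUTCOMES)}")
    body = {
        "tenant_id": tenant_id,
        "agent_id": agent_id,
        "fleet_id": fleet_id,
        "scope": scope,
        "outcome_type": outcome_type,
        "related_ids": list(related_ids),  # memories that influenced the action
        "outcome": outcome,                # natural-language description
    }
    return urllib.request.Request(
        f"{BASE_URL}/evolve/report",
        data=json.dumps(body).encode("utf-8"),
        headers={"X-API-Key": API_KEY, "Content-Type": "application/json"},
        method="POST",
    )


req = build_evolve_report(
    agent_id="backend-dev", fleet_id="dev-fleet", outcome_type="success",
    outcome="Shipped the payments rate limiter; held up under load tests.",
    related_ids=["<memory-id>"],
)
# urllib.request.urlopen(req) would send it to a running MemClaw instance.
```

Keeping the builder separate from the network call makes it easy to unit-test the payload an agent will send.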
curl -X POST http://localhost:8000/api/v1/evolve/report \
-H "X-API-Key: standalone" -H "Content-Type: application/json" \
-d '{"tenant_id":"default","agent_id":"backend-dev","fleet_id":"dev-fleet",
"scope":"fleet","outcome_type":"success",
"related_ids":["<memory-id>"],
"outcome":"Shipped the payments rate limiter; held up under load tests with no false throttling."}'
Every report does three things: it adjusts the weight of each related memory, records an outcome memory (so the report itself is auditable), and, on failure, may emit a rule.
Step 2 — Success reinforces
A success nudges each related memory's weight up by 0.1 (capped at 1.0):
# → "weight_adjustments":[{"memory_id":"...","old_weight":0.9,"new_weight":1.0,"delta":0.1}]
# "rules_generated":[]
No rule on success, which is correct: you only want preventive rules from things that went wrong. The reinforced memory now ranks higher in future recalls, so what worked surfaces first.
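Why does a higher weight mean earlier recall? Here is a minimal sketch, assuming the final ranking score is simply similarity multiplied by weight. MemClaw's actual scoring combines more signals, so treat this as an illustration of the effect, not the engine's formula. The memory ids and scores are made up:

```python
# Assumption: ranking score = similarity * weight (illustration only).


def rank(candidates):
    """Return memory ids ordered by similarity * weight, best first."""
    ordered = sorted(candidates, key=lambda m: m["similarity"] * m["weight"], reverse=True)
    return [m["id"] for m in ordered]


candidates = [
    {"id": "limiter-decision", "similarity": 0.80, "weight": 0.9},
    {"id": "old-cache-note", "similarity": 0.82, "weight": 0.9},
]
before = rank(candidates)

# A success report nudges the limiter decision by +0.1, capped at 1.0.
candidates[0]["weight"] = min(1.0, candidates[0]["weight"] + 0.1)
after = rank(candidates)
```

Before the report, the slightly more similar note wins (0.738 vs 0.72); after the reinforcement, the limiter decision overtakes it (0.8 vs 0.738).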
Step 3 — Failure penalizes and writes a rule
This is the heart of the loop. Report a failure against the same decision:
curl -X POST http://localhost:8000/api/v1/evolve/report \
-H "X-API-Key: standalone" -H "Content-Type: application/json" \
-d '{"tenant_id":"default","agent_id":"backend-dev","fleet_id":"dev-fleet",
"scope":"fleet","outcome_type":"failure","related_ids":["<decision-id>"],
"outcome":"Shipped the 100 rpm sliding-window limiter to prod; it caused false throttling for legitimate batch API clients during nightly jobs and we rolled it back. The per-key window is too blunt for bursty batch traffic."}'
Two things happen:
{
"outcome_type": "failure",
"weight_adjustments": [
{ "memory_id": "...", "old_weight": 1.0, "new_weight": 0.85, "delta": -0.15 }
],
"rules_generated": [
{ "rule_memory_id": "...", "confidence": 0.78,
"condition": "IF a per-key rate limiter is a fixed-size sliding-window applied to bursty traffic (nightly batch jobs, backfills) where legitimate clients exceed the average rate in short intervals",
"action": "THEN use a burst-tolerant strategy (token-bucket / leaky-bucket, or a sustained-rate + burst-allowance tier); load-test with realistic batch patterns and roll out as a canary before full prod." } ]
}
- The weight drops by 0.15. Note the asymmetry: a failure (−0.15) costs more than a success earns (+0.1), so a memory that burns the fleet loses weight faster than good outcomes can restore it. (Partial outcomes nudge +0.03; weights floor at 0.05 and cap at 1.0.)
- A preventive rule memory is generated (when its confidence clears a threshold): a structured IF … THEN … that the fleet will now recall. It's a normal scope_team memory of type rule, titled e.g. "Burst-tolerant rate limiting for bursty per-key traffic."
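The weight arithmetic in these bullets fits in a few lines. This is a minimal sketch that assumes the deltas are applied additively and then clamped to [0.05, 1.0]; rounding to four places is our own choice to keep float noise out of the results, not something the article specifies:

```python
# Deltas, floor, and cap as described in this step.
DELTAS = {"success": 0.10, "failure": -0.15, "partial": 0.03}
WEIGHT_FLOOR = 0.05
WEIGHT_CAP = 1.0


def adjust_weight(weight: float, outcome_type: str) -> float:
    """Apply one outcome report to a memory weight, clamped to [floor, cap]."""
    new = weight + DELTAS[outcome_type]
    # Rounding (our choice) avoids values like 0.8500000000000001.
    return round(min(WEIGHT_CAP, max(WEIGHT_FLOOR, new)), 4)


w = adjust_weight(0.9, "success")  # Step 2: 0.9 -> 1.0
w = adjust_weight(w, "failure")    # Step 3: 1.0 -> 0.85

# The asymmetry: from 0.5, one failure needs two successes to climb back.
recovery = [0.5]
for outcome in ("failure", "success", "success"):
    recovery.append(adjust_weight(recovery[-1], outcome))
```

Starting from 0.5, the path is 0.35 after the failure, 0.45 after one success, and 0.55 only after the second.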
So the next time any agent recalls "rate limiting," it gets both the original decision (now down-weighted) and the rule that says don't do the blunt version for bursty traffic. The fleet learned from one failure, once.
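The rule-generation step can be sketched as a gate on the failure report. The article only says a rule is written "above a confidence threshold", so the 0.7 value below is a placeholder, and the record's field names are hypothetical. Following the article, only failures produce rules here:

```python
from dataclasses import dataclass
from typing import Optional

# Placeholder threshold: the real value isn't given in this article.
RULE_CONFIDENCE_THRESHOLD = 0.7


@dataclass
class RuleCandidate:
    title: str
    condition: str     # the "IF ..." clause
    action: str        # the "THEN ..." clause
    confidence: float


def maybe_emit_rule(outcome_type: str, candidate: RuleCandidate,
                    threshold: float = RULE_CONFIDENCE_THRESHOLD) -> Optional[dict]:
    """Return a rule-memory record for a confident failure, else None."""
    if outcome_type != "failure":
        return None  # successes and partials never produce preventive rules here
    if candidate.confidence <= threshold:
        return None  # not confident enough to become a fleet-wide guardrail
    return {
        "type": "rule",           # hypothetical field names
        "scope": "scope_team",
        "title": candidate.title,
        "content": f"{candidate.condition}\n{candidate.action}",
        "confidence": candidate.confidence,
    }


candidate = RuleCandidate(
    title="Burst-tolerant rate limiting for bursty per-key traffic",
    condition="IF a per-key sliding-window limiter faces bursty batch traffic",
    action="THEN use a token-bucket or burst-allowance tier and canary the rollout",
    confidence=0.78,
)
rule = maybe_emit_rule("failure", candidate)
```

The threshold is the design lever: set it too low and every noisy failure becomes a guardrail; too high and the fleet rarely learns anything from its mistakes.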
Step 4 — Per-agent retrieval tuning
Different agents want different recall. A reviewer wants precision (few, high-confidence hits); a docs writer wants breadth (more results, more graph context). memclaw_tune lets each agent shape its own retrieval profile — top_k, min_similarity, fts_weight (semantic↔keyword), graph_max_hops, freshness decay, and recall-boost knobs.
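To make the knobs concrete, here is a minimal sketch of how a per-agent profile might be merged with defaults and applied to candidate hits. The default values, the linear fts_weight blend, and filtering by min_similarity before cutting to top_k are all assumptions for illustration, not documented MemClaw behavior; freshness decay and graph traversal are left out:

```python
from typing import Dict, List, Tuple

# Hypothetical fleet-wide defaults; MemClaw's real defaults may differ.
DEFAULT_PROFILE = {"top_k": 10, "min_similarity": 0.3, "fts_weight": 0.0, "graph_max_hops": 1}

# (memory_id, semantic_score, keyword_score) from a hypothetical search.
Hit = Tuple[str, float, float]


def effective_profile(overrides: Dict) -> Dict:
    """Per-agent overrides win; knobs the agent never set fall back to defaults."""
    return {**DEFAULT_PROFILE, **overrides}


def apply_profile(hits: List[Hit], profile: Dict) -> List[str]:
    """Blend scores, drop weak hits, return the best top_k memory ids."""
    w = profile["fts_weight"]  # assumed: 0.0 = pure semantic, 1.0 = pure keyword
    scored = [(mid, (1.0 - w) * sem + w * kw) for mid, sem, kw in hits]
    kept = [(mid, s) for mid, s in scored if s >= profile["min_similarity"]]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return [mid for mid, _ in kept[: profile["top_k"]]]


reviewer = effective_profile({"min_similarity": 0.55, "top_k": 5})
docs_writer = effective_profile({"graph_max_hops": 2, "top_k": 15})
# Twelve fake hits with semantic scores 0.90, 0.85, ..., 0.35 and no keyword signal.
hits = [(f"m{i}", round(0.9 - i * 0.05, 2), 0.0) for i in range(12)]
```

Against the same hits, the reviewer profile returns five ids while the docs-writer profile returns all twelve, which is the precision-versus-breadth split described above.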
# code-reviewer dials in precision
curl -X PATCH "http://localhost:8000/api/v1/agents/code-reviewer/tune?tenant_id=default" \
-H "X-API-Key: standalone" -H "X-Agent-ID: code-reviewer" -H "Content-Type: application/json" \
-d '{"min_similarity":0.55,"top_k":5}'
# → search_profile: {"top_k":5,"min_similarity":0.55}
# docs-writer dials in breadth
curl -X PATCH "http://localhost:8000/api/v1/agents/docs-writer/tune?tenant_id=default" \
-H "X-API-Key: standalone" -H "X-Agent-ID: docs-writer" -H "Content-Type: application/json" \
-d '{"graph_max_hops":2,"top_k":15}'
# → search_profile: {"top_k":15,"graph_max_hops":2}
Each agent's profile is applied to its own recalls, so search quality compounds per agent, per feedback signal. A few notes:
- An agent tunes its own profile: X-Agent-ID identifies the caller, and tuning a peer is blocked (except for admin/operator keys). In standalone mode the standalone key is admin-equivalent, so that self-only guard is bypassed locally; it only takes effect once each agent has its own credential behind the gateway.
- memclaw_tune is also the agent-native MCP tool; GET /agents/{id}/tune reads the current profile.
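The self-only guard in these notes can be sketched as a small authorization check paired with a builder for the PATCH call. The role labels and the `can_tune` / `build_tune_request` helpers are hypothetical; only the URL shape and headers come from the curl examples in this step. The server is what actually enforces the guard; checking it client-side just fails faster:

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:8000/api/v1"
PRIVILEGED_ROLES = {"admin", "operator"}  # hypothetical role labels


def can_tune(caller_id: str, target_id: str, key_role: str) -> bool:
    """Agents may tune only their own profile unless the key is admin/operator."""
    return key_role in PRIVILEGED_ROLES or caller_id == target_id


def build_tune_request(caller_id: str, target_id: str, key_role: str,
                       api_key: str = "standalone", tenant_id: str = "default",
                       **knobs) -> urllib.request.Request:
    """Build (but do not send) PATCH /agents/{id}/tune, refusing peer tuning early."""
    if not can_tune(caller_id, target_id, key_role):
        raise PermissionError(f"{caller_id} may not tune {target_id}")
    query = urllib.parse.urlencode({"tenant_id": tenant_id})
    return urllib.request.Request(
        f"{BASE_URL}/agents/{target_id}/tune?{query}",
        data=json.dumps(knobs).encode("utf-8"),
        headers={"X-API-Key": api_key, "X-Agent-ID": caller_id,
                 "Content-Type": "application/json"},
        method="PATCH",
    )


# Self-tuning is always allowed; tuning a peer needs a privileged key,
# which the standalone key effectively is.
req = build_tune_request("code-reviewer", "code-reviewer", key_role="agent",
                         min_similarity=0.55, top_k=5)
```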
Step 5 — Close the loop
Make reporting a habit, not an afterthought. One line in each agent's CLAUDE.md from Part 1:
- After acting on recalled memories, report the outcome with memclaw_evolve
(success / failure / partial) and the memory IDs that influenced the action.
Now the flywheel turns on its own:
write → recall → act → report → better recall
Every interaction leaves the memory a little smarter than it found it. Successes float to the top; failures become guardrails; each agent's retrieval sharpens to its job. A store you only read and write is static. A store that learns from outcomes compounds — which is the whole reason to give a fleet shared memory in the first place.
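To see the whole loop in one place, here is a toy, in-memory model of write → recall → act → report → better recall. Everything in it is hypothetical: word overlap stands in for real semantic search, the starting weight of 0.9 is arbitrary, and the rule text is supplied by the caller instead of being generated. Only the weight deltas, the floor and cap, and "rules only on failure" come from this article:

```python
from dataclasses import dataclass, field
from typing import Dict, List

DELTAS = {"success": 0.10, "failure": -0.15, "partial": 0.03}


@dataclass
class Memory:
    memory_id: str
    text: str
    kind: str = "fact"
    weight: float = 0.9  # arbitrary starting weight for the toy


@dataclass
class ToyFleetMemory:
    memories: Dict[str, Memory] = field(default_factory=dict)

    def write(self, mem: Memory) -> None:
        self.memories[mem.memory_id] = mem

    def recall(self, query: str) -> List[str]:
        """Rank by (words shared with the query) * weight; drop non-matches."""
        words = set(query.lower().split())
        scored = []
        for m in self.memories.values():
            overlap = len(words & set(m.text.lower().split()))
            if overlap:
                scored.append((overlap * m.weight, m.memory_id))
        return [mid for _, mid in sorted(scored, reverse=True)]

    def report(self, outcome_type: str, related_ids: List[str], rule_text: str = "") -> None:
        """Adjust related weights; on failure, store the caller's rule as a rule memory."""
        for mid in related_ids:
            m = self.memories[mid]
            m.weight = round(min(1.0, max(0.05, m.weight + DELTAS[outcome_type])), 4)
        if outcome_type == "failure" and rule_text:
            self.write(Memory(f"rule-{len(self.memories)}", rule_text, kind="rule"))


store = ToyFleetMemory()
store.write(Memory("decision", "rate limiting with a 100 rpm sliding window per key"))
store.report("failure", ["decision"],
             rule_text="rate limiting for bursty batch traffic should use a token bucket")
hits = store.recall("rate limiting")
```

After one failure report, a "rate limiting" recall returns the new rule first and the down-weighted decision second, which is the behavior Step 3 describes.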
Next — Part 5: Memory hygiene at scale: what keeps 20,000 accumulating memories from becoming a swamp — contradiction detection, supersession, and the crystallizer.
caura-memclaw · Apache 2.0: the Karpathy Loop (evolve) and per-agent tuning (tune) are in the open-source engine. memclaw.net
A fleet that shares is useful. A fleet that learns is an asset.