Building a Multi-Agent Fleet with MemClaw and Claude Code
Give three agents one shared, governed brain — in about 30 minutes, with zero custom code.
Every AI agent you run today is a brilliant amnesiac.
It debugs a gnarly authentication issue at 2 PM, and by 2:05 — new session, new context window — that hard-won knowledge is gone. Worse: if you run several agents, each one re-learns the same lessons in its own silo. Your reviewer agent doesn't know what your dev agent discovered yesterday. Your docs agent contradicts both of them.
The fix isn't a bigger context window. It's shared memory — and once more than one agent can write to it, you immediately need governance: who can read what, who can write what, and an audit trail when something goes wrong.
That's exactly what MemClaw does. It's an open-source (Apache 2.0), MCP-native memory layer for agent fleets: agents write plain text, MemClaw turns it into enriched, searchable, permissioned memory that improves with use. It's running in production at eToro (NASDAQ: ETOR) with 300+ agents, but the OSS engine spins up on your laptop with one Docker command.
In this tutorial we'll build a small but real three-agent development fleet:
- backend-dev: writes code, records decisions and gotchas
- code-reviewer: reviews PRs, recalls past decisions before nitpicking
- docs-writer: keeps documentation consistent with what the other two actually did
Our agent harness is Claude Code — Anthropic's terminal-based coding agent. I picked it because it's the most popular agentic surface right now, it speaks MCP natively, and MemClaw ships a one-line skill installer for it. No frameworks, no orchestration code, no SDKs. Just config.
Everything below also works with Cursor, Windsurf, Claude Desktop, or any MCP client — only the config file location changes.
The architecture
┌──────────────┐  ┌──────────────┐  ┌──────────────┐
│ Claude Code  │  │ Claude Code  │  │ Claude Code  │
│ backend-dev  │  │ code-reviewer│  │ docs-writer  │
└──────┬───────┘  └──────┬───────┘  └──────┬───────┘
       │       MCP (Streamable HTTP)       │
       └─────────────────┬─────────────────┘
                         ▼
               ┌───────────────────┐
               │      MemClaw      │
               │  (FastAPI + MCP)  │
               │ Postgres+pgvector │
               │       Redis       │
               └───────────────────┘
Each Claude Code instance is an independent agent with its own agent_id. They never talk to each other directly. They communicate through memory: one writes, the others recall. This is the pattern that scales: at eToro, the same loop runs across 300+ agent identifiers with 26,500+ memories and 23 ms p50 search.
Step 1 — Spin up MemClaw (5 minutes)
You need Docker, Git, and an OpenAI API key (for embeddings and enrichment — Gemini, Anthropic, and OpenRouter are also supported for the LLM side).
git clone https://github.com/caura-ai/caura-memclaw.git
cd caura-memclaw
cp .env.example .env
Edit .env with the minimal setup:
EMBEDDING_PROVIDER=openai
ENTITY_EXTRACTION_PROVIDER=openai
USE_LLM_FOR_MEMORY_CREATION=true
OPENAI_API_KEY=sk-...
# Embedding model — optional; this is the default. Set it to override.
OPENAI_EMBEDDING_MODEL=text-embedding-3-small
# Single-tenant local mode — simplest auth path
IS_STANDALONE=true
OPENAI_EMBEDDING_MODEL is optional: MemClaw defaults to text-embedding-3-small when it's unset. Pin it to be explicit, or change it to use a different model. The embedding dimension is fixed per deployment, so a replacement model has to produce vectors of the same dimension.
Then:
docker compose up -d
This pulls multi-arch images from ghcr.io and brings up Postgres with pgvector, Redis, and the MemClaw API. Verify:
curl http://localhost:8000/api/v1/health
# {"status":"ok","storage":"connected","redis":"connected","event_bus":"ok"}
Smoke-test the memory pipeline before wiring up any agents:
curl -X POST http://localhost:8000/api/v1/memories \
-H "X-API-Key: standalone" \
-H "Content-Type: application/json" \
-d '{"tenant_id": "default", "agent_id": "quickstart", "content": "Our auth service uses JWT with 15-minute expiry."}'
Every MemClaw write is authored by an agent_id. That identity is what makes the fleet governance in Step 6 possible, so we pass one explicitly. In standalone mode you may omit it (MemClaw falls back to a reserved default identity); on a shared or gateway deployment, agent_id is required and an anonymous write is rejected.
Look at the response. You sent one raw sentence; MemClaw returned a memory with an LLM-inferred memory_type, title, summary, tags, status, and weight (importance). That single-pass enrichment — classify, extract entities, scan for PII, detect contradictions, embed — happens on every write. Your agents never have to structure anything.
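The same write can be scripted. Here is a minimal Python sketch using only the standard library. It assumes the standalone endpoint and X-API-Key header from the curl call above, and it assumes the response is a JSON object whose keys match the field names just listed; check both against your own instance. The helper names are hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/api/v1"  # standalone instance from Step 1
API_KEY = "standalone"

# Field names taken from the prose above. Whether each one appears as a
# top-level key in the response body is an assumption to verify.
ENRICHED_FIELDS = ("memory_type", "title", "summary", "tags", "status", "weight")


def build_write_request(content, agent_id, tenant_id="default"):
    """Build (but do not send) the POST /memories request."""
    body = json.dumps(
        {"tenant_id": tenant_id, "agent_id": agent_id, "content": content}
    ).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/memories",
        data=body,
        method="POST",
        headers={"X-API-Key": API_KEY, "Content-Type": "application/json"},
    )


def enrichment_view(memory):
    """Keep only the enrichment fields; fields that are absent map to None."""
    return {name: memory.get(name) for name in ENRICHED_FIELDS}


def write_memory(content, agent_id):
    """Send the write and return the enrichment fields of the response."""
    request = build_write_request(content, agent_id)
    with urllib.request.urlopen(request, timeout=30) as response:
        return enrichment_view(json.load(response))
```

With the stack from Step 1 running, write_memory("Our auth service uses JWT with 15-minute expiry.", "quickstart") performs the same smoke test as the curl call.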
Now search for it with completely different words:
curl -X POST http://localhost:8000/api/v1/search \
-H "X-API-Key: standalone" \
-H "Content-Type: application/json" \
-d '{"tenant_id": "default", "query": "authentication token lifetime"}'
It comes back under an items array, each hit carrying its similarity, title, agent_id, and visibility:
{ "items": [ {
"title": "Our auth service uses JWT with 15-minute expiry.",
"memory_type": "fact", "similarity": 0.47,
"agent_id": "quickstart", "visibility": "scope_team"
} ] }
Hybrid search blends pgvector semantic similarity, keyword matching, and knowledge-graph expansion. Memory layer: done. (Prefer not to self-host? Sign up at memclaw.net, grab an mc_ API key, and skip this step; everything else is identical.)
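If you want to inspect search results from a script rather than by eye, a sketch along these lines may help. It assumes the response shape shown in the sample above (an items array whose entries carry similarity, agent_id, and title); the function names and the min_similarity cutoff are illustrative, not part of MemClaw.

```python
import json
import urllib.request

SEARCH_URL = "http://localhost:8000/api/v1/search"


def build_search_request(query, tenant_id="default", api_key="standalone"):
    """Build (but do not send) the POST /search request."""
    body = json.dumps({"tenant_id": tenant_id, "query": query}).encode("utf-8")
    return urllib.request.Request(
        SEARCH_URL,
        data=body,
        method="POST",
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
    )


def summarize_hits(response_body, min_similarity=0.0):
    """Return (similarity, agent_id, title) tuples, best match first."""
    hits = [
        (item["similarity"], item["agent_id"], item["title"])
        for item in response_body.get("items", [])
        if item["similarity"] >= min_similarity
    ]
    return sorted(hits, key=lambda hit: hit[0], reverse=True)


def search(query):
    """Send the query to the local instance and summarize the hits."""
    with urllib.request.urlopen(build_search_request(query), timeout=30) as resp:
        return summarize_hits(json.load(resp))


# The sample response from the text above, used as offline test data.
SAMPLE = {
    "items": [
        {
            "title": "Our auth service uses JWT with 15-minute expiry.",
            "memory_type": "fact",
            "similarity": 0.47,
            "agent_id": "quickstart",
            "visibility": "scope_team",
        }
    ]
}
```

summarize_hits(SAMPLE) returns a single tuple for the JWT memory, and summarize_hits(SAMPLE, min_similarity=0.5) returns an empty list because the only hit scores 0.47.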
Step 2 — Connect Claude Code via MCP (2 minutes)
Add MemClaw as an MCP server. Register it at user scope (-s user) so it's available in every directory — Step 5 runs Claude Code from three different folders, and the default local scope would register it only for the one you're standing in now:
claude mcp add --transport http -s user memclaw http://localhost:8000/mcp \
--header "X-API-Key: standalone"
Prefer a config file? Commit a .mcp.json to a project root; Claude Code reads MCP servers from that file, not from settings.json. (For a multi-directory fleet like ours, the -s user command above is simpler.)
{
  "mcpServers": {
    "memclaw": {
      "type": "http",
      "url": "http://localhost:8000/mcp",
      "headers": { "X-API-Key": "standalone" }
    }
  }
}
Confirm with claude mcp list, which should show memclaw: ✓ Connected. Claude Code now auto-discovers all 12 MemClaw tools: memclaw_write, memclaw_recall, memclaw_list, memclaw_manage, memclaw_doc, memclaw_entity_get, memclaw_tune, memclaw_insights, memclaw_evolve, memclaw_stats, memclaw_keystones, and memclaw_keystones_set.
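Under the hood, tool discovery is an MCP tools/list call. The sketch below builds that JSON-RPC request and checks a response against the 12 tool names listed above. It deliberately leaves out the transport handshake (the initialize exchange and any session header), so treat it as a way to reason about the payloads rather than a working client. The function names are hypothetical.

```python
# The 12 tool names listed in the text above.
EXPECTED_TOOLS = {
    "memclaw_write", "memclaw_recall", "memclaw_list", "memclaw_manage",
    "memclaw_doc", "memclaw_entity_get", "memclaw_tune", "memclaw_insights",
    "memclaw_evolve", "memclaw_stats", "memclaw_keystones",
    "memclaw_keystones_set",
}


def tools_list_request(request_id=1):
    """JSON-RPC 2.0 body for the MCP `tools/list` method."""
    return {"jsonrpc": "2.0", "id": request_id, "method": "tools/list"}


def check_tools(response):
    """Compare the tool names in a `tools/list` response with the expected set."""
    names = {tool["name"] for tool in response["result"]["tools"]}
    return {
        "missing": sorted(EXPECTED_TOOLS - names),
        "unexpected": sorted(names - EXPECTED_TOOLS),
    }
```

An empty "missing" list means every tool the article names was advertised by the server.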
Step 3 — Install the skill (1 minute)
Tools tell an agent what it can call. A skill tells it when and how — the recall-before-acting habit, the write-after-learning habit, what makes a good memory versus noise. MemClaw ships its usage guide as a Claude Code skill, served by your own instance:
curl -s "http://localhost:8000/api/v1/install-skill" > /tmp/install-memclaw-skill.sh
less /tmp/install-memclaw-skill.sh # always inspect before running
bash /tmp/install-memclaw-skill.sh
Verify and restart Claude Code (skills load at startup):
ls -la ~/.claude/skills/memclaw/SKILL.md
The skill is loaded on demand, not injected into every turn, so it costs nothing until the agent actually reaches for memory. This step is the difference between an agent that can use memory and one that does.
Step 4 — Give each agent an identity
Here's the move that turns "Claude Code with memory" into "a fleet."
Every MemClaw tool accepts an agent_id — the caller's identity. Agents auto-register on first write and get a trust tier. Memory authorship, retrieval tuning, and governance all hang off this identity. So each of our three agents needs to consistently identify itself.
The simplest mechanism in Claude Code is the project-level CLAUDE.md memory file. Create three working directories (or git worktrees), one per agent, and drop an identity block into each.
~/fleet/backend-dev/CLAUDE.md:
# Agent identity
You are agent `backend-dev` in fleet `dev-fleet`.
On EVERY MemClaw tool call, pass agent_id="backend-dev" and fleet_id="dev-fleet".
# Memory discipline
- BEFORE starting any task: memclaw_recall for relevant context
(prior decisions, known gotchas, conventions).
- AFTER completing any task: memclaw_write what you learned —
decisions made, bugs found, anything a teammate would want to know.
- Write facts that age well. Skip transient noise.
~/fleet/code-reviewer/CLAUDE.md:
# Agent identity
You are agent `code-reviewer` in fleet `dev-fleet`.
On EVERY MemClaw tool call, pass agent_id="code-reviewer" and fleet_id="dev-fleet".
# Memory discipline
- BEFORE reviewing: memclaw_recall the relevant architectural decisions
and conventions, so you review against what the team agreed — not
against your own taste.
- AFTER reviewing: memclaw_write recurring issues you flagged, so the
fleet stops repeating them.
~/fleet/docs-writer/CLAUDE.md:
# Agent identity
You are agent `docs-writer` in fleet `dev-fleet`.
On EVERY MemClaw tool call, pass agent_id="docs-writer" and fleet_id="dev-fleet".
# Memory discipline
- BEFORE writing docs: memclaw_recall what backend-dev and code-reviewer
recorded about the feature. Docs describe what was BUILT, not what
was planned.
- AFTER writing: memclaw_write a pointer to what you documented.
That's the entire orchestration layer. No message bus, no LangGraph, no shared scratchpad files. Identity plus a shared memory substrate.
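Hand-editing three files is fine, but if you expect the fleet to grow you could stamp out the identity block with a few lines of Python. This is an optional convenience sketch, not something MemClaw ships: it writes only the "Agent identity" section shown above (the per-agent memory discipline still needs to be added by hand), and it overwrites any CLAUDE.md already in the target directory.

```python
from pathlib import Path

FLEET_ID = "dev-fleet"

# Mirrors the "Agent identity" block from the CLAUDE.md files above.
TEMPLATE = """\
# Agent identity
You are agent `{agent_id}` in fleet `{fleet_id}`.
On EVERY MemClaw tool call, pass agent_id="{agent_id}" and fleet_id="{fleet_id}".
"""


def render_identity(agent_id, fleet_id=FLEET_ID):
    """Fill the identity template for one agent."""
    return TEMPLATE.format(agent_id=agent_id, fleet_id=fleet_id)


def write_identities(root, agent_ids, fleet_id=FLEET_ID):
    """Create <root>/<agent_id>/CLAUDE.md for each agent and return the paths."""
    paths = []
    for agent_id in agent_ids:
        path = Path(root) / agent_id / "CLAUDE.md"
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(render_identity(agent_id, fleet_id), encoding="utf-8")
        paths.append(path)
    return paths
```

For the layout used in this tutorial, the call would be write_identities(Path.home() / "fleet", ["backend-dev", "code-reviewer", "docs-writer"]).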
Step 5 — Watch knowledge flow
Open three terminals, run claude in each agent's directory, and try the canonical loop.
Terminal 1 (backend-dev):
"Add rate limiting to the payments API. We decided on 100 req/min per key using a sliding window in Redis — record the decision and any gotchas you hit."
The agent does the work, then calls memclaw_write with something like "Payments API rate limiting: 100 req/min per API key, sliding-window algorithm in Redis. Gotcha: the Redis connection pool maxes at 10 — raise REDIS_POOL_SIZE before adding more middleware that touches Redis." MemClaw classifies it (a decision and a rule, probably), extracts entities (Redis, payments API), embeds it, and stamps it scope_team — visible to the whole fleet by default.
Terminal 2 (code-reviewer), later, reviewing an unrelated PR that touches Redis:
"Review this PR that adds a Redis-backed feature-flag cache."
Before opining, the agent calls memclaw_recall("Redis conventions and known issues") — and gets backend-dev's pool-size warning. The review comes back with: "Heads up — the Redis connection pool is capped at 10 and the rate limiter already consumes connections; bump REDIS_POOL_SIZE or this will starve under load."
The reviewer never saw that code being written. It never spoke to backend-dev. One agent discovered; another recalled. That's the moment the fleet stops being three isolated amnesiacs and starts compounding.
Terminal 3 (docs-writer):
"Document the payments API rate limiting."
It recalls both the original decision and the review note, and produces docs that match reality — including the operational caveat that exists nowhere in the code comments.
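To make the loop concrete, here is roughly what the two tool calls look like on the wire as MCP tools/call requests. The agent_id and fleet_id arguments come from Step 4. The content and query argument names are assumptions borrowed from the REST payloads in Step 1, so check the tool schemas your instance advertises before relying on them.

```python
import json

FLEET_ID = "dev-fleet"


def tool_call(name, arguments, request_id):
    """Wrap a tool invocation in an MCP JSON-RPC 2.0 `tools/call` request."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


def write_call(agent_id, content, request_id=1):
    # "content" mirrors the REST write payload; assumed, not confirmed.
    arguments = {"agent_id": agent_id, "fleet_id": FLEET_ID, "content": content}
    return tool_call("memclaw_write", arguments, request_id)


def recall_call(agent_id, query, request_id=2):
    # "query" mirrors the REST search payload; assumed, not confirmed.
    arguments = {"agent_id": agent_id, "fleet_id": FLEET_ID, "query": query}
    return tool_call("memclaw_recall", arguments, request_id)


if __name__ == "__main__":
    # backend-dev records the gotcha; code-reviewer later asks about Redis.
    print(json.dumps(write_call(
        "backend-dev",
        "Redis connection pool maxes at 10; raise REDIS_POOL_SIZE first.",
    ), indent=2))
    print(json.dumps(recall_call(
        "code-reviewer", "Redis conventions and known issues"
    ), indent=2))
```

The two requests differ only in tool name, caller identity, and payload, which is the whole point: the agents share nothing except the memory server.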
Step 6 — Governance: the part everyone skips until it bites
Shared memory without permissions is a liability. Three MemClaw mechanisms matter even at three agents:
Visibility scopes. Every memory is stamped at write time: scope_agent (private to the author), scope_team (fleet-wide — the default), or scope_org (cross-fleet). When backend-dev writes a half-baked hypothesis it isn't sure about yet, it can keep it scope_agent until confirmed. Cross-fleet recall is permissioned, not open.
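One way to reason about the three scopes is as a read filter. The toy model below is my reading of the description above, not MemClaw's implementation: it ignores tenants and trust tiers, and it treats scope_org as readable by any fleet, whereas the real system puts cross-fleet recall behind permissions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Memory:
    author: str      # agent_id of the writer
    fleet: str       # fleet_id the writer belongs to
    visibility: str  # "scope_agent", "scope_team", or "scope_org"


def can_read(memory, reader_agent, reader_fleet):
    """Simplified visibility check (illustrative only)."""
    if memory.visibility == "scope_agent":
        return reader_agent == memory.author
    if memory.visibility == "scope_team":
        return reader_fleet == memory.fleet
    if memory.visibility == "scope_org":
        return True
    raise ValueError(f"unknown visibility: {memory.visibility!r}")


hypothesis = Memory("backend-dev", "dev-fleet", "scope_agent")
decision = Memory("backend-dev", "dev-fleet", "scope_team")
```

In this model, code-reviewer can read decision but not hypothesis until backend-dev re-scopes it.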
Trust tiers. Agents carry a trust level (0–3) controlling cross-fleet reads, writes, and deletes. New agents auto-register at a baseline; you promote them deliberately:
curl -X PATCH http://localhost:8000/api/v1/agents/backend-dev/trust \
-H "X-API-Key: standalone" \
-H "Content-Type: application/json" \
-d '{"trust_level": 2}'
A trust-1 agent can manage its own memories; cross-agent operations require trust 2+. A compromised or buggy low-trust agent simply can't corrupt or delete the fleet's shared knowledge.
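The promotion call and the rule it unlocks can be sketched in Python as follows. The request builder mirrors the curl command above. The may_modify function encodes only the two statements in the text (own memories at trust 1, other agents' memories at trust 2+); how MemClaw treats trust 0 is not stated here, so denying it is an assumption.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/api/v1"


def build_trust_request(agent_id, trust_level, api_key="standalone"):
    """Build (but do not send) the PATCH that sets an agent's trust level."""
    if not 0 <= trust_level <= 3:
        raise ValueError("trust_level must be between 0 and 3")
    return urllib.request.Request(
        f"{BASE_URL}/agents/{agent_id}/trust",
        data=json.dumps({"trust_level": trust_level}).encode("utf-8"),
        method="PATCH",
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
    )


def may_modify(actor_id, actor_trust, author_id):
    """Own memories need trust 1+, another agent's need trust 2+."""
    if actor_id == author_id:
        return actor_trust >= 1  # trust 0 is denied here by assumption
    return actor_trust >= 2
```

Sending the request is one more line: urllib.request.urlopen(build_trust_request("backend-dev", 2)).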
The audit log. Every write, read, delete, and status transition is logged with tenant, agent, and scope context:
curl "http://localhost:8000/api/v1/audit-log" -H "X-API-Key: standalone"
When someone asks "why does the docs agent believe X?" you have a provenance chain, not a shrug. PII is auto-detected at write time and flagged in each memory's metadata (contains_pii, pii_types), so you can review or filter sensitive writes.
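Filtering on those PII flags could look like the sketch below. The contains_pii and pii_types names come from the text; that they sit under a metadata key on each memory, and that pii_types is a list, are assumptions about the response shape. The sample records are invented for illustration.

```python
def pii_flagged(memories):
    """Return (title, pii_types) for each memory whose metadata flags PII."""
    flagged = []
    for memory in memories:
        metadata = memory.get("metadata") or {}
        if metadata.get("contains_pii"):
            flagged.append((memory.get("title"), metadata.get("pii_types", [])))
    return flagged


# Hypothetical records, shaped the way this sketch assumes.
SAMPLE_MEMORIES = [
    {"title": "Rate limit decision", "metadata": {"contains_pii": False}},
    {
        "title": "Customer escalation notes",
        "metadata": {"contains_pii": True, "pii_types": ["email"]},
    },
    {"title": "No metadata at all"},
]
```

Running pii_flagged(SAMPLE_MEMORIES) keeps only the escalation notes, which is the kind of list you would hand to a human reviewer.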
None of this required configuration. It's built in, not bolted on — which is precisely the argument for using a governed memory layer instead of pointing three agents at a shared vector store and hoping.
Step 7 — Close the loop: a fleet that learns
So far the fleet shares. MemClaw's third pillar is that it improves:
Outcome reporting. After acting on recalled memories, an agent reports back via memclaw_evolve — success, failure, or partial, with the memory IDs that influenced the action. Successes reinforce memory weights; failures auto-generate preventive rule-type memories. (The project calls this the Karpathy Loop.) One fleet nuance: memclaw_evolve defaults to scope="agent", which only reinforces the caller's own memories. To reinforce a teammate's recalled memory, the reporting agent must pass scope="fleet" (with fleet_id) and hold trust 2+ (the same promotion from Step 6) — at the default scope="agent" the update is skipped as out-of-scope, and even at scope="fleet" a trust-1 agent can't move another agent's weight. Add one line to each CLAUDE.md: "After acting on recalled memories, report the outcome with memclaw_evolve — pass scope="fleet" when the memory came from another agent."
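The scope rule is easy to get wrong, so here is the decision as a small sketch. The scope, fleet_id, and agent_id arguments come from the text; outcome and memory_ids are assumed argument names for the result and the influencing memories. The function also reports which recalled memories would go unreinforced, so an under-trusted agent fails loudly instead of silently.

```python
VALID_OUTCOMES = ("success", "failure", "partial")


def evolve_arguments(caller_id, caller_trust, fleet_id, outcome, recalled):
    """Choose memclaw_evolve arguments for a list of (memory_id, author) pairs.

    Returns (arguments, unreinforced_ids), where unreinforced_ids are the
    teammates' memories this report cannot affect.
    """
    if outcome not in VALID_OUTCOMES:
        raise ValueError(f"unknown outcome: {outcome!r}")
    foreign_ids = [mid for mid, author in recalled if author != caller_id]
    arguments = {
        "agent_id": caller_id,
        "outcome": outcome,                         # assumed argument name
        "memory_ids": [mid for mid, _ in recalled],  # assumed argument name
        "scope": "agent",
    }
    if foreign_ids and caller_trust >= 2:
        # Trust 2+ may reinforce a teammate's memory at fleet scope.
        arguments["scope"] = "fleet"
        arguments["fleet_id"] = fleet_id
        foreign_ids = []
    return arguments, foreign_ids
```

A trust-1 code-reviewer that acted on backend-dev's memory gets that memory's ID back in the second return value, which is the cue to promote the agent as in Step 6.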
Per-agent retrieval tuning. memclaw_tune lets each agent adjust its own retrieval profile. Your code-reviewer might want precision (higher min_similarity, lower top_k); your docs-writer wants breadth (more graph hops, more results). Search quality compounds per agent, per feedback signal.
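To see what precision versus breadth means in practice, the sketch below imitates the effect of min_similarity and top_k on the client side. The two parameter names come from the text; the numeric values are illustrative guesses, and the real filtering happens inside MemClaw, not in your code. The graph-hop setting is left out because its parameter name is not given here.

```python
# Illustrative values only; tune against your own recall quality.
PROFILES = {
    "code-reviewer": {"min_similarity": 0.6, "top_k": 3},  # precision
    "docs-writer": {"min_similarity": 0.3, "top_k": 10},   # breadth
}


def apply_profile(hits, profile):
    """Client-side imitation of min_similarity / top_k filtering."""
    kept = [hit for hit in hits if hit["similarity"] >= profile["min_similarity"]]
    kept.sort(key=lambda hit: hit["similarity"], reverse=True)
    return kept[: profile["top_k"]]


# Hypothetical hits for one query.
HITS = [{"title": f"memory-{i}", "similarity": s}
        for i, s in enumerate([0.9, 0.65, 0.45, 0.35, 0.2])]
```

On this sample the reviewer profile keeps the two strongest hits while the docs profile keeps four, which is the trade-off the paragraph above describes.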
Hygiene runs automatically. The LLM crystallizer merges near-duplicate memories into canonical atomic facts with provenance. Contradiction detection (RDF triples + LLM analysis) supersedes stale facts when the team changes its mind. An 8-status lifecycle retires what's outdated. Run memclaw_insights with focus contradictions or stale periodically — findings persist as insight memories that future runs build on.
The result is a flywheel: write → recall → act → report → better recall. Every interaction makes the next one smarter.
Where to go from here
You now have a working three-agent fleet with shared, governed, self-improving memory — built entirely from config files. Scaling paths:
- More agents just means more CLAUDE.md identities. The pattern is identical at 3 or 300.
- Running an OpenClaw fleet? MemClaw installs as a gateway plugin with one command and auto-stamps fleet_id on every write, so no per-agent prompting is needed.
- Skip the infra with the managed platform at memclaw.net (free tier: 10K memories): same MCP endpoint, governance dashboard included.
- Mixed fleets: anything that speaks MCP joins the same brain — Cursor, Windsurf, Claude Desktop, Codex, custom agents via REST.
- Production reference: read the eToro "Company Brain" case study — 300+ agents, 26,500+ memories, 1,372 shared skills, 23 ms p50 search.
The repo is at github.com/caura-ai/caura-memclaw (Apache 2.0). Star it, break it, file issues — and come argue about agent memory on Discord.
Your agents are already smart. Stop making them start from zero.
MemClaw is built by Caura.ai. All benchmarks and limits referenced are from the project README and memclaw.net as of June 2026.