Engineering · Memory architecture

Solving the agent cold-start problem

Pre-seeded memory and day-one keystones: how to make a brand-new agent behave like a tenured employee from its very first turn, using scoped ingestion and mandatory keystones instead of an ever-growing system prompt.

Diagram · tenant (organization) → fleet (department) → agent · keystones are read on boot
The problem

Perfect reasoning, no place to stand

A brand-new agent arrives with flawless reasoning and nowhere to apply it. The model is already world-class; what it’s missing is a starting point — one that fits the role you’re putting it in, the team it sits on, and what it’s cleared to see and do.

Out of the box it doesn’t know that your Payments team never retries a charge without an idempotency key, that “the staging cluster” means eu-west-1, that legal requires EU data residency, or that the last three engineers who touched the billing webhook all got burned by the same race condition. So it relearns all of it — slowly, in production, one incident at a time.

This is the cold-start problem, and it is less a defect to patch than a first day on the job. With a capable model, an agent’s competence is gated less by the model than by how much of your organization’s hard-won context it can stand on from turn one. Framed as onboarding, the problem has a clean solution; the usual reflexes, by contrast, don’t scale:

  • Stuff everything into the system prompt. It balloons, goes stale, blows your token budget, and every agent carries every fact whether it’s relevant or not. No governance, no audit, no single place to correct a fact.
  • Bolt on plain RAG. Better, but ungoverned and unscoped: every agent sees the same undifferentiated blob, with no notion of who may know what, and no way to mark a rule as one the agent must obey rather than merely retrieve.
  • Fine-tune. Expensive, slow to update, and wrong the moment a policy changes.

The fix isn’t more prompt. It’s onboarding — giving the agent the two things any good new hire gets on day one, scoped to its role: the knowledge base and the rulebook.

The reframe

Cold-start is really two problems

It helps to split it cleanly, because MemClaw solves each with a different mechanism. One is a question of what the org knows; the other is a question of how the org requires the agent to behave.

  • Knowledge: “What does my org already know?” Discretionary: the agent may recall it. Mechanism: scoped ingestion into recallable memory. If missing: the agent reinvents solved problems and misses context.
  • Policy (keystones): “How must my org make me behave?” Mandatory: the agent must obey it. Mechanism: keystones read at session start. If missing: the agent violates conventions, leaks data, and skips approvals.
Two questions, two failure modes — ingestion answers the first, keystones answer the second.

Recall is a library card. Keystones are the employee handbook you sign before you start. You need both — and MemClaw scopes both by organization and department.

The scoping model

The key to per-org and per-department

Three fields carry the entire org / department structure, and one more layer governs who reads across departments.

  • tenant_id — the organization. Hard multi-tenant isolation; nothing crosses it.
  • fleet_id — a department or team within the org (Payments, Support, Platform…).
  • visibility — one of scope_org, scope_team, scope_agent. This decides how far a memory reaches: org-wide baseline, department knowledge, or private to a single agent.

Layered on top, trust tiers govern cross-department reads: standard reads its own fleet, cross_fleet reads every fleet in the org, and admin reads, writes, and deletes everywhere.

Diagram · tenant acme (hard isolation) with a scope_org baseline shared by fleets payments, support, and platform. A standard payments agent recalls the payments runbook, the billing-webhook race, and region facts on boot; a standard support agent reads its own fleet only; a cross_fleet platform agent triages every department.
One tenant, three fleets. scope_org spans all; scope_team stays in its fleet; scope_agent is private. A cross_fleet agent reads across departments.
Organization = tenant · Department = fleet · Reach = visibility · Cross-department read = trust
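To make the scoping rules concrete, here is a small shell model of how tenant_id, fleet_id, visibility, and trust tier combine to decide whether one agent can read one memory. It is a sketch of the rules as described in this section, not MemClaw’s implementation: the function name can_read and its argument order are hypothetical, and treating scope_agent memories as readable only by their owner (and by admin) is an assumption.

```shell
#!/bin/sh
# Illustrative model of the read rules described above; NOT MemClaw's code.
# ASSUMPTION: scope_agent memories are private to their owner, except for admin.
# Usage: can_read AGENT_TENANT AGENT_FLEET AGENT_ID AGENT_TRUST \
#                 MEM_TENANT MEM_FLEET MEM_VISIBILITY MEM_OWNER
# Exit status 0 = readable, 1 = not readable.
can_read() {
  local a_tenant="$1" a_fleet="$2" a_id="$3" a_trust="$4"
  local m_tenant="$5" m_fleet="$6" m_vis="$7" m_owner="$8"

  # Hard multi-tenant isolation: no trust tier crosses a tenant boundary.
  [ "$a_tenant" = "$m_tenant" ] || return 1

  case "$m_vis" in
    scope_org)
      # Org-wide baseline: every agent in the tenant reads it.
      return 0
      ;;
    scope_team)
      # Own fleet always; cross_fleet and admin read every fleet in the org.
      [ "$a_fleet" = "$m_fleet" ] && return 0
      [ "$a_trust" = cross_fleet ] || [ "$a_trust" = admin ]
      ;;
    scope_agent)
      # Private to the owning agent (admin assumed to see everything).
      [ "$a_id" = "$m_owner" ] || [ "$a_trust" = admin ]
      ;;
    *)
      # Unknown visibility: deny by default.
      return 1
      ;;
  esac
}
```

For example, a standard support agent is refused a Payments scope_team memory (`can_read acme support s1 standard acme payments scope_team ""` exits 1), while the same call at cross_fleet trust succeeds; no tier gets past a tenant boundary.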

Layer 1 · Ingestion

Ingest org and department knowledge before the first agent runs

The goal: by the time any agent boots, the store already contains what your org knows. MemClaw gives you two ingestion surfaces.

Bulk memories — atomic facts, decisions, conventions

Use BulkMemoryCreate to seed many discrete memories in one call. Mark org-wide truths scope_org:

shell · bulk write · scope_org
curl -X POST "$API/api/v1/memories/bulk" \
  -H "X-API-Key: $ADMIN_API_KEY" -H "Content-Type: application/json" \
  -d '{
    "tenant_id": "acme",
    "visibility": "scope_org",
    "items": [
      { "content": "Production regions are eu-west-1 (primary) and eu-central-1 (DR). US regions are never used — EU data residency is contractual.",
        "memory_type": "fact", "source_uri": "https://wiki.acme/infra/regions" },
      { "content": "All customer-facing services must emit an OpenTelemetry trace_id on every request; incident triage assumes it.",
        "memory_type": "rule", "source_uri": "https://wiki.acme/eng/observability" }
    ]
  }'

Then seed department knowledge under that fleet, scoped to the team:

shell · bulk write · scope_team
curl -X POST "$API/api/v1/memories/bulk" \
  -H "X-API-Key: $ADMIN_API_KEY" -H "Content-Type: application/json" \
  -d '{
    "tenant_id": "acme",
    "fleet_id": "payments",
    "visibility": "scope_team",
    "items": [
      { "content": "Never retry a charge without reusing the original idempotency key — the PSP dedupes on it; a fresh key double-charges.",
        "memory_type": "rule", "source_uri": "https://wiki.acme/payments/runbook#retries" },
      { "content": "The billing webhook has a known race between charge.succeeded and invoice.paid; process invoice.paid as the source of truth.",
        "memory_type": "insight", "source_uri": "https://postmortem.acme/2026-04-billing" }
    ]
  }'
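The org-wide and department bulk calls above share one envelope and differ only in visibility and fleet_id, so a seeding script can wrap them in a single helper. A minimal sketch, assuming a POSIX shell with curl; seed_bulk, DRY_RUN, and the check that scope_team needs a fleet are hypothetical conveniences, and the items argument must already be a valid JSON array.

```shell
#!/bin/sh
# Hypothetical helper around POST /api/v1/memories/bulk (endpoint from the examples above).
# Args: TENANT VISIBILITY FLEET ITEMS_JSON  (FLEET may be "" for scope_org seeds).
# Assumes tenant and fleet ids are plain slugs (no quotes or backslashes).
# Set DRY_RUN=1 to print the request body instead of sending it.
seed_bulk() {
  local tenant="$1" visibility="$2" fleet="$3" items="$4"

  # A department memory without a fleet would have no team to be scoped to.
  if [ "$visibility" = scope_team ] && [ -z "$fleet" ]; then
    echo "seed_bulk: scope_team requires a fleet_id" >&2
    return 2
  fi

  # Only include fleet_id when one was given (org-wide seeds omit it).
  local fleet_field=""
  [ -n "$fleet" ] && fleet_field="\"fleet_id\": \"$fleet\", "

  local body="{ \"tenant_id\": \"$tenant\", ${fleet_field}\"visibility\": \"$visibility\", \"items\": $items }"

  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$body"
    return 0
  fi
  curl -fsS -X POST "$API/api/v1/memories/bulk" \
    -H "X-API-Key: $ADMIN_API_KEY" -H "Content-Type: application/json" \
    -d "$body"
}
```

Run it with DRY_RUN=1 first to review exactly what will be written, e.g. `DRY_RUN=1 seed_bulk acme scope_team payments "$(cat payments-items.json)"`, where payments-items.json is a hypothetical file holding the items array.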

Every ingested item flows through the enrichment pipeline (embeddings plus entity extraction and classification), so it becomes recallable through hybrid retrieval (vector + keyword + entity graph). You ingest once, and every current and future agent in scope can recall it until the fact is corrected or superseded. Note that the memory_type enum (fact, decision, rule, insight, semantic…) is only a classification: a rule-type memory is still discretionary recall, which is different from a keystone (Layer 2).
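Hybrid retrieval has to merge ranked lists from different retrievers into one result. This post doesn’t say how MemClaw does it; reciprocal rank fusion (RRF), where each list adds 1/(k + rank) to a memory’s score, is one widely used technique, and the sketch below shows it. The function name rrf_fuse, its input format, and k = 60 (a value commonly used with RRF) are illustrative assumptions, not MemClaw’s ranking.

```shell
#!/bin/sh
# Reciprocal rank fusion: an illustration of merging hybrid-retrieval results.
# NOT MemClaw's ranking code; the fusion method it uses is not specified here.
# Input (stdin): one "retriever rank memory_id" line per hit, rank starting at 1.
# Output: "memory_id score", best first; ties broken by id.
# Optional arg: the RRF constant k (default 60).
rrf_fuse() {
  awk -v k="${1:-60}" '
    # Each retriever contributes 1/(k + rank) for every memory it returned.
    NF == 3 { score[$3] += 1 / (k + $2) }
    END { for (m in score) printf "%s %.6f\n", m, score[m] }
  ' | LC_ALL=C sort -k2,2nr -k1,1
}
```

A memory ranked moderately well by several retrievers (vector, keyword, entity graph) can outrank one that only a single retriever ranked first, which is the usual argument for fusing ranks instead of raw scores.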

The Documents API — larger reference material

For handbooks, architecture docs, and policy PDFs you want to keep whole and search semantically, use the collections-based Documents surface. Documents are scoped by tenant_id (plus optional fleet_id) and grouped into named collections:

shell · documents · collections
# Org-wide engineering handbook → a tenant-scoped collection
curl -X POST "$API/api/v1/documents" \
  -H "X-API-Key: $ADMIN_API_KEY" -H "Content-Type: application/json" \
  -d '{ "tenant_id": "acme", "collection": "handbook",
        "doc_id": "eng-onboarding", "data": { "title": "Engineering Onboarding", "body": "..." } }'

# Department-scoped design docs → same collection name, pinned to a fleet
curl -X POST "$API/api/v1/documents" \
  -H "X-API-Key: $ADMIN_API_KEY" -H "Content-Type: application/json" \
  -d '{ "tenant_id": "acme", "fleet_id": "payments", "collection": "design-docs",
        "doc_id": "ledger-v2", "data": { "title": "Ledger v2 RFC", "body": "..." } }'

Agents (or your orchestration layer) then run POST /api/v1/documents/search scoped to the collection and fleet. Rule of thumb: atomic, reusable facts and lessons → memories; long-form reference you want to retrieve as documents → the Documents API.
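A scoped search from an agent or orchestration layer might look like the sketch below. The endpoint is the one named above, but its request body isn’t shown in this post: the field names tenant_id, fleet_id, collection, and query are assumptions modelled on the document writes, so check the Documents API reference before relying on them. search_docs and DRY_RUN are hypothetical.

```shell
#!/bin/sh
# Hypothetical wrapper around POST /api/v1/documents/search.
# ASSUMPTION: request fields (tenant_id, fleet_id, collection, query) mirror the
# document writes above; the real search schema may differ.
# Args: TENANT FLEET COLLECTION QUERY  (FLEET may be "" for tenant-wide collections).
# Set DRY_RUN=1 to print the request body instead of sending it.
search_docs() {
  local tenant="$1" fleet="$2" collection="$3" query="$4"

  # Pin the search to one department when a fleet is given.
  local fleet_field=""
  [ -n "$fleet" ] && fleet_field="\"fleet_id\": \"$fleet\", "

  # Escape backslashes, then double quotes, so free text stays valid JSON
  # (newlines and control characters are not handled in this sketch).
  query=$(printf '%s' "$query" | sed 's/\\/\\\\/g; s/"/\\"/g')

  local body="{ \"tenant_id\": \"$tenant\", ${fleet_field}\"collection\": \"$collection\", \"query\": \"$query\" }"

  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$body"
    return 0
  fi
  curl -fsS -X POST "$API/api/v1/documents/search" \
    -H "X-API-Key: $ADMIN_API_KEY" -H "Content-Type: application/json" \
    -d "$body"
}
```

For instance, `search_docs acme payments design-docs "ledger v2 rollout"` would target the Payments design docs, while passing "" as the fleet searches a tenant-scoped collection such as handbook.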


Layer 2 · Keystones

Guardrails that apply on the very first turn

Ingested knowledge closes the knowledge gap, but recall is discretionary: the agent might not surface the right memory at the right moment, and “should have recalled the rule” is a bad way to enforce EU data residency. Keystones close the policy gap. They are mandatory rules the platform serves to every agent in scope at session start; the agent reads them before any other action, and they override conflicting user instructions.

Scope them along the same organization/department lines as knowledge, using the keystone scope field (tenant / fleet / agent):

shell · keystone · scope=tenant
# Org-wide, non-negotiable, highest weight → scope=tenant (omit agent_id)
curl -X POST "$API/api/v1/memclaw/keystones" \
  -H "X-API-Key: $ADMIN_API_KEY" -H "Content-Type: application/json" \
  -d '{ "tenant_id": "acme", "scope": "tenant", "weight": "high",
        "doc_id": "eu-residency",
        "title": "EU data residency",
        "content": "Never write customer PII to any store or region outside the EU. If a task would require it, refuse and escalate." }'

shell · keystone · scope=fleet
# Department rule → scope=fleet
curl -X POST "$API/api/v1/memclaw/keystones" \
  -H "X-API-Key: $ADMIN_API_KEY" -H "Content-Type: application/json" \
  -d '{ "tenant_id": "acme", "scope": "fleet", "fleet_id": "payments", "weight": "high",
        "doc_id": "no-blind-retries",
        "title": "Idempotent retries only",
        "content": "Never retry a payment operation without reusing the original idempotency key." }'

Three properties make keystones the right tool for policy cold-start:

  • Read on session boot, ungated. Keystone reads require no trust (level 0), so a brand-new, low-trust agent still receives its full rulebook on turn one via memclaw_keystones.
  • Scope-merged and weight-ordered. An agent in payments receives the tenant rules and its fleet rules, highest weight first — org policy plus department policy, already merged.
  • Applicability, not secrecy. A keystone’s scope says which agents a rule applies to, not who’s allowed to see it. Keystones are your policy layer; use ingestion visibility and trust tiers for what should stay confidential.
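The scope-merge and weight ordering can be modelled in a few lines, which helps when reviewing what a given agent will receive on boot. This is an illustration, not the server’s logic: keystones_for_agent and its one-keystone-per-line input format are hypothetical, only the weight high appears in this post (medium and low are assumed), and ties within a weight keep input order because the post doesn’t say how they are broken.

```shell
#!/bin/sh
# Illustrative model of "scope-merged and weight-ordered" keystone delivery.
# NOT the server's code: the real merge happens inside MemClaw.
# Input (stdin): "scope target weight doc_id" per keystone, where target is
#   "-" for tenant scope, a fleet_id for fleet scope, or an agent_id for agent scope.
# ASSUMPTION: weights are high > medium > low; unknown weights sort last.
# Args: FLEET AGENT_ID of the agent that is booting.
keystones_for_agent() {
  local fleet="$1" agent="$2"
  awk -v fleet="$fleet" -v agent="$agent" '
    BEGIN { rank["high"] = 0; rank["medium"] = 1; rank["low"] = 2 }
    {
      # Tenant rules apply to everyone; fleet and agent rules only to their target.
      applies = ($1 == "tenant") || ($1 == "fleet" && $2 == fleet) || ($1 == "agent" && $2 == agent)
      if (!applies) next
      r = ($3 in rank) ? rank[$3] : 9
      # Prefix weight rank and input position so sorting is stable by construction.
      print r, NR, $0
    }
  ' | sort -k1,1n -k2,2n | cut -d' ' -f3-
}
```

Fed the two keystones created above plus a Support rule, a Payments agent gets eu-residency and no-blind-retries first, and the Support rule does not apply to it.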

Authoring is trust-gated (≥ 1 to set your own agent’s rule, ≥ 2 for fleet or tenant rules), so departments can’t quietly rewrite org policy.
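The authoring gates are simple enough to state as a predicate. A sketch mirroring the thresholds above, not MemClaw’s code; can_author_keystone is a hypothetical name, and treating trust as the same numeric level used by initial_trust is an assumption.

```shell
#!/bin/sh
# Illustrative check mirroring the stated trust gates (NOT MemClaw's code):
#   reading keystones needs trust >= 0 (no check at all),
#   authoring an agent-scoped rule needs >= 1,
#   authoring a fleet- or tenant-scoped rule needs >= 2.
# Args: TRUST (integer level) SCOPE (agent | fleet | tenant).
can_author_keystone() {
  local trust="$1" scope="$2"
  case "$scope" in
    agent)        [ "$trust" -ge 1 ] ;;
    fleet|tenant) [ "$trust" -ge 2 ] ;;
    *)            return 1 ;;   # unknown scope: deny
  esac
}
```

A freshly provisioned agent at initial_trust 1 can therefore record rules for itself, but changing the Payments or Acme rulebook takes level 2 or higher.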

Diagram · trust tiers: standard reads within its own fleet; cross_fleet reads every fleet in the org; admin reads, writes, and deletes everywhere.
Trust is layered on top of visibility — it widens reach, not what a rule applies to.

Putting it together

Onboarding a new agent

With both layers seeded, adding an agent is a single provisioning call — and it’s productive immediately:

shell · provision agent
curl -X POST "$API/api/v1/admin/agent-keys/provision" \
  -H "X-API-Key: $ADMIN_API_KEY" -H "Content-Type: application/json" \
  -d '{ "tenant_id": "acme", "initial_fleet": "payments", "initial_trust": 1 }'
Diagram · a new agent’s first session after one provisioning call (initial_fleet=payments, initial_trust=1). (1) Keystones: the merged org + Payments rulebook, read ungated, so EU residency and idempotent retries are hard constraints. (2) Recall: seeded memory surfaces the payments runbook, the webhook-race lesson, and region facts. (3) Documents: design-docs are read on demand, such as the ledger-v2 RFC when a task needs depth. Result: productive on turn one, not after a month of incidents; it behaves like a tenured teammate.
Provision once. On boot the agent reads its merged rulebook, recalls seeded memory, and pulls design docs as needed.
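When several agents join at once, the provisioning call can be wrapped in a small helper. A minimal sketch: provision_agent and DRY_RUN are hypothetical, the three request fields come from the example above, and the response is printed as-is since its shape isn’t shown here. Because initial_trust is a number while cross_fleet is a tier name, and the post doesn’t give the mapping between them, the helper only accepts integers.

```shell
#!/bin/sh
# Hypothetical wrapper around POST /api/v1/admin/agent-keys/provision.
# Fields come from the example above; the response is printed unparsed because
# its shape isn't documented in this post.
# Args: TENANT FLEET TRUST. Set DRY_RUN=1 to print the request body only.
provision_agent() {
  local tenant="$1" fleet="$2" trust="$3"

  # initial_trust is sent as a bare JSON number, so reject anything non-numeric.
  case "$trust" in
    ''|*[!0-9]*)
      echo "provision_agent: initial_trust must be a non-negative integer" >&2
      return 2
      ;;
  esac

  local body="{ \"tenant_id\": \"$tenant\", \"initial_fleet\": \"$fleet\", \"initial_trust\": $trust }"

  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$body"
    return 0
  fi
  curl -fsS -X POST "$API/api/v1/admin/agent-keys/provision" \
    -H "X-API-Key: $ADMIN_API_KEY" -H "Content-Type: application/json" \
    -d "$body"
}
```

Usage: `for fleet in payments support; do provision_agent acme "$fleet" 1; done` provisions one standard agent per department against the already-seeded store.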

A standard-trust agent works within its own fleet and draws on the scope_org baseline. Need an agent that reads across departments — say, a platform SRE that triages every fleet? Provision it at cross_fleet trust: same ingestion, wider read.

The payoff over the alternatives: this is governed (trust + keystones), auditable (every read and write is logged with provenance), shared (seed once, every agent benefits), and centrally correctable — fix a fact in one place and MemClaw’s contradiction detection supersedes the stale version everywhere, instead of leaving N copies drifting across N system prompts.


Beyond day one

Cold-start is the start, not the end

Pre-seeding gets a new agent to “competent tenured employee” on day one. From there the store keeps improving on its own: the Karpathy Loop (memclaw_evolve / memclaw_insights) reinforces what actually worked and auto-generates preventive rules after failures, and the Memory Crystallizer consolidates many small memories into denser, canonical facts.

Cold-start seeding is how you skip the painful first month; the learning loop is how the agent keeps getting better after it.

TL;DR

The checklist

  1. Model your org: organization → tenant_id, department → fleet_id.
  2. Seed org knowledge: bulk-write shared facts and lessons as scope_org; put handbooks in tenant-scoped Documents collections.
  3. Seed department knowledge: bulk-write team facts as scope_team under each fleet_id; scope design docs to the fleet.
  4. Set org keystones (scope=tenant) for non-negotiables: data residency, secrets handling, approval gates.
  5. Set department keystones (scope=fleet) for team conventions.
  6. Provision agents with the right initial_fleet and initial_trust; grant cross_fleet only where an agent legitimately needs to read other departments.
  7. Let the loop run: rely on the Karpathy Loop and Crystallizer to keep the seeded store fresh.