Coding Agents · CI/CD · Memory

Passes Locally, Fails in CI: Coding Agents That Actually Learn

June 16, 2026 · Caura.AI

The smartest coding agent on your team has amnesia. It will review a pull request brilliantly, trace a gnarly bug to its root cause, then close the window and forget all of it. Next session it re-derives your conventions, re-asks the questions it already answered, and rediscovers the gotcha that cost it an hour last week. Per-session genius, zero institutional memory.

We’ve been closing that gap for our Claude Code agents. Here’s a real example from last week.

“Passes locally, fails in CI”

Every engineer’s least favorite sentence. A new integration-test suite was green on my machine and red in CI. The error-investigation agent picked it up and traced it to something genuinely non-obvious: CI runs with the LLM-enrichment provider set to none — no live model calls on the runner. That quietly gates off the inline enrichment step. And the failing tests were exercising a write path that needs the model’s signal at write time.

So the chain was: no provider → no enrichment → no signal → the feature failed closed (took no action) → the test assertions, which expected an action, failed. Green locally (real provider configured), red in CI. Nothing was broken in the product; the test depended on a code path that CI switches off.
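To make that chain concrete, here is a minimal sketch. The function names (`enrich`, `write_record`), the `"linked"` and `"no_action"` results, and the default of `none` when the variable is unset are all illustrative assumptions, not the product's code; only the `ENTITY_EXTRACTION_PROVIDER` variable and its `none` value come from the investigation.

```python
import os
from typing import Optional


def enrich(text: str) -> Optional[dict]:
    """Hypothetical inline enrichment step, gated on the provider setting."""
    provider = os.environ.get("ENTITY_EXTRACTION_PROVIDER", "none")
    if provider == "none":
        return None  # CI: no live model calls, so no signal is produced
    # Stand-in for a real model call (assumption for illustration).
    return {"entities": text.split()}


def write_record(text: str) -> str:
    """Hypothetical write path that needs the model's signal at write time."""
    signal = enrich(text)
    if signal is None:
        return "no_action"  # fail closed: without a signal, do nothing
    return "linked"
```

With the provider set to `none`, `write_record` returns `"no_action"` every time, so any test asserting on the action fails no matter how correct the write path is.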

The agent fixed it properly: inject the model signal deterministically in the test so it no longer depends on the provider config, and pin the write mode on the request instead of inferring it from settings. Then it reproduced the failure locally under CI’s exact env (ENTITY_EXTRACTION_PROVIDER=none) to prove the fix before trusting it.
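The shape of that fix can be sketched as follows. Everything here except `ENTITY_EXTRACTION_PROVIDER=none` is a hypothetical stand-in (`WriteRequest`, `handle_write`, the `"strict"` and `"auto"` modes); the point is only the pattern: the test supplies its own signal and states its own write mode, then asserts under CI's environment.

```python
import os
from dataclasses import dataclass
from typing import Callable, Optional
from unittest import mock


@dataclass
class WriteRequest:
    text: str
    write_mode: Optional[str] = None  # pinned on the request when set


def provider_signal(text: str) -> Optional[dict]:
    """Default signal source: off when the provider is 'none', as in CI."""
    if os.environ.get("ENTITY_EXTRACTION_PROVIDER", "none") == "none":
        return None
    return {"entities": text.split()}  # stand-in for a real model call


def handle_write(
    req: WriteRequest,
    signal_source: Callable[[str], Optional[dict]] = provider_signal,
    default_mode: str = "auto",  # stands in for "inferred from settings"
) -> str:
    mode = req.write_mode or default_mode
    signal = signal_source(req.text)
    if signal is None:
        return "no_action"  # fail closed
    return f"{mode}:{len(signal['entities'])}"


def test_write_acts_without_provider():
    # Reproduce CI's exact environment before trusting the fix.
    with mock.patch.dict(os.environ, {"ENTITY_EXTRACTION_PROVIDER": "none"}):
        # Baseline: the original failure, reproduced locally.
        assert handle_write(WriteRequest("alice met bob")) == "no_action"

        # Fix: inject a deterministic signal and pin the write mode.
        def fake_signal(text: str) -> dict:
            return {"entities": ["alice", "bob"]}

        req = WriteRequest("alice met bob", write_mode="strict")
        assert handle_write(req, signal_source=fake_signal) == "strict:2"
```

Because the test patches the environment itself, it behaves the same on a laptop with a real provider configured and on the CI runner.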

**Fig 1.** The whole investigation, end to end: from the green-local/red-CI symptom to the root cause, a fix proved under CI’s own environment, and the lesson written back to shared memory.

The part that actually matters

It wrote the lesson down: “In CI the enrichment provider is off; any test that needs the model signal must inject it provider-independently — and reproduce under the CI env before assuming a fix works.” The next time that signature appears, it’s an instant recall — not another hour of bisecting and another red build. Solve once, never re-solve.

How it works

The memory layer is memclaw. Every agent runs the same loop: Recall → Act → Write the outcome. The memory is shared and cross-session (the investigator’s lesson is available to the reviewer, to tomorrow’s run, to a teammate’s agent), and it evolves (stale facts are superseded, not left to rot).
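As a rough illustration of that loop, here is a toy in-memory store. It is not memclaw's API: `SharedMemory`, `Lesson`, `run_session`, and the signature key are hypothetical names, and real recall would be richer than an exact key match. The sketch shows only the two properties described above: a later session recalls what an earlier one wrote, and a newer fact supersedes an older one instead of deleting it.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Lesson:
    key: str                              # e.g. a failure signature
    text: str
    superseded_by: Optional[int] = None   # index of the newer lesson, if any


@dataclass
class SharedMemory:
    lessons: List[Lesson] = field(default_factory=list)

    def recall(self, key: str) -> Optional[Lesson]:
        """Return the current (non-superseded) lesson for a key, if any."""
        for lesson in self.lessons:
            if lesson.key == key and lesson.superseded_by is None:
                return lesson
        return None

    def write(self, key: str, text: str) -> Lesson:
        """Store a lesson; mark any current lesson for the key as superseded."""
        new_index = len(self.lessons)
        for lesson in self.lessons:
            if lesson.key == key and lesson.superseded_by is None:
                lesson.superseded_by = new_index
        new = Lesson(key, text)
        self.lessons.append(new)
        return new


def run_session(memory: SharedMemory, key: str,
                investigate: Callable[[], str]) -> str:
    """Recall first; only investigate (and write back) when nothing is known."""
    known = memory.recall(key)
    if known is not None:
        return known.text
    outcome = investigate()
    memory.write(key, outcome)
    return outcome


# Two sessions hitting the same signature: the second one skips the bisecting.
memory = SharedMemory()
investigations = []


def investigate() -> str:
    investigations.append("ran")
    return "In CI the enrichment provider is off; inject the signal in tests."


first = run_session(memory, "passes-locally-fails-in-ci", investigate)
second = run_session(memory, "passes-locally-fails-in-ci", investigate)
```

Keeping superseded lessons around, rather than overwriting them, is one way to preserve the history of why a fact changed; whether memclaw does it this way is not something this sketch claims.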

**Fig 2.** The loop every agent runs: recall what’s known, act, write the outcome back to shared memory. Each pass starts sharper than the last.

The shift

From agents that are smart in the moment to agents that get better the more they work with you. Same models, same tools — but every session starts from everything the last one learned. The PR-review agent runs the same idea from the other side: it opens each review by recalling your real conventions and the decisions behind the code, and reviews to them. We wrote that one up separately in The Code Reviewer That Remembers Why. And if you want the enforcement story underneath it — standards that hold even when an agent is asked to break them — that’s keystones.

How are you giving your coding agents long-term memory?

Try it

OSS: github.com/caura-ai/caura-memclaw. Apache 2.0, docker compose up, REST and MCP out of the box.

Managed: memclaw.net. Same engine, governance wired in, we run it.
