Passes Locally, Fails in CI: Coding Agents That Actually Learn
June 16, 2026 · Caura.AI
The smartest coding agent on your team has amnesia. It will review a pull request brilliantly, trace a gnarly bug to its root cause, then close the window and forget all of it. Next session it re-derives your conventions, re-asks the questions it already answered, and rediscovers the gotcha that cost it an hour last week. Per-session genius, zero institutional memory.
We’ve been closing that gap for our Claude Code agents. Here’s a real example from last week.
“Passes locally, fails in CI”
Every engineer’s least favorite sentence. A new integration-test suite was green on my machine and red in CI. The error-investigation agent picked it up and traced it to something genuinely non-obvious: CI runs with the LLM-enrichment provider set to none — no live model calls on the runner. That quietly gates off the inline enrichment step. And the failing tests were exercising a write path that needs the model’s signal at write time.
So the chain was: no provider → no enrichment → no signal → the feature failed closed (took no action) → the test assertions, which expected an action, failed. Green locally (real provider configured), red in CI. Nothing was broken in the product; the test depended on a code path that CI switches off.
The agent fixed it properly: inject the model signal deterministically in the test so it no longer depends on the provider config, and pin the write mode on the request instead of inferring it from settings. Then it reproduced the failure locally under CI’s exact env (ENTITY_EXTRACTION_PROVIDER=none) to prove the fix before trusting it.
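Here is a minimal sketch of that fix in Python. Only the ENTITY_EXTRACTION_PROVIDER variable comes from the real setup; WriteRequest, provider_signal, handle_write, the WRITE_MODE setting, and the "inline" mode name are hypothetical stand-ins for illustration, not our actual code.

```python
import os
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class WriteRequest:
    """Hypothetical request type; write_mode is pinned on the request when set."""
    text: str
    write_mode: Optional[str] = None


def provider_signal(text: str) -> Optional[str]:
    """Return the model signal, or None when the provider is 'none' (as in CI)."""
    if os.environ.get("ENTITY_EXTRACTION_PROVIDER", "none") == "none":
        return None  # enrichment is gated off: no live model call
    return "entity:" + text.split()[0]  # placeholder for a real model call


def handle_write(
    req: WriteRequest,
    signal_fn: Callable[[str], Optional[str]] = provider_signal,
) -> str:
    """Hypothetical write path that fails closed when there is no signal."""
    # Prefer the mode pinned on the request; WRITE_MODE is an assumed setting.
    mode = req.write_mode or os.environ.get("WRITE_MODE", "inline")
    signal = signal_fn(req.text) if mode == "inline" else None
    if signal is None:
        return "no_action"  # fail closed
    return "stored:" + signal


def test_write_path_under_ci_env() -> None:
    previous = os.environ.get("ENTITY_EXTRACTION_PROVIDER")
    os.environ["ENTITY_EXTRACTION_PROVIDER"] = "none"  # reproduce CI's env
    try:
        # Original test shape: depends on the provider, so nothing happens in CI.
        assert handle_write(WriteRequest("Acme ships v2")) == "no_action"

        # Fixed test shape: inject the signal and pin the write mode.
        result = handle_write(
            WriteRequest("Acme ships v2", write_mode="inline"),
            signal_fn=lambda _text: "entity:Acme",
        )
        assert result == "stored:entity:Acme"
    finally:
        # Restore the environment so other tests are unaffected.
        if previous is None:
            del os.environ["ENTITY_EXTRACTION_PROVIDER"]
        else:
            os.environ["ENTITY_EXTRACTION_PROVIDER"] = previous
```

The point of the shape is that the fixed assertion gives the same answer whatever the provider setting is, so local and CI runs can no longer disagree.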
The part that actually matters
It wrote the lesson down: “In CI the enrichment provider is off; any test that needs the model signal must inject it provider-independently — and reproduce under the CI env before assuming a fix works.” The next time that signature appears, it’s an instant recall — not another hour of bisecting and another red build. Solve once, never re-solve.
How it works
The memory layer is memclaw. Each session follows the same loop: Recall → Act → Write the outcome. The memory is shared and cross-session (the investigator’s lesson is available to the reviewer, to tomorrow’s run, to a teammate’s agent), and it evolves (stale facts are superseded, not left to rot).
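To make the loop concrete, here is a toy in-memory sketch in Python. It is not memclaw's API: SharedMemory, Lesson, run_session, and the signature string are all hypothetical names, and a real store would match on more than an exact key.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Lesson:
    signature: str            # what the problem looks like
    text: str                 # what was learned
    superseded: bool = False  # stale lessons are flagged, not deleted


@dataclass
class SharedMemory:
    """Hypothetical store shared by every agent and every session."""
    lessons: List[Lesson] = field(default_factory=list)

    def recall(self, signature: str) -> Optional[Lesson]:
        # Return the newest lesson for this signature that is still current.
        for lesson in reversed(self.lessons):
            if lesson.signature == signature and not lesson.superseded:
                return lesson
        return None

    def write(self, signature: str, text: str) -> Lesson:
        # Supersede older lessons for the same signature, then append.
        for lesson in self.lessons:
            if lesson.signature == signature:
                lesson.superseded = True
        new_lesson = Lesson(signature, text)
        self.lessons.append(new_lesson)
        return new_lesson


def run_session(
    memory: SharedMemory,
    signature: str,
    investigate: Callable[[], str],
) -> str:
    """Recall first; only act (investigate) and write when nothing is known."""
    known = memory.recall(signature)
    if known is not None:
        return "recalled: " + known.text
    outcome = investigate()
    memory.write(signature, outcome)
    return "investigated: " + outcome


memory = SharedMemory()
sig = "passes-locally-fails-in-ci"
first = run_session(memory, sig, lambda: "inject the model signal in tests")
second = run_session(memory, sig, lambda: "this investigation never runs")
```

In this sketch the first session pays for the investigation and the second gets the lesson back from recall, which is the behavior described above in miniature.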
The shift
From agents that are smart in the moment to agents that get better the more they work with you. Same models, same tools, but every session starts from everything the last one learned. The PR-review agent runs the same idea from the other side: it opens each review by recalling your real conventions and the decisions behind the code, and reviews against them. We wrote that one up separately in The Code Reviewer That Remembers Why. And if you want the enforcement story underneath it (standards that hold even when an agent is asked to break them), that’s keystones.
How are you giving your coding agents long-term memory?
Try it
OSS: github.com/caura-ai/caura-memclaw. Apache 2.0, docker compose up, REST and MCP out of the box.
Managed: memclaw.net. Same engine, governance wired in, we run it.