My Agents Left Me 116 Git Repos. So I Built a Source of Truth.
I let agents build software on my Mac for months, and one morning I could not tell one of them where my own project actually lived. This is the eight hours it took to fix that, and the legend that now keeps my agents honest.
A few weeks ago I asked an agent to pick up work on one of my own projects. It opened a fresh terminal, looked around, found nothing it recognized, and built a brand new copy of a project I already had. That was not the agent being careless. Nothing on the machine had ever told it where the real one lived. The only map of my own work was a memory in my head, and a memory does not survive contact with a machine where agents do the building.
This morning I sat down at 8 to fix that. It is almost 4 now. This is the story of the eight hours in between, and the thing we built so it never happens again.
A word on why this is worth eight hours. My work changed shape this year. I used to clone a repo, scaffold a project, and write the code myself. Now I open a terminal and an agent does it; it branches, it builds, it opens a pull request. That shift is the entire point. It is also the trap. An agent in a fresh session has no memory of the last one, so it recreates what already exists, and it drops the new copy wherever the terminal happened to open, which is often inside my notes vault. Multiply that by months of agent-led work and you get sprawl. One of my projects had nine folders with its name on disk. I could no longer point at a project and say, with confidence, that is the one I ship from.
An agent that builds while you sleep needs a map it can read. Not a memory in your head.
So before either of us wrote a line of code, we wrote a contract.
We argued about the shape before we built it
I did not let the agent start building. We spent the first part of the morning being interrogated by a gate I run on every recurring automation: thirteen questions a loop has to answer before it earns the right to exist. What is the one number that should move. What outside fact proves the work happened. When does it stop. What is it allowed to touch without asking. Where does the truth live, and who owns dedupe. A deterministic validator checks the answers. If any field is hand-waved, the gate stays red and nothing ships.
That hour of being grilled is where the real design happened. We aligned on the topology, which sounds academic until you have nine copies of one project. The machine is the parent. GitHub is the witness, not the spine. The dedup key is the GitHub remote URL, because folder names lie and remote URLs do not. We agreed on the authority envelope: the loop acts on its own, then tells me what it did, because the whole point is autonomous agents that do not block on me. And we agreed on the one rule that makes that safe, which I will come back to.
We did this together, back and forth, for the better part of the day. I corrected the agent. The agent caught things I had not thought through. By the time the contract passed the gate, we had alignment, not just a spec. Only then did the building start.
The reckoning
The loop is called RepoReckoner. Its first pass is read-only on purpose: walk the machine, write down what is true, prove every claim against GitHub, and touch nothing. You learn the shape of the mess without risking a single byte.

116 local git repos on the machine. 165 repos on my GitHub account. The engine grouped the local ones into 113 logical projects and gave every single one a disposition. No repo got left in limbo without a label.

Now go back to the project with nine folders. My gut said: nine copies, clean up eight. The legend said something better and saved me from myself. Only two of those folders were real copies of the same repo. The rest were genuinely distinct projects that happened to share a name prefix, plus a few asset folders that were not git repos at all. If I had trusted my memory I would have deleted real work. The legend grouped by remote URL, looked at the evidence, and told me the truth instead of my assumption. That single correction is the whole reason this loop exists.
This is also where the disk cleanup loop I shipped earlier in the day fits in. I built two loops today, deliberately separate.
| Loop | Job | First pass does |
|---|---|---|
| The disk-health loop | Backups and reclaimable space | Inventory every folder, flag what is safe, never delete |
| RepoReckoner | The source-of-truth legend | Reconcile against GitHub, assign a disposition, never move |
One is a smoke detector for disk and backups. The other is the registry that says which copy is real. Different jobs, no shared dependency, so one can fail without taking the other down.
No delete without a witness
Here is the rule we agreed on in the morning, the one that lets the loop act while I am asleep. I do not gate it behind me clicking approve. I gate it behind a fact the agent cannot fake.

Every repo has a HEAD SHA, the fingerprint of its latest commit. Before the engine will ever call a repo safe, it runs git ls-remote and asks GitHub, live, whether that exact SHA exists on the remote. If it does, the repo is witnessed. If it does not, the repo holds work that exists nowhere but this one disk, so it can never be a delete candidate. It gets pushed to a backup first, or skipped and flagged for me.
That SHA goes straight into a receipt. This is what makes a faked result impossible. If a future agent, or a sloppy run, claims “this is the real copy, the rest are safe to remove,” I can re-query GitHub and check the SHA myself. So I tested exactly that this afternoon. I pulled a stored SHA out of a receipt and asked GitHub for the live tip of that repo. They matched, character for character. A receipt you cannot re-check is just a feeling. This one is a fact, and so is every other line in the legend.
This is the same discipline I keep coming back to, in your agent said it did the work and in paying down supervision debt. Trust is not assumed. It is measured, every run, against something outside the agent.
What the eight hours actually bought, including the parts that hurt
The point of a read-only first pass is to surface the mess honestly, not to hide it behind a green checkmark. So here is the uncomfortable half of the receipt.
20 repos have unpushed work. They hold commits that live on this disk and nowhere else. If that drive died tonight, that work is gone. I did not know that this morning. I know it now, by name, for every one of the twenty.
3 repos are undispositioned. Their remote claims to be mine, but the repo is not in my live GitHub list. Renamed, deleted, or made private, the machine cannot tell. Those 3 are flagged for me directly, and driving that number to zero is the number this loop exists to move. Not “the scan ran.” The count of my own repos I cannot account for. That is the difference between a loop that reports churn and a loop that reports an outcome.
The legend itself lives in one place, version controlled and synced across my machines, so any agent on any box reads the same truth before it builds. The raw scan is a disposable cache. The receipts are permanent. An adversarial second agent could audit every claim in them against GitHub and find no gap, because the proof is a SHA, not a sentence.
What I would tell you if you are headed here too
Write the contract before the code. The hour we spent being grilled by the gate is why the build went straight. Naming the outcome, the check, and the stop is the work; the code is just what is left over.
Pick a key that cannot lie. Folder names lie. The remote URL does not. Group by the externally verifiable thing and your “nine copies” collapse into the truth.
Verify against an outside fact, not the agent’s own word. A worker cannot grade its own work.
git ls-remoteis an oracle the agent does not control, which is exactly why you can trust it.Make the first pass read-only. You can learn the full shape of the mess without risking a byte. Mutation is a later phase, and it stays gated behind the witness rule.
Measure the number that should go down. Not activity. The count of things you cannot account for. If that number does not trend toward zero, the loop is theater, no matter how clean the logs look.
If you are running agents that build for you, this is coming for you too. The moment an agent can scaffold and commit, it will create state faster than you can hold in your head, and your memory will quietly stop being a reliable map. The fix is not to slow the agents down. It is to give them a map they can read and a receipt you can re-check.
You do not need a platform for any of this. The engine is a few hundred lines of standard-library Python, git, and the GitHub CLI. It runs on my own hardware, touches nothing it cannot prove, and leaves a paper trail anyone can audit. Local-first, witness-backed, no cloud bill. Eight hours, start to finish, a human and an agent working it out together.
The machine was never the problem. The missing map was. Now there is one, and the things that kept getting lost can finally read it.
If you are building autonomous agents and wrestling with where their work actually lives, I would like to hear how you are tracking it. I build agents and the tools to keep them honest.
Cheers, Fabian Williams
- Blog: fabswill.com
- LinkedIn: fabiangwilliams
- Twitter/X: @fabianwilliams
