Fabian G. Williams aka Fabs


Principal Product Manager, Microsoft

How Do You Trust an Autonomous AI Agent? Evals Are the Answer.

I run an autonomous AI agent at home — 16 cron jobs daily. It says 'done' but did it actually do anything? I built an eval framework to find out. Here's what broke, what I learned, and why agent evals are fundamentally different from LLM evals.

Fabian Williams

10-Minute Read

OpenClaw Eval Dashboard showing mixed results across 9 dimensions — the honest picture after adding freshness, failure rate, and delivery gap scoring

I run an autonomous AI agent on a Mac Mini in my house. She handles 16 daily cron jobs — finances, email triage, outreach campaigns, device monitoring, morning briefings. The agent says “done.” But did it actually do anything? I built a 9-dimension eval rubric to find out. Along the way I discovered that my evals were broken, my agent was better than I thought, and the most important metric isn’t pass/fail — it’s whether a failure is your fault or the agent’s fault.

Your Next Hire Should Be an AI — Here's How a Nonprofit Did It in Two Weeks

How MACONA went from a one-person operation to a team of two — without adding headcount. An autonomous AI executive assistant managing email, social media, newsletters, and donor outreach 24/7 on dedicated hardware.

Fabian Williams

6-Minute Read

OpenClaw Gateway Dashboard showing healthy status, 12 active sessions, and cron jobs enabled

We deployed an autonomous AI executive assistant for a nonprofit in under two weeks. She runs eight scheduled programs daily — morning briefings, social media, donor research, newsletter drafts, content scouting, and end-of-day digests — all without being asked. The CEO went from drowning in operational work to just making decisions. The same pattern works for any small organization: medical practices, restaurants, law firms, conferences, mom-and-pop shops.

Qui Non Proficit Deficit: Three Months Offline, Two Apps Shipped, and an AI That Runs a Nonprofit

I disappeared from LinkedIn and YouTube for three months. Not burned out — building. Two iOS apps shipped, an autonomous AI assistant running a nonprofit, and a workflow that changed everything. Here's the full story.

Fabian Williams

12-Minute Read

Stale Contacts Cleaner on the App Store

I went heads down for about three months: no LinkedIn, no YouTube, barely any Twitter. In that time I shipped two iOS apps to the App Store, built an autonomous AI assistant that runs a nonprofit’s entire digital presence 24/7, and developed a workflow where AI agents scale my output 3-5x. This post is the full story: the career pattern that taught me to recognize seismic shifts, what I actually built, and why I’m back.
