
Vibe Coding with AI – Best Practices for Every Project

Learn how to use evaluation-first development, GitHub Copilot, Claude, and local LLMs to build faster, smarter AI-powered apps.

Fabian Williams

5-Minute Read

[Image: Plan → Eval → Build → Test flow]

🚀 Introduction

This guide captures a modern, evaluation-first approach to building AI-powered projects using tools like GitHub Copilot, Claude, ChatGPT, and local LLMs like LLaMA and DeepSeek.

We'll use AsyncPR, a real-world mobile app project I built and shipped, as the public example. Check out the YouTube Short I made about the app at https://go.fabswill.com/asyncpr-shortintro and feel free to test it out! This guide reflects how I work today: VS Code as my IDE, Copilot for code assist, and private models running on my MacBook when needed for security or speed.

[Image: AsyncPR app available in the App Store]

Grab the app for iOS in the App Store: https://apps.apple.com/app/id6744700840

If you're serious about modern software building, you'll love this framework.


🛠️ Purpose and Setting Expectations

This is not a theory doc written after the fact; it's a living guide captured while actively working.
Key details:

  • IDE: Visual Studio Code (VS Code)
  • Primary AI Tools: GitHub Copilot, ChatGPT, Claude Desktop
  • Local Models: LLaMA 3.3:70B, DeepSeek 70B (via Ollama)

[Image: Mindset shift to Eval-first]

AsyncPR is featured because my private projects at work can't be shared publicly, but the principles, practices, and rigor are identical.


🧠 Mindset Shift: From Testing to Evaluations (Evals)

| Old Way | New Way |
| --- | --- |
| Test after building | Eval before building |
| Check if code "works" | Check if code "works the right way" |
| Hope AI outputs are fine | Define "good" first, then build |

✅ Plan → Eval → Build → Test


🧪 Start with Evaluations (Evals), Not Just Tests

Before writing any production code, define Evals:

  • What does a good output look like?
  • What would success/failure look like?
  • How can a real user journey be simulated?

📋 Examples from AsyncPR:

  • Receipt image → extracted business name? ✅
  • Business name → valid business email? ✅
  • Customer narrative → clean, structured feedback JSON? ✅

Think of Evals like mini contracts between you and your AI tools.

📂 Save these in an /evals/ folder for every project.
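To make that concrete, here is a minimal sketch of what one automated Eval can look like in code. The sample cases and the extract_business_name stand-in are illustrative only; swap in your real extraction step (an LLM call, OCR post-processing, and so on).

```python
# evals/business_name_detection_eval.py
# A tiny, self-contained Eval: given OCR text from a receipt, did we pull
# out the right business name? The extraction step below is a stand-in.

CASES = [
    {"receipt_text": "Contoso Coffee\n123 Main St\nTotal: $4.50", "expected": "Contoso Coffee"},
    {"receipt_text": "FABRIKAM AUTO REPAIR\nInvoice #8841",       "expected": "Fabrikam Auto Repair"},
]

def extract_business_name(receipt_text: str) -> str:
    # Stand-in for the real step (an LLM or OCR post-processing call).
    # Here: naively take the first non-empty line and title-case it.
    first_line = next(line for line in receipt_text.splitlines() if line.strip())
    return first_line.strip().title()

def run_eval() -> None:
    failures = []
    for case in CASES:
        got = extract_business_name(case["receipt_text"])
        if got.strip().lower() != case["expected"].strip().lower():
            failures.append((case["expected"], got))
    print(f"{len(CASES) - len(failures)}/{len(CASES)} cases passed")
    if failures:
        raise SystemExit(1)  # fail loudly so pre-commit hooks and CI notice

if __name__ == "__main__":
    run_eval()
```

Each Eval file stays small and answers exactly one "what does good look like?" question, which is what makes it useful as a contract with your AI tools.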


📋 Project Planning Before Coding

This is the step that saves you the most pain. Work with the LLM to write a PLAN.md covering:

  • Must-Have Features
  • Nice-to-Haves
  • Out-of-Scope (for now)

The AI cannot read your mind; writing this plan forces clarity upfront.


🛠️ Tool Setup and AI Strategy

Today, my stack looks like:

  • VS Code + GitHub Copilot for core coding
  • Windsurf or Cursor for AI pair-programming-style coding
  • Claude Desktop/Code for architectural planning and debugging
  • Custom MCP Servers for API documentation lookup and internal data sources
  • Local Ollama Models (LLaMA, DeepSeek) for private projects

My main IDE of choice is VS Code with GitHub Copilot. Given a choice, I am coding in .NET C#; when I have less of a choice, it's still VS Code, but in Python or TypeScript.

[Image: VS Code + GitHub Copilot coding]

I've also started to dabble in Cursor and Windsurf, both forks of VS Code. I do like Windsurf's native and, dare I say, easier approach to built-in MCP support in the tooling configuration compared to Claude and VS Code.

[Image: Windsurf, a VS Code-derived IDE with baked-in MCP server configs and LLM selection]

I think I am just fanboying with Claude Desktop because MCP comes from Anthropic, and so does Claude, which made the docs easy to follow when I was getting up to speed. I use it to test my MCP Servers as well.

[Image: Anthropic Claude with MCP tools configured]
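For the local models in that stack, Ollama exposes a local HTTP API (by default on port 11434). Here is a rough smoke-test sketch I use as a mental model; the model name and prompt are just examples, so point it at whatever you have pulled locally.

```python
# scripts/ollama_smoke_test.py
# Quick check that a local Ollama model is up and answering before trusting
# it with private project work. Assumes Ollama's default port and a pulled model.
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL = "llama3.3:70b"   # example: whatever you've pulled locally
PROMPT = ("Return a JSON object with keys business_name and sentiment for: "
          "'Great espresso at Contoso Coffee.'")

payload = json.dumps({"model": MODEL, "prompt": PROMPT, "stream": False}).encode()
req = urllib.request.Request(
    OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
)

with urllib.request.urlopen(req) as resp:
    body = json.loads(resp.read())

print(body["response"])  # the raw completion, ready to feed into an Eval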

🎯 Mindset: Treat AI tools like a team of interns: powerful, but needing precise guidance.


🌀 Iterative, Section-by-Section Development

Build one section at a time:

✅ Implement small pieces
✅ Validate against Evals
✅ Only then commit to Git

Never let bad AI outputs pile up. Reset early and often.
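To keep "validate against Evals, only then commit" honest, it helps to have one command that runs every Eval before a commit. A minimal sketch, assuming each automated Eval is a Python file under /evals (like the earlier example) that exits non-zero on failure:

```python
# scripts/run_evals.py
# Runs every Python Eval under /evals and refuses to succeed if any fail,
# so "did the Evals pass?" is a single command before a commit (or in CI).
import subprocess
import sys
from pathlib import Path

eval_files = sorted(Path("evals").glob("*_eval.py"))
failed = []

for eval_file in eval_files:
    result = subprocess.run([sys.executable, str(eval_file)])
    if result.returncode != 0:
        failed.append(eval_file.name)

if failed:
    print(f"FAILED: {', '.join(failed)}")
    sys.exit(1)
print(f"All {len(eval_files)} Evals passed")
```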


🗂️ Version Control is Sacred

  • Work from clean Git branches.
  • git reset --hard if AI drifts too far off course.
  • Use GitHub Actions to validate key rules.

Trust me: you'll thank yourself later.
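On the GitHub Actions point: the workflow itself can be a couple of lines that simply call a small script on every push or pull request. The specific rules below are illustrative assumptions, so pick ones that match your own project:

```python
# scripts/check_repo_rules.py
# Simple guardrails a GitHub Actions step can run on every push/PR:
# the planning and Eval artifacts the rest of this flow depends on must exist.
import sys
from pathlib import Path

RULES = {
    "PLAN.md exists": Path("PLAN.md").is_file(),
    "Evals folder is not empty": Path("evals").is_dir() and any(Path("evals").iterdir()),
    "LLM instructions are present": Path("LLM_INSTRUCTIONS.md").is_file(),
}

broken = [name for name, ok in RULES.items() if not ok]
if broken:
    print("Repo rules violated: " + "; ".join(broken))
    sys.exit(1)
print("All repo rules satisfied")
```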


📊 High-Level Testing Always

[Image: Testing via Insomnia]

Once Evals pass, simulate real-world behavior:

  • I often use Insomnia or a basic Blazor SPA to hit real endpoints.
  • Validate the entire user journey, not just isolated function outputs.

[Image: Testing via Blazor SPA]
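When I don't want to click through Insomnia by hand, the same journey can be scripted. This is only a rough sketch of what "test the journey, not the function" means; the base URL, routes, and response fields are made-up stand-ins for your own API:

```python
# tests/user_journey_receipt_to_feedback.py
# Walks one end-to-end journey against a running backend instead of testing
# functions in isolation: upload a receipt, then ask for structured feedback.
# The base URL, routes, and response shapes are illustrative only.
import requests

BASE_URL = "http://localhost:5000"

# Step 1: the user uploads a receipt image
with open("tests/fixtures/sample_receipt.jpg", "rb") as f:
    upload = requests.post(f"{BASE_URL}/api/receipts", files={"image": f})
upload.raise_for_status()
receipt_id = upload.json()["id"]

# Step 2: the app turns the receipt plus a narrative into structured feedback
feedback = requests.post(
    f"{BASE_URL}/api/feedback",
    json={"receiptId": receipt_id, "narrative": "Service was quick and friendly."},
)
feedback.raise_for_status()
body = feedback.json()

# Step 3: assert on the journey's outcome, mirroring the Evals
assert body["businessName"], "expected a business name to be resolved"
assert body["structuredFeedback"], "expected structured feedback JSON"
print("User journey passed:", body["businessName"])
```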


📚 Deep Documentation and Accessibility

Docs aren't just for humans anymore; they're for AIs too.

  • Save API specs, database schemas, business rules under /docs/
  • Build MCP Servers that live-ingest updated docs
  • Even scrape and save static Markdown from sites if needed for local models
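For that last bullet, scraping a docs page into a static snapshot that local models (and MCP servers) can read offline can be as small as this; the URL and output path are placeholders, and a real MCP server can do the same thing more elegantly:

```python
# scripts/snapshot_docs.py
# Grab a documentation page and save a plain-text snapshot under /docs so
# local models can read it offline. The URL below is an example.
import urllib.request
from html.parser import HTMLParser
from pathlib import Path

DOC_URL = "https://example.com/api-reference"     # replace with the docs you rely on
OUT_FILE = Path("docs/api-reference.snapshot.md")

class TextExtractor(HTMLParser):
    """Very small HTML-to-text pass: keeps visible text, drops scripts/styles."""
    def __init__(self):
        super().__init__()
        self.chunks, self._skip = [], 0
    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1
    def handle_data(self, data):
        if not self._skip and data.strip():
            self.chunks.append(data.strip())

html = urllib.request.urlopen(DOC_URL).read().decode("utf-8", errors="replace")
parser = TextExtractor()
parser.feed(html)

OUT_FILE.parent.mkdir(parents=True, exist_ok=True)
OUT_FILE.write_text("\n".join(parser.chunks), encoding="utf-8")
print(f"Saved {len(parser.chunks)} text blocks to {OUT_FILE}")
```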

🛠️ Refactor Relentlessly

When the tests pass, refactor:

  • Break apart monolith files
  • Create small, focused modules
  • Ask your LLMs to suggest refactors too

Small files = happier humans and happier AIs.


🧠 Choose the Right Model for the Job

Some models are better at:

  • Planning: Claude and DeepSeek
  • Autocompleting: GitHub Copilot
  • Domain reasoning: Your own MCP Servers

[Image: Choose the right model for the job]

Experiment, experiment, experiment.


🔄 Keep Iterating

Every few weeks:

  • Try new models on old logic.
  • Update your Evals.
  • Share learnings with yourself (and your future teammates).

Vibe coding is about building smarter and faster, both at once.


📂 Example Project Structure



/my-project-name
│
├── .gitignore
├── README.md
├── PLAN.md                  # Detailed project plan with must-have, nice-to-have, out-of-scope
├── LLM_INSTRUCTIONS.md      # Special instructions to AI agents
│
├── /src                     # Application code
│   ├── /backend
│   ├── /frontend
│   └── /shared
│
├── /tests                   # High-level user journey tests (Post-Eval validation)
│
├── /evals                   # Manual or automated Evals
│   ├── image-processing-eval.md
│   ├── business-name-detection-eval.md
│   ├── feedback-generation-eval.md
│   └── README.md            # Explains your Evals philosophy
│
├── /docs                    # API docs, specs, architecture references
│
├── /scripts                 # Utility scripts (e.g., image resizing, data migration)
│
├── /mcp-servers             # (Optional) Local MCP server configurations
│
└── LICENSE


💬 Final Thoughts

AI-assisted development demands a new way of thinking.
Evaluation-first frameworks, thoughtful planning, and tight Git discipline make the difference between chaos and clarity.

AsyncPR is just the beginning; this approach scales to any project, any team, any goal.

🎯 What's one Eval you'll try first? Comment or message me; let's level up together!


Chat with me:

  • BlueSky: @fabianwilliams
  • LinkedIn: Fabian G. Williams
