Accountability in Three Levels
May 2026

How carefully do you actually need to review AI-generated code?

It's a question I get often. And the answer I give is almost always the same: every line. Review every line you commit. Understand why. Put your name on what you own.

But that's not the full answer. It's the answer for one specific level — and there are three.

Simon Willison's Honest Admission

In May 2026, Simon Willison — one of the most respected voices in AI-assisted development, the person who coined the distinction between vibe coding and agentic engineering — published a newsletter with an uncomfortable insight.

He no longer reviews every line. Not even for production code. He calls it normalization of deviance and describes it as "quite upsetting." Then he continues anyway.

I don't join the reflexive criticism. I think he's right, on his level.

The levels are the point.

Level 1 — Solo (>10×)

You're building a personal tool. An experiment. Something for yourself.

If it breaks, it affects you. You bear the consequence, you see the error, you fix it. Accountability is concentrated in one person and that person is you.

At this level, rational flexibility is exactly that — rational. The work loop is good practice, not a moral obligation. If you choose not to review every line of a personal project, that's your choice and your risk. Vibe coding can be entirely rational at solo level — you feel the consequences immediately.

Simon builds personal tools. iNaturalist apps built on his phone, Redis playgrounds, presentation utilities. That's solo level. His drift is rational in that context.

Natural drift: Toward vibe coding — and that's fine. Minimum policy: None required. The work loop is good practice, not an obligation.

Level 2 — Engagement (5–10×)

Now it's no longer your risk alone.

The client's system. The client's data. The client's business. Your name on the contract — and on the code.

Here the work loop stops being a choice. Not because a process mandates it, but because it's the only way to actually stand behind what you deliver. You can't say "it looked right to me" when a bug costs the client money. You have to be able to say "I understood every line I committed."

That's a professional commitment. And it's a clear boundary: at solo level, skipping review is a choice you're entitled to make. On an engagement, it's opting out of accountability.

Natural drift: Solo habits creep in — "it looks right" replaces actual review. Minimum policy: The work loop is non-negotiable. Every commit requires understanding.

Level 3 — Team delivery (2–5×)

This is the hardest level — not because the standards are higher, but because accountability can disappear entirely.

"The company is accountable" sounds robust. It isn't. It's a paper construct if no individual person actually owns every line that gets committed.

In a team, accountability can diffuse to nothing. "Everyone reviewed the code" means no one reviewed the code. No single person felt they owned the outcome. No one knew exactly where the consequences would land if something went wrong.

The only thing that works in practice is a named developer committing, understanding, and owning every line. The commit is not administrative work. It's the only accountability checkpoint that actually exists in a team. Without it, accountability is a word without content.

This is not a technology problem. It's a methods problem. And it's exactly what happens when a company normalises solo behaviour in a team delivery context.

Natural drift: Accountability diffuses to no one — "everyone reviewed it" replaces ownership. Minimum policy: Named owner per commit + team agreement on AI policies (recording, data handling, code check-in standards, parallel context threads). These decisions need to be made together — they surface as urgent problems mid-delivery if left unresolved.
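The named-owner rule can be made mechanical rather than aspirational. As a minimal sketch, a commit-msg hook could refuse any commit that doesn't name an owner. The "Owned-by:" trailer here is an invented convention for illustration, not a git standard and not part of any methodology described above:

```python
#!/usr/bin/env python3
# Hypothetical commit-msg hook (.git/hooks/commit-msg): reject commits
# that lack a named owner. The "Owned-by:" trailer is an invented
# convention, not a git built-in.
import re
import sys

# Match a trailer line like "Owned-by: alice" anywhere in the message.
OWNER_RE = re.compile(r"^Owned-by: \S+", re.MULTILINE)

def has_named_owner(message: str) -> bool:
    """Return True if the commit message carries an Owned-by: trailer."""
    return bool(OWNER_RE.search(message))

if __name__ == "__main__":
    # git passes the path of the commit-message file as the first argument.
    with open(sys.argv[1]) as f:
        message = f.read()
    if not has_named_owner(message):
        sys.exit("commit rejected: add an 'Owned-by: <name>' trailer")
```

A hook like this doesn't create understanding, of course; it only makes the absence of a named owner visible at the one checkpoint every team already has.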

What Simon Willison Actually Shows

Simon is right about his own situation. His reflection is honest and well-articulated.

The problem is that his situation gets used as a template by organisations that don't share his conditions. A solo engineer with 25 years of experience, building personal tools, has a consequence structure fundamentally different from that of a fifteen-person team delivering a business-critical system to a client.

Simon names the risk himself — normalization of deviance. Every time the model produces correct code without close review, the threshold for the next time drops. That's rational at level 1 where the consequence stays with you. It's dangerous at level 3 where the consequence is carried by someone else.

Mindtastic's methodology isn't built to give Simon worse tools. It's built to prevent level-1 behaviour — with its rational foundations — from becoming the norm in organisations operating at level 3.

Three different answers to one question. And the most important competence is knowing which level you're on.

See also: The work loop — six steps that make accountability concrete (series 23) and AI forces you to think harder (series 12).
