I keep running into the same thing. I'll finish a feature with AI, check it, everything runs, tests pass. The output is wrong.
Not wrong like a bug. Wrong like it understood 80% of what I meant and filled in the rest with reasonable assumptions that happened to be incorrect. The kind of wrong where you stare at it for a minute before you can even articulate what's off.
This has been happening for as long as I've been writing code with AI. And it's not an AI problem. It's a me problem. I'm worse at specifying things than I thought I was.
Where stuff actually goes sideways
When I write code myself, vague ideas are fine. I hold the intent in my head, adjust as I go, and the code bends to what I actually meant even if I never fully spelled it out. I'm on both sides of it, so the sloppiness stays invisible.
Hand that to an AI and the sloppiness stops being invisible. You said something vague, it built something concrete out of that vagueness, and now you're staring at working code that does the wrong thing.
So I started doing something that felt like overkill at first. Every time a feature comes back wrong, I don't say "fix this." I stop and figure out: did I describe the wrong thing, or did it build the right description incorrectly? Design problem or implementation problem.
Sounds obvious. Took me a while to actually do it consistently. My instinct was always to just point at the broken part and say "not that." Which works for a single fix but compounds into a mess over a few iterations, because you're patching without knowing which layer drifted.
Every decision point becomes a conversation, every conversation becomes an artifact
That diagnostic habit turned into something bigger. When I'm working through an issue now, I end up confirming every decision point with the AI before moving on. Not in a "please summarize our conversation" way. More like: here's what I think this function should do, here's the edge case I'm worried about, do you see the same thing?
Sometimes it does. Sometimes it pushes back with something I missed. Sometimes it confidently agrees and then writes code that contradicts what we just discussed. That last one is actually useful because now I know my description wasn't tight enough.
After a few rounds I usually make it spell everything back to me. Not for a summary. Just to see where it's quietly disagreeing with me. There's almost always something. Usually a thing I thought was obvious and didn't bother saying out loud.
And then every one of those conversations has to land somewhere concrete. The decision goes into the design doc, the behavior goes into code, the expected outcome goes into a test case. If any of those three is missing, the conversation didn't actually finish. I learned that the hard way, by having the same argument with the AI twice because nothing from the first round got written down properly.
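To make that concrete, here's a made-up example of one decision landing in all three places. The feature, the names (UserStore, get_user), and the decision itself are invented for illustration, not pulled from a real project:

```python
# Design doc note: "get_user returns None for unknown IDs; callers handle the
# missing case explicitly instead of catching an exception."
from dataclasses import dataclass
from typing import Optional


@dataclass
class User:
    id: int
    name: str


class UserStore:
    def __init__(self) -> None:
        self._users: dict[int, User] = {}

    def add(self, user: User) -> None:
        self._users[user.id] = user

    def get_user(self, user_id: int) -> Optional[User]:
        # The behavior lives here: missing IDs return None, never raise.
        return self._users.get(user_id)


# The expected outcome lives in a test, so the decision can't drift silently.
def test_get_user_returns_none_for_unknown_id():
    assert UserStore().get_user(42) is None
```

All three artifacts carry the same decision. Skip any one of them and I'm re-litigating None-versus-exception with the AI a week later.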
Over time all of that starts to weave together. My thinking interleaved with the AI's reasoning, from design through implementation into tests. It ends up being this tight net that catches stuff I would have dropped, and also the thing that actually connects the AI to the project instead of it just being a code generator I talk at.
Nothing gets a free pass
At some point I started bringing a second AI into the process. Not just for tests. For everything.
Design, code, tests. Claude writes it, ChatGPT reviews it. Or the other way around. Sounds paranoid, keeps catching things. They have different blind spots. One will accept a pattern without questioning it, the other flags it immediately. I've had cases where the first AI agreed with a design decision that the second one poked a hole in within thirty seconds.
Same thing happens with code. Same thing happens with tests. I've seen one AI write a test that passed but was testing the wrong behavior, and the second one caught it because it interpreted the description differently.
For tests specifically, I have the AI write four layers: unit, integration, business logic, system. Then I keep pushing. What cases are we missing. What about this input. What if this dependency is down. The test suite keeps growing, and stuff I didn't think through keeps surfacing. Some of it was sloppy from the start, just never exposed because I never had to make it explicit.
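For what it's worth, here's a rough sketch of how those layers might separate in practice. The discount feature and every name in it are hypothetical, and the integration and system tests are left as placeholders:

```python
import pytest


def apply_discount(price: float, percent: float) -> float:
    # Imagined implementation under test.
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return round(price * (1 - percent / 100), 2)


# Unit: one function, one behavior.
def test_apply_discount_rounds_to_cents():
    assert apply_discount(19.99, 10) == 17.99


# Business logic: the pricing rule I actually care about, stated as a test.
def test_discount_rejects_out_of_range_percent():
    with pytest.raises(ValueError):
        apply_discount(19.99, 150)


# Integration: the function wired into a checkout flow with stubbed dependencies.
def test_checkout_applies_discount():
    ...  # placeholder


# System: the whole path end to end, against a running environment.
def test_full_checkout_flow():
    ...  # placeholder
```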
Reconciling three things that don't agree
What I actually spend my time on now isn't writing code. It's pulling three things back into alignment.
The spec says one thing. The code does something close but not identical. The tests verify a slightly different interpretation. They all came from my intent, but they've drifted apart through the process of being made concrete.
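A made-up illustration of that drift, with the spec line, the code, and the test all invented for the example: the spec rejects blank usernames, the code only rejects the empty string, and the test verifies the code's reading instead of the spec's, so everything passes anyway.

```python
# Spec (design doc): "Usernames that are blank or whitespace-only are rejected."

def is_valid_username(name: str) -> bool:
    # Code: only rejects the literal empty string, so "   " slips through.
    return name != ""


# Test: verifies the code's interpretation rather than the spec's,
# so all three artifacts disagree and the suite still passes.
def test_rejects_blank_username():
    assert is_valid_username("") is False
```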
So I figure out which one is actually right, update the other two, run through it again. A lot of the time the update creates a new mismatch somewhere else.
Some days this is genuinely more work than writing it myself. Honestly, a lot of days. When I write code by hand, vague thinking is free. I can hold contradictions in my head, resolve them as I type, never confront the fact that my mental model had holes in it. AI doesn't let you do that. The description has to be precise enough for something with no shared context to execute on. The tests have to verify what I actually meant, not what I said.
Not sure where this lands
I could frame this as "AI makes you a better thinker" and it would be partly true but mostly annoying. It's more like it charges you for mental sloppiness that used to be invisible. Whether that's worth it depends on the day.
Some features come out cleaner than anything I would have written alone because the process caught assumptions I'd been carrying around without questioning. Other times I spend an hour going back and forth and end up with the same code I would have typed in twenty minutes.
Could be that I'm still learning when to use this and when to just write the thing. Could also be that the overhead never fully goes away and you just get faster at the reconciling part. I don't know yet.
Find me on GitHub | Substack | StratCraft