Building Production AI

"Do the work, Don"

Building real AI keeps offering you exits.

The half-broken version feels good enough. The roadmap of pivots is always available. The temptation to declare victory on a demo and move on to the next thing is constant, and I think most companies, if they’re honest with themselves (which they often aren’t), are quietly taking that exit and calling it something else.

I’ve been at this for about a year now at LOST iN, building a stack of AI-based “systems” (not sure what else to call them) across the business: content, search, retrieval, personalization, internal tooling, a few things I’m probably still too close to to describe well. The thing I keep noticing is how seductive the off-ramps are. There’s always a story you can tell. There’s always a deck somewhere that frames the half-built thing as progress. The exits aren’t shameful. They’re well-lit.

The first AI thing I stood up worked maybe half the time, and I want to be honest that at the time I thought that was most of the way there. It wasn’t. It really wasn’t.

Roughly half the responses were wrong, or generically unhelpful (the LLM defaults are remarkably bland once you start looking for them), or quietly drifting away from the editorial voice that makes LOST iN useful in the first place. It worked just well enough to feel like a product, and badly enough that I knew it wasn’t one.

That’s the exit point. That’s where most things stop. Forty percent of the way there, with a story to tell about how AI is “in production.”

So you keep building and rebuilding. It’s a prompt away.

Small Insights on a Big Subject

One of the things I had to unlearn early was the idea that the prompt was the product. You write a clean, careful instruction, you watch the model do something that looks impressive on a Tuesday, and there’s this very strong pull to think you’ve shipped something. You haven’t. You’ve shipped a demo on a Tuesday…maybe. 

Prompts aren’t products. They’re the first crease in the paper. You have to keep folding, and folding, and unfolding, and folding again, before the lines start to actually hold.

Related, and this is the part nobody warns you about clearly enough: it’s incredibly easy to go down rabbit holes. You can spend a full day, a productive-feeling day, optimizing some edge case that affects 2% of queries while the main flow is still subtly broken in a way you forgot to check. The model is happy to help you go down those rabbit holes. The model is, in fact, an excellent rabbit hole companion. You have to manage yourself almost as carefully as you manage the system, which is a thing I didn’t expect and still don’t always get right.

On the positive side, the ability to build measurement and analytics systems around each product or system is amazing. It’s the easiest way to create an iterative learning loop.

Once you have a baseline, you can have a mission.
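For anyone wondering what a "baseline" can look like in practice, here is a minimal sketch, not a description of what runs at LOST iN. It assumes a hypothetical set of test cases (`EvalCase`), a hypothetical `system` function that answers a query, and a hypothetical `grade` function that says pass or fail. It scores a run as a pass rate so every prompt or architecture change gets compared against the last number instead of a feeling.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalCase:
    query: str
    expected: str  # hypothetical: a fact or phrase a good answer must contain


def pass_rate(
    cases: list[EvalCase],
    system: Callable[[str], str],
    grade: Callable[[str, str], bool],
) -> float:
    """Run every case through the system and return the fraction that pass."""
    if not cases:
        return 0.0
    passed = sum(grade(system(c.query), c.expected) for c in cases)
    return passed / len(cases)


def compare(baseline: float, candidate: float, min_gain: float = 0.02) -> str:
    """Decide whether a change beat the baseline (min_gain is an assumed margin)."""
    if candidate >= baseline + min_gain:
        return "keep"
    if candidate < baseline:
        return "revert"
    return "inconclusive"
```

Usage might look like `compare(pass_rate(cases, old_system, grade), pass_rate(cases, new_system, grade))`. The margin is there because small score changes on a small test set are often noise, and noise is exactly the kind of rabbit hole that eats a day.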

A few things I tried across different builds didn’t pan out. I’m listing them here mostly in case it saves anyone else the trouble.

Training the model on our own content seemed obvious. It turned out to be the wrong tool for the job.

Writing longer, more detailed instructions to the model performed worse than writing shorter instructions paired with better underlying architecture. (I find this one continues to surprise people I talk to about it.)

Sticking to a single preferred model meant leaving real performance on the table. Different kinds of questions, it turns out, genuinely want different kinds of models, and trying to make one model do everything is more a form of stubbornness than a strategy. I consistently bring up cost in my instructions, and I’ve set up access to many models. It’s standard practice at this point, but still worth noting.
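To make "different questions want different models" concrete, here is a minimal routing sketch under loud assumptions: the model names and prices are placeholders, not real products, and the keyword classifier is a crude stand-in for whatever a real system would use (often a small model). The idea it shows is simple: figure out what kind of question it is, then pick the cheapest model that handles that kind well.

```python
import re
from collections.abc import Sequence
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelOption:
    name: str                  # hypothetical model identifier
    cost_per_1k_tokens: float  # hypothetical price in USD
    strengths: frozenset[str]  # question types this model handles well


# Hypothetical catalog; names and prices are placeholders only.
CATALOG = (
    ModelOption("small-fast", 0.0005, frozenset({"lookup", "classification"})),
    ModelOption("mid-general", 0.003, frozenset({"lookup", "summarize", "rewrite"})),
    ModelOption("large-reasoning", 0.015, frozenset({"reasoning", "rewrite"})),
)


def classify(question: str) -> str:
    """Crude keyword classifier standing in for a real one."""
    words = set(re.findall(r"[a-z']+", question.lower()))
    if words & {"why", "compare", "plan"}:
        return "reasoning"
    if words & {"summarize", "summary"}:
        return "summarize"
    if words & {"rewrite", "voice"}:
        return "rewrite"
    return "lookup"


def route(question: str, catalog: Sequence[ModelOption] = CATALOG) -> ModelOption:
    """Pick the cheapest model whose strengths cover the question type."""
    kind = classify(question)
    candidates = [m for m in catalog if kind in m.strengths]
    if not candidates:
        # No model claims this type: fall back to the priciest (assumed most capable).
        return max(catalog, key=lambda m: m.cost_per_1k_tokens)
    return min(candidates, key=lambda m: m.cost_per_1k_tokens)
```

Called as `route("Why did traffic drop last week?").name`, this returns `"large-reasoning"`, while a plain lookup question lands on `"small-fast"`. Putting cost into the selection rule, rather than hoping the expensive model is always worth it, is the same instinct as mentioning cost in the instructions.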

Today’s “Brilliance” Becomes Tomorrow’s “duh”

I don’t think every operator needs to become a senior engineer. But operating an AI system from an architect’s chair, watching others build it for you while you listen to podcasts, leaves you a step behind on every decision that matters. I notice the difference in conversations now between people who have actually shipped something and people who have only read about shipping. It’s not subtle.

There’s a half-broken version of every AI product everywhere right now. The version that actually works requires a kind of patient, often boring discipline that most companies haven’t really built the muscle for yet, and I’m not sure most leadership teams understand what that discipline looks like up close. They think it’s about strategy. It’s mostly about iteration.

That gap is the opportunity. It’s also the test.

Plenty of companies will look at it and take the exit. The ones that don’t are the ones who treat the work as a thing you can actually move through, if you’re willing to.

The exits are always available. 

Do the work, Don. I think it resonates (with some) because it has a no-bullshit, stop-feeling-sorry-for-yourself, get-off-your-ass vibe. A version of Nike’s “Just Do It.”

You can talk about it, frame it, optimize for it, raise money for it, but eventually you have to sit down and do the work, and the work is the only thing that teaches you what the work actually is.

Years ago I named a side practice Facendo, which is Italian for “doing.” I don’t know if I fully understood at the time why that word landed for me. I think I do now. Reading, planning, theorizing, framing: none of it produces the kind of knowledge that comes from doing the thing. There are entire categories of insight that are only available through facendo, and I think building AI is going to be one of the great examples of that for the next several years. You cannot read your way into this. You have to build, ship, watch it break, fix it, watch it break differently, fix it again, and at some point you’ll look up and realize you understand something nobody who hasn’t done it can quite explain.


If you’re building and want to compare notes, find me on LinkedIn.
