Too Many Approximations

You can only get so lucky
Published on 2025/09/30

Today, we finished the last chapter of "AI Engineering". It was an interesting journey that sparked many conversations among the team. I'll reflect on some of those later, and I have a longer post planned based on ideas from these discussions. Many conversations weren't purely technical; they questioned the ethical and philosophical implications of day-to-day AI use.

I led the discussion for the last chapter, which summarized system design for AI Engineering. It covered the evolution of AI systems, recommended components, and extensive material on user feedback and system improvement. Unfortunately, as I read through the book, I became increasingly skeptical about the quality of applications built on multiple AI agents.

I optimistically assume that, as years pass, we'll have more effective and efficient models to use as individual building blocks within AI application systems; routers, for example, can be implemented with smaller models. But since most of these components are probabilistic (oversimplifying here), connecting them exposes a problem: the data flow starts from a user query expressed in natural language, which is already an approximation of what the user wants, and then passes through several components such as routers and gateways, often models of different sizes themselves. None of them is exactly accurate, and the errors compound: a pipeline of five stages that are each right 95% of the time is right end to end only about 77% of the time. Put all the pieces together and you get a deeply non-deterministic system, beyond anything we've seen in regular system design.
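To make that concrete, here's a back-of-the-envelope sketch of the compounding effect. The stage names and per-stage success rates are made up for illustration, and the independence assumption is itself a simplification (real failures correlate):

```python
import random

# Hypothetical per-stage success rates for a toy AI pipeline:
# query understanding -> router -> retrieval -> generation -> guardrail.
# These numbers are illustrative, not measurements.
STAGE_ACCURACY = {
    "query_understanding": 0.97,
    "router": 0.95,
    "retrieval": 0.93,
    "generation": 0.90,
    "guardrail": 0.98,
}

def run_pipeline_once() -> bool:
    """One simulated request: succeeds only if every stage succeeds."""
    return all(random.random() < p for p in STAGE_ACCURACY.values())

def simulate(n: int = 100_000) -> float:
    """Estimate the end-to-end success rate over n simulated requests."""
    return sum(run_pipeline_once() for _ in range(n)) / n

if __name__ == "__main__":
    analytic = 1.0
    for p in STAGE_ACCURACY.values():
        analytic *= p
    print(f"analytic end-to-end success: {analytic:.3f}")   # ~0.76
    print(f"simulated end-to-end success: {simulate():.3f}")
```

Even with every stage at 90-98% accuracy, the pipeline lands around 76% end to end, before you account for retries, correlated failures, or the cost of detecting that something went wrong.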

Just last year I reviewed a software architecture book, and the contrast between that mature set of guidelines and what I've seen in this book is striking. What made it worse was the section on user feedback; that's where I lost most of my hope. There are countless ways to collect feedback, but again, it's all based on unreliable and sometimes biased communication that you still have to sift through to extract something your model or application can actually use to improve.
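As a minimal sketch of what that sifting looks like in practice (the signal names and weights below are hypothetical, not from the book):

```python
from dataclasses import dataclass

@dataclass
class FeedbackEvent:
    """Raw signals collected for one model response (all hypothetical)."""
    thumbs_up: bool | None   # explicit vote, usually missing
    regenerated: bool        # user asked for another answer
    copied_output: bool      # user copied the response

def infer_label(event: FeedbackEvent) -> float | None:
    """Map noisy signals to a quality score in [0, 1], or None if ambiguous.

    Every branch here is a guess about intent: a regeneration might mean
    "bad answer" or just "show me alternatives"; a copy might mean
    "useful" or "copying it to complain about it elsewhere".
    """
    if event.thumbs_up is not None:
        return 1.0 if event.thumbs_up else 0.0
    if event.regenerated and not event.copied_output:
        return 0.2  # weak negative signal
    if event.copied_output:
        return 0.8  # weak positive signal
    return None     # no usable signal; most traffic ends up here
```

Every branch is a guess about what the user meant, which is exactly the kind of approximation stacked on approximation that worries me.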

This still feels like a complex set of components glued together through trial and error. I'm not sure whether that impression reflects only how far the field had gotten when the book was written, or whether things are now in a better place. And don't get me started on reproducing bugs or poor flows: that's already hard in established systems; I can't imagine it in AI applications!

Thoughts

Building AI applications at scale seems too complex to do reliably. I'm aware that simpler ones, fundamentally based on educated prompts, can work fairly well, but their level of complexity is far below that of the more involved systems behind largely successful products.

It was incredibly beneficial to read this book in a group setting. It was eye-opening to see how most people came to the same conclusions and had the same (perhaps too harsh) opinion about the book's delivery.

There still seem to be many unsolved problems in making AI applications predictable and reliable. We're definitely in the early stages, and it will be fascinating to see how the industry evolves over the next 5 to 10 years. I'd recommend exploring these issues before buying this book.
