Reasoning models are the new hotness. Google’s Deep Think just achieved gold-medal performance at the International Mathematical Olympiad. OpenAI’s o-series models break down complex problems step by step. Every lab is racing to ship models that “think before they answer.”
The capability is genuinely impressive. I’m not here to argue otherwise. But I am here to ask a question that nobody in the model release press cycle seems interested in: when should you actually use a reasoning model?
The 200ms Problem
Most enterprise AI tasks don’t need a model that thinks for 45 seconds. They need one that answers in 200 milliseconds with 99.5% accuracy on a narrow domain.
I’ve built systems across enough contexts to know that latency tolerance varies wildly depending on where the AI sits in the user’s workflow. At Amazon, Alexa had to respond in under a second or users would repeat themselves (and get frustrated). At Zipcar, when a member was standing in the rain trying to unlock a car, every extra second of processing felt like an eternity. At Vestmark, when an advisor is on a phone call with a client, they need information now.
A reasoning model that spends 30 seconds deliberating on a lookup-and-format task is the wrong tool. Full stop.
Now consider a different kind of task: analyzing a complex system for subtle interactions between dozens of variables with competing constraints. That’s a problem where deliberate reasoning pays off. The model needs to weigh the constraints against each other, evaluate tradeoffs, and produce a recommendation that accounts for how all the pieces interact.
Same product. Same user. Completely different model requirements. The “reasoning era” discourse consistently ignores this distinction.
The Cost Nobody Talks About
Reasoning models are expensive. Not in compute dollars alone (though those are real), but in latency, user experience, and system complexity.
Every reasoning token is a token you’re paying for that the user never sees. Extended thinking modes can generate thousands of internal tokens before producing a single word of output. For problems that genuinely require deep analysis, that’s a great tradeoff. For problems that don’t, you’re burning money and making your users wait for no reason.
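The arithmetic here is worth making concrete. A back-of-envelope sketch, using purely illustrative numbers (the per-token price and the thinking-token counts below are assumptions, not any provider’s actual rates):

```python
# Back-of-envelope cost of hidden reasoning tokens.
# The price and token counts are illustrative assumptions only.

PRICE_PER_1K_OUTPUT_TOKENS = 0.01  # assumed $/1K tokens

def request_cost(visible_tokens: int, reasoning_tokens: int) -> float:
    """Reasoning tokens bill like output tokens, but the user never sees them."""
    total = visible_tokens + reasoning_tokens
    return total * PRICE_PER_1K_OUTPUT_TOKENS / 1000

# A 300-token answer with no hidden deliberation:
fast = request_cost(visible_tokens=300, reasoning_tokens=0)

# The same 300-token answer preceded by 5,000 thinking tokens:
deliberate = request_cost(visible_tokens=300, reasoning_tokens=5000)

print(f"fast: ${fast:.4f}  deliberate: ${deliberate:.4f}")
print(f"multiplier: {deliberate / fast:.1f}x")
```

With these made-up numbers, the deliberate path costs roughly 17x more per request for an identical visible answer. Whatever the real prices are, the shape of the tradeoff is the same: the multiplier scales with thinking tokens the user never reads.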
I’ve been intentional about this in the AI features I build. Different model configurations for different tasks. Fast, cheap, and reliable for routine operations. Deeper reasoning for complex analytical work. The routing layer that decides which path to take is, in some ways, more important than either model.
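A routing layer like this can be very simple at its core. A minimal sketch, where the model names, task categories, and latency budgets are all hypothetical placeholders rather than any real deployment:

```python
from dataclasses import dataclass

# Minimal task-based model router. Names and thresholds are illustrative.

@dataclass(frozen=True)
class ModelConfig:
    name: str
    reasoning: bool
    max_latency_ms: int

FAST = ModelConfig(name="small-fast-model", reasoning=False, max_latency_ms=300)
DEEP = ModelConfig(name="large-reasoning-model", reasoning=True, max_latency_ms=45_000)

# Known task types mapped to a model tier.
ROUTES = {
    "lookup": FAST,
    "format": FAST,
    "summarize": FAST,
    "constraint_analysis": DEEP,
    "root_cause": DEEP,
}

def route(task_type: str) -> ModelConfig:
    # Default to the fast path; escalate only for known-hard task types.
    return ROUTES.get(task_type, FAST)

print(route("lookup").name)               # small-fast-model
print(route("constraint_analysis").name)  # large-reasoning-model
```

The design choice that matters is the default: unknown tasks fall through to the fast path, so reasoning-model latency is something you opt into deliberately, never something you stumble into.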
The Framework
Here’s how I think about whether a task deserves a reasoning model:
Does the task involve multiple interacting constraints? Optimizing a system with competing objectives and interdependent variables is a constraint satisfaction problem. Formatting a summary is not.
Is the cost of a wrong answer high enough to justify the latency? If a five-second delay saves you from a bad outcome, take the five seconds. If a five-second delay just makes your UI feel sluggish for a routine lookup, don’t.
Does the task require the model to discover something, or retrieve and format something? Discovery benefits from reasoning. Retrieval and formatting benefit from speed.
Can you validate the reasoning? One underappreciated advantage of reasoning models is that you can inspect the chain of thought. But this only matters if you can evaluate whether the reasoning is sound. In domains where you can (math, code, structured analysis), reasoning models are powerful. In domains where the reasoning is opaque and unverifiable, you’re adding cost and latency to get a more confident-sounding answer.
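The four questions above can be sketched as a checklist. The field names and the decision rule (two or more “yes” answers tips toward a reasoning model) are my own illustrative choices, not a calibrated policy:

```python
from dataclasses import dataclass

# The framework's four questions as a checklist. The >= 2 threshold
# is an illustrative assumption, not a calibrated rule.

@dataclass
class Task:
    interacting_constraints: bool  # multiple competing, interdependent constraints?
    high_cost_of_error: bool       # wrong answer costly enough to justify latency?
    requires_discovery: bool       # discover something vs. retrieve-and-format?
    reasoning_verifiable: bool     # can you check whether the chain of thought is sound?

def deserves_reasoning_model(task: Task) -> bool:
    yes_count = sum([task.interacting_constraints,
                     task.high_cost_of_error,
                     task.requires_discovery,
                     task.reasoning_verifiable])
    return yes_count >= 2

lookup_and_format = Task(False, False, False, False)
system_optimization = Task(True, True, True, True)

print(deserves_reasoning_model(lookup_and_format))   # False
print(deserves_reasoning_model(system_optimization))  # True
```

In practice the questions carry different weights (the fourth is a gate more than a vote), but even this crude version beats the implicit policy most teams ship with, which is “use the newest model everywhere.”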
The Real Skill
The AI industry loves to frame progress as a capability story. “Our model can now reason!” Great. But the skill that separates production AI from demo AI is knowing which problems deserve deep thought and which ones need fast reflexes.
This isn’t a new idea. Good engineers have always known that the right tool for the job depends on the job. A CTO who deploys a reasoning model for every API call is making the same mistake as one who uses a distributed database for a lookup table. Technically impressive. Practically wasteful.
The reasoning era is real. The question is whether we’ll be reasonable about how we use it.