Balance intelligence and performance for the task at hand
There is no one-size-fits-all guide for choosing a model today, so we recommend weighing several factors. An analytical AI system typically implies a model that runs the same task many times, so you should be able to optimize that model’s performance against strong evals you’ve built to validate that it is sufficient for the task. As of this writing, we recommend models with at least ~30B total parameters and internal reasoning capabilities, unless cost or latency requirements rule out that size. Below this size, we’ve seen noticeable lapses on out-of-distribution tasks, or weaker inference efficiency (due to longer reasoning traces) that defeats most of the cost or latency gains. Above this size, quality often shows diminishing returns on well-defined tasks.
What to optimize
Model choice should be evaluated against the production shape of the workload:
- Accuracy on representative task cases
- Throughput and latency requirements
- Cost per successful task completion
- Operational control over batching, scaling, and retries
- Stability across task variants and out-of-distribution inputs
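
As a rough illustration of how some of these criteria can be combined, the sketch below picks the cheapest candidate per successful task among those that clear an accuracy floor and a latency ceiling. Everything in it is an assumption made for illustration: the `EvalResult` fields, the `pick_model` helper, the model names, and the numbers are invented, not measurements. It covers only accuracy, latency, and cost; stability across task variants and operational control would need their own checks.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class EvalResult:
    """Aggregate eval results for one candidate model (hypothetical schema)."""
    name: str
    tasks_run: int
    tasks_succeeded: int
    total_cost_usd: float
    p95_latency_s: float

    @property
    def accuracy(self) -> float:
        # Fraction of representative task cases the model completed correctly.
        return self.tasks_succeeded / self.tasks_run if self.tasks_run else 0.0

    @property
    def cost_per_success(self) -> float:
        # Failed attempts still cost money, so total spend is divided
        # by successful completions only.
        if self.tasks_succeeded == 0:
            return float("inf")
        return self.total_cost_usd / self.tasks_succeeded


def pick_model(
    results: Iterable[EvalResult],
    min_accuracy: float,
    max_p95_latency_s: float,
) -> Optional[EvalResult]:
    """Return the eligible candidate with the lowest cost per success, or None."""
    eligible = [
        r for r in results
        if r.accuracy >= min_accuracy and r.p95_latency_s <= max_p95_latency_s
    ]
    return min(eligible, key=lambda r: r.cost_per_success, default=None)


# Made-up numbers, for illustration only.
CANDIDATES = [
    EvalResult("small-model", 1000, 700, 2.10, 1.5),
    EvalResult("mid-model", 1000, 940, 4.70, 2.0),
    EvalResult("large-model", 1000, 960, 19.20, 4.0),
]

if __name__ == "__main__":
    best = pick_model(CANDIDATES, min_accuracy=0.90, max_p95_latency_s=3.0)
    print(best.name if best else "no candidate meets the requirements")
```

With these invented inputs, the small model fails the accuracy floor and the large model fails the latency ceiling, so the mid-sized one is selected. Treating accuracy and latency as hard constraints and cost as the tiebreaker is one possible design choice; a weighted score would be an equally reasonable alternative.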