Deployment - Sutro Handbook

Deployment pages cover the runtime choices that shape AI system cost, latency, reliability, and operational control.

Pages in This Section

Batch vs. Real-Time Inference: when to run analytical AI workloads as batch jobs instead of real-time APIs.
Model Selection: how to choose a model based on task fit, cost, latency, control, and operational constraints.
