# The Analytical AI Handbook

> A living FAQ to build, measure, optimize, and scale reliable decision models

This is the agent-readable index for the Sutro Analytical AI Handbook.

## Full Guide

- [Full handbook Markdown](/llms-full.txt)

## Pages

- [The Analytical AI Handbook](/index): A living FAQ to build, measure, optimize, and scale reliable decision models
- [Primitives](/primitives): Learn the fundamental building blocks of analytical AI systems.
- [Classifiers](/primitives/classifiers): The most flexible and broadly applicable analytical AI primitive
- [Types of Classifiers](/primitives/classifiers/types-of-classifiers): Common classifier patterns, including binary, multiclass, multilabel, hierarchical, open-set, and ordinal classifiers.
- [Abstention](/primitives/classifiers/abstention): For when "I don't know" is the best decision a model can make
- [Are judges classifiers?](/primitives/classifiers/are-judges-classifiers): Why LLM judges are a special case of classifiers and when they should be treated as a distinct analytical AI primitive.
- [Extractors](/primitives/extractors): Pull relevant fields and spans from unstructured documents
- [Extraction Verification](/primitives/extractors/extraction-verification): Ground-truthing extractors that contain free-form text
- [Good Extractor Design](/primitives/extractors/good-extractor-design): Unfortunately, we're not talking about Inception
- [Judges](/primitives/judges): A core unit of analytical AI to scale the judgement of a domain expert.
- [Judge Terminology](/primitives/judges/terminology): The core vocabulary used when discussing LLM judges and candidate models.
- [What's in a Judge?](/primitives/judges/anatomy): The model, context, input, and output schema that make up a typical LLM judge.
- [Types of Judges](/primitives/judges/types): Reliability, quality, sentiment, and intent judges in the judge-design hierarchy.
- [Judges in Evals: Flip Your Intuition](/primitives/judges/intuition): First-principles responses to common objections about using LLMs to judge LLMs.
- [Good Task Design Is All You Need](/primitives/judges/task-design): The design knobs that make LLM judges more reliable, measurable, and useful.
- [Patterns](/patterns): Best practices and battle-tested strategies for analytical AI.
- [Consistency](/patterns/consistency): Boring as a feature
- [Don't be fooled by determinism](/patterns/consistency/determinism): Why absolute determinism is less useful than measured consistency for real-world analytical AI systems.
- [Task Specificity](/patterns/consistency/task-specificity): How specific task instructions, examples, and edge-case guidance make analytical AI systems more consistent.
- [Fine-tuning and RL](/patterns/consistency/fine-tuning-and-rl): When to consider fine-tuning or reinforcement learning for analytical AI tasks, and why prompt optimization should usually come first.
- [Parallel Sampling](/patterns/consistency/parallel-sampling): How parallel model samples and majority voting can improve consistency for repeated analytical AI decisions.
- [Confidence Scores](/patterns/consistency/confidence-scores): How to use confidence signals, agreement checks, and escalation logic without relying on self-reported model confidence.
- [Ensembles](/patterns/consistency/ensembles): How model ensembles can add useful perspectives, and why they are not always the simplest path to consistent AI behavior.
- [Temperature](/patterns/consistency/temperature): How to tune model temperature with evals instead of assuming zero temperature is always best for consistency.
- [Context](/patterns/context): What your model needs to know to get it right.
- [Expert Annotations](/patterns/context/expert-annotation): Model behavior should be grounded in expert-reviewed data, and abstracted into generalized rulesets.
- [Why Expert Annotations Matter](/patterns/context/expert-annotation/why-expert-annotations-matter): Model behavior should be grounded in expert-reviewed data, not guessed at from aggregate benchmarks.
- [What Good Annotations Capture](/patterns/context/expert-annotation/what-good-annotations-capture): Useful expert annotations preserve the model input, model output, expert correction, rationale, and metadata needed to improve behavior over time.
- [Which Cases to Annotate](/patterns/context/expert-annotation/which-cases-to-annotate): Annotation quality depends on choosing cases that expose ambiguity, edge behavior, and the expert judgment the model needs to learn.
- [Evals](/patterns/evals): Patterns for measuring AI system behavior, reliability, and quality before and after release.
- [Evals as Outer Loop](/patterns/evals/outer-loop): How evals fit into AI development and post-deployment monitoring.
- [Eval Approaches](/patterns/evals/approaches): Common AI eval approaches and the role each one plays in measurement.
- [Where to Start](/patterns/evals/where-to-start): How to choose a first eval that is narrow enough to build and useful enough to matter.
- [Static Evals vs. Judges](/patterns/evals/static-evals-vs-judges): Why LLM judges make sense alongside static evals when evaluating AI systems over unbounded input spaces.
- [Deployment](/deployment): Choices to make when your models are ready for action.
- [Batch vs. Real-Time Inference](/deployment/batch-vs-real-time-inference): Faster, cheaper, better
- [Model Selection](/deployment/model-selection): Selecting the right model for the task at hand.
- [Open-Source vs. Closed Models](/deployment/model-selection/open-source-vs-closed): How to think about provider choice, ecosystem control, and model ownership for analytical AI systems.
- [Performance Tradeoffs](/deployment/model-selection/performance-tradeoffs): How to balance intelligence, latency, throughput, cost, and reliability when choosing a model.
- [Routers and Ensembles](/deployment/model-selection/routers-and-ensembles): When routing, escalation, majority voting, or multiple-model approaches are worth the complexity.
- [Architectures](/architectures): Coming soon