- They typically operate over model outputs as input data
- Their primary purpose is to provide verifiable measurements to otherwise unverifiable model outputs
- They are specifically is trying to follow the judgement rubric of a human expert, therefore requiring autoregressive reasoning capabilities (can’t really be built as a traditional ML classifier)
- Judges may be composed of multiple classifiers (composing multi-dimensional rubrics), rather than single field outputs
Classifiers
Are judges classifiers?
Why LLM judges are a special case of classifiers and when they should be treated as a distinct analytical AI primitive.
We split out judges and classifiers into two different primitives. Why?
In many ways, a judge is just a classifier (typically multiclass), but it’s a special case.