Annotations shouldn’t be collected only to gather dust in old spreadsheets. Created and used well, they become the unstructured gold needed to monotonically improve an analytical AI system over time. A maximally useful expert annotation has the following schema:
| Field | Capture | Why it matters |
|---|---|---|
| Input | The exact input the model received. | Lets you reproduce the case and understand what context the model had. |
| Model output | The model’s response, ideally with a terse rationale justifying it. | Shows both the behavior and the apparent reasoning behind it. |
| Expert correction | The expert-corrected output, if a correction is necessary. | Provides the target behavior the system should learn. |
| Expert rationale | Why the correction is right, especially when the rationale differs from the model’s. | Turns a single example into a reasoning artifact that can later be abstracted into a decision rule. |
| Inference metadata | Model used, system prompt, sampling parameters, timestamp, and related runtime details. | Keeps the annotation tied to the exact system behavior being reviewed. |
| Expert metadata | Labeler identity, timestamp, and review context. | Supports auditability and disagreement review. |
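To make the schema concrete, here is a minimal sketch of one annotation record as Python dataclasses. The class and field names (`ExpertAnnotation`, `InferenceMetadata`, `labeler_id`, and so on) are hypothetical, chosen to mirror the table rather than taken from any particular tool. The `validate` rule that a correction must come with a rationale is also an assumption, reflecting the idea that the rationale is what makes an example reusable.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Any, Optional


@dataclass
class InferenceMetadata:
    model: str                       # identifier of the model that produced the output
    system_prompt: str
    sampling_params: dict[str, Any]  # e.g. temperature, top_p
    timestamp: str                   # ISO 8601 time of the inference call


@dataclass
class ExpertMetadata:
    labeler_id: str
    timestamp: str                   # ISO 8601 time of the review
    review_context: str              # e.g. review batch or queue name


@dataclass
class ExpertAnnotation:
    model_input: str                  # exact input the model received
    model_output: str                 # the model's response
    model_rationale: Optional[str]    # terse rationale, if the model gave one
    expert_correction: Optional[str]  # None when the output was already right
    expert_rationale: Optional[str]   # why the correction is right
    inference: InferenceMetadata
    expert: ExpertMetadata

    def needs_correction(self) -> bool:
        return self.expert_correction is not None

    def validate(self) -> list[str]:
        """Return a list of problems; an empty list means the record looks complete."""
        problems = []
        if not self.model_input:
            problems.append("missing model_input")
        if not self.model_output:
            problems.append("missing model_output")
        # Assumption: a correction without a rationale counts as incomplete.
        if self.expert_correction is not None and not self.expert_rationale:
            problems.append("correction without rationale")
        return problems

    def to_json(self) -> str:
        # asdict() recurses into the nested metadata dataclasses.
        return json.dumps(asdict(self))


# Hypothetical example record.
now = datetime.now(timezone.utc).isoformat()
ann = ExpertAnnotation(
    model_input="Q3 revenue rose 12% while churn doubled. Is the account healthy?",
    model_output="Yes",
    model_rationale="Revenue growth is strong.",
    expert_correction="No",
    expert_rationale="Doubling churn outweighs one quarter of revenue growth.",
    inference=InferenceMetadata("example-model", "You are an analyst.", {"temperature": 0.0}, now),
    expert=ExpertMetadata("labeler-17", now, "weekly review batch"),
)
print(ann.validate(), ann.needs_correction())  # [] True

# The same case with the rationale missing is flagged as incomplete.
draft = ExpertAnnotation(ann.model_input, ann.model_output, None, "No", None, ann.inference, ann.expert)
print(draft.validate())  # ['correction without rationale']
```

Serializing through `asdict` keeps every record as plain JSON, so the same structure can sit in a file, a database row, or an annotation store without custom encoders.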
That’s not so scary! But it is a lot of work to keep these records clean, versioned, and accessible. Sutro helps with this by acting as an annotation store that can be used directly to modify model/agent behavior.
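Sutro’s own interface is not shown here. As a rough, hypothetical illustration of what keeping records "versioned" can mean in practice, the sketch below uses an append-only JSON Lines file in which every edit to an annotation is written as a new version rather than overwriting the old one. `AnnotationLog` and its methods are invented for this example, and it assumes a single writer.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Any, Iterator, Optional


class AnnotationLog:
    """Hypothetical append-only JSONL log: each edit to an annotation is stored
    as a new version instead of overwriting the old one, so history stays auditable."""

    def __init__(self, path: str) -> None:
        self.path = Path(path)

    def _entries(self) -> Iterator[dict[str, Any]]:
        if not self.path.exists():
            return  # no file yet, so there are no entries
        with self.path.open(encoding="utf-8") as f:
            for line in f:
                if line.strip():
                    yield json.loads(line)

    def history(self, annotation_id: str) -> list[dict[str, Any]]:
        """All versions of one annotation, oldest first."""
        return [e for e in self._entries() if e["id"] == annotation_id]

    def latest(self, annotation_id: str) -> Optional[dict[str, Any]]:
        """The most recent record for an annotation, or None if it was never written."""
        versions = self.history(annotation_id)
        return versions[-1]["record"] if versions else None

    def append(self, annotation_id: str, record: dict[str, Any]) -> int:
        """Write a new version of `record` and return its version number."""
        version = len(self.history(annotation_id)) + 1
        entry = {"id": annotation_id, "version": version, "record": record}
        with self.path.open("a", encoding="utf-8") as f:
            f.write(json.dumps(entry, sort_keys=True) + "\n")
        return version


with tempfile.TemporaryDirectory() as tmp:
    log = AnnotationLog(os.path.join(tmp, "annotations.jsonl"))
    log.append("a1", {"expert_correction": None})
    log.append("a1", {"expert_correction": "No", "expert_rationale": "Churn doubled."})
    latest_a1 = log.latest("a1")
    versions_a1 = [e["version"] for e in log.history("a1")]
    missing = log.latest("never-written")

print(latest_a1)    # {'expert_correction': 'No', 'expert_rationale': 'Churn doubled.'}
print(versions_a1)  # [1, 2]
```

Because old versions are never deleted, a disagreement review can always see what an annotation said before a later expert revised it. A production store would also need concurrent writers, indexing, and access control, which this sketch deliberately ignores.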