Use Case
STEM Reasoning
PhD-level reasoning requires proof, not patterns.
Common failures
- Correct-looking derivations with a single wrong step that invalidates the conclusion
- Plausible-sounding answers that conflate related concepts, close enough to fool non-experts
- Notation and convention errors that domain experts catch in seconds
BakeLens audits reasoning chains
- PhD-holder review of each reasoning step, not just the final answer
- Error classification into conceptual misunderstanding, procedural mistake, or notation error (a record sketch follows this list)
- A map of which domains and difficulty levels produce the most silent failures
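For illustration only, a minimal sketch of how a single audit finding might be recorded, assuming the three-way error taxonomy above. The `ErrorClass` and `AuditFinding` names and every field are hypothetical, not BakeLens's actual schema.

```python
# Hypothetical sketch of one audit finding; field names are assumptions
# drawn from the list above, not a real BakeLens data format.
from dataclasses import dataclass
from enum import Enum


class ErrorClass(Enum):
    CONCEPTUAL = "conceptual misunderstanding"
    PROCEDURAL = "procedural mistake"
    NOTATION = "notation error"


@dataclass
class AuditFinding:
    domain: str              # e.g. "physics"
    difficulty: str          # e.g. "graduate"
    step_index: int          # which step in the chain failed
    error_class: ErrorClass  # one of the three classes above
    note: str                # reviewer's explanation of why the step fails


# Example: a derivation whose third step silently drops a sign.
finding = AuditFinding(
    domain="physics",
    difficulty="graduate",
    step_index=3,
    error_class=ErrorClass.PROCEDURAL,
    note="Sign lost when integrating by parts; conclusion inverted.",
)
```

Aggregating findings like this per domain and difficulty is what makes the "silent failure" map possible.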
Proof delivers verified expert reasoning
- Step-by-step verified solutions from domain PhDs in biology, chemistry, math, medicine, physics, statistics, and finance
- Each step annotated with the reasoning principle it applies, not just the calculation (see the sketch after this list)
- Hard cases that specifically target the error patterns the diagnosis uncovered
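As a sketch of what an annotated step could look like, assuming each step carries its reasoning principle and its verifier: the `VerifiedStep` name and fields are illustrative, not the delivered format.

```python
# Hypothetical sketch of a single verified solution step; names and
# fields are assumptions, not the actual deliverable schema.
from dataclasses import dataclass


@dataclass
class VerifiedStep:
    index: int          # position of the step in the solution
    statement: str      # the claim or calculation made at this step
    principle: str      # the reasoning principle the step applies
    justification: str  # why the step holds, per the verifying expert
    verified_by: str    # provenance: who checked this step


step = VerifiedStep(
    index=2,
    statement="dS >= 0 for the isolated composite system",
    principle="Second law of thermodynamics",
    justification="No heat crosses the boundary, so entropy cannot decrease.",
    verified_by="PhD, statistical mechanics",
)
```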
Deliverables
Reasoning audit report
Per-domain breakdown of error types, with example traces and severity ranking
PhD-verified datasets
Step-by-step expert solutions with provenance: who verified each step and why it holds
Domain-specific eval sets
Problems designed to catch the specific reasoning errors your model makes
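A hedged sketch of how an eval item might be keyed to a diagnosed error pattern; `EvalItem` and its fields are assumptions, and the sample problem is a standard base-rate exercise of the kind that exposes conceptual confusion between related probabilities.

```python
# Hypothetical sketch of an eval item linked to a diagnosed error
# pattern; the linkage field is an assumption about how such sets
# could be expressed, not a real format.
from dataclasses import dataclass


@dataclass
class EvalItem:
    domain: str
    problem: str
    targets_error: str   # the error pattern this problem is built to expose
    expected_answer: str


item = EvalItem(
    domain="statistics",
    problem="A test has 95% sensitivity and 90% specificity; prevalence is 1%. "
            "What is the probability that a positive result indicates disease?",
    targets_error="conceptual: confusing P(test+|disease) with P(disease|test+)",
    expected_answer="about 8.8%",  # 0.95*0.01 / (0.95*0.01 + 0.10*0.99)
)
```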