math · NEW · Pending curation
FrontierMath
Hundreds of novel, research-grade mathematics problems spanning number theory, real analysis, algebraic geometry, and category theory, authored by expert mathematicians. At release, the best AI systems solved fewer than 2% of them.
Year: 2024
Why our crawl picked it up
Notes the discovery agent wrote when proposing this benchmark.
Fills a critical gap left by saturated benchmarks (MATH, GSM8K). Problems are unpublished to limit data contamination, and answers are checked by automated verification. A typical problem takes even specialists hours to days to solve, so the benchmark should remain a meaningful ceiling for math evaluation for a long time.
Source
This entry was added by an automated crawl and hasn't been curated yet. Once it's reviewed and promoted into the bundled set, you'll see task anatomy, examples, scores, and richer context here.