benchmark.darvinyi.com
math · NEW · Pending curation

FrontierMath

Hundreds of novel, research-grade mathematics problems spanning number theory, real analysis, algebraic geometry, and category theory, authored by expert mathematicians. As of the benchmark's 2024 release, the best AI models solved fewer than 2% of the problems.

Year: 2024

Why our crawl picked it up

Notes the discovery agent wrote when proposing this benchmark.

Fills a critical gap left by saturated benchmarks (MATH, GSM8K). Problems are unpublished, which guards against data contamination, and answers are checked by automated verification. Solving a typical problem takes hours to days of effort even for specialists, so the benchmark should offer a long-lived ceiling for math evaluation.

Source

Primary source ↗

This entry was added by an automated crawl and hasn't been curated yet. Once it's reviewed and promoted into the bundled set, you'll see task anatomy, examples, scores, and richer context here.