multimodal · NEW · Pending curation
MMMU-Pro
Hardened extension of MMMU with 10-option MCQs, vision-only perceptual tasks, and multi-image questions. Reported accuracy is substantially lower than on the original MMMU, with drops of roughly 17 to 27 percentage points across models.
Year: 2024
Why our crawl picked it up
Notes the discovery agent wrote when proposing this benchmark.
Addresses three flaws in MMMU: text-only solvability, small option spaces enabling guessing, and lack of pure-vision items. Adds a 'vision-only' track where models must answer from images alone with no text shortcuts. More realistic measure of true multi-discipline multimodal reasoning across 30+ academic subjects.
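To make the "small option spaces enabling guessing" point concrete, here is a minimal sketch of the chance baseline for a single-answer multiple-choice question. It assumes uniform random guessing, and it assumes the original MMMU's typical format is four options (the notes above only state that MMMU-Pro uses ten). The function name `chance_accuracy` is hypothetical, introduced only for illustration.

```python
from fractions import Fraction


def chance_accuracy(num_options: int) -> Fraction:
    """Expected accuracy of uniform random guessing on a single-answer MCQ."""
    if num_options < 1:
        raise ValueError("num_options must be at least 1")
    return Fraction(1, num_options)


# Assumption: 4 options stands in for the original MMMU format;
# 10 options is the MMMU-Pro format described in this entry.
for k in (4, 10):
    print(f"{k} options: {float(chance_accuracy(k)):.0%} expected from guessing")
```

Under these assumptions, widening the option space from 4 to 10 lowers the guessing floor from 25% to 10%, so a given score on the 10-option format reflects less luck.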
This entry was added by an automated crawl and hasn't been curated yet. Once it's reviewed and promoted into the bundled set, you'll see task anatomy, examples, scores, and richer context here.