multimodal · NEW · Pending curation

Video-MME

Comprehensive video benchmark for multimodal LLMs, spanning 30 subfields and video durations from 11 seconds to 1 hour, with questions that integrate subtitles and audio. Accepted to CVPR 2025; OpenAI reported results on it for GPT-4.1.

Year: 2024

Why our crawl picked it up

Notes the discovery agent wrote when proposing this benchmark.

Presented by its authors as the first full-spectrum benchmark for evaluating MLLMs on video understanding. Covers short, medium, and long-form video, with subtitles and audio as optional input modalities. Widely adopted: OpenAI used it for GPT-4.1 as a primary measure of multimodal long-context ability. Addresses a gap left by image-only multimodal benchmarks.

Source

Primary source ↗

This entry was added by an automated crawl and hasn't been curated yet. Once it's reviewed and promoted into the bundled set, you'll see task anatomy, examples, scores, and richer context here.