benchmark.darvinyi.com
reasoning | NEW | Pending curation

ARC-AGI-2

Second-generation abstract visual reasoning benchmark that tests symbolic interpretation, compositional rule application, and context-aware reasoning. Pure LLMs score 0%; every task was solved by at least two human testers in two or fewer attempts.

Year: 2025

Why our crawl picked it up

Notes the discovery agent wrote when proposing this benchmark.

Successor to ARC-AGI-1, explicitly hardened against brute-force search and LLM pattern matching. Maintained by ARC Prize. At launch, frontier reasoning systems reached only single-digit percentages, making it the clearest signal of progress toward fluid general intelligence currently available.

Source

Primary source ↗

This entry was added by an automated crawl and hasn't been curated yet. Once it's reviewed and promoted into the bundled set, you'll see task anatomy, examples, scores, and richer context here.