Wave of AI Science Research Benchmarks and Workbenches Emerges

Research/Benchmarks | Wed Jul 01 2026 | 8 sources

OpenAI's GeneBench-Pro, Anthropic's Claude Science, and IBM Research's ScarfBench extended AI evaluation and tooling into computational biology, scientific workflows, and enterprise Java framework migration.

Analysis

[OpenAI] released GeneBench-Pro computational biology benchmark [1][2]

[Anthropic] launched Claude Science AI workbench for scientists [3][6]

[Anthropic] strengthened its vertical product strategy, building on workflows rather than new models [4]

[Anthropic] elevated Claude Science to flagship status alongside Claude Code and Cowork [5]

[IBM Research] released ScarfBench benchmark for enterprise Java framework migration [7]

[Hugging Face] integrated Every Eval Ever with Community Evals [8]

Sources