AI4 days agoModel Performance Begins with Data: Researchers from Ai2 Release DataDecide—A Benchmark Suite to Understand Pretraining Data Impact Across 30K LLM Checkpoints
AIApril 2, 2025Open AI Releases PaperBench: A Challenging Benchmark for Assessing AI Agents’ Abilities to Replicate Cutting-Edge Machine Learning Research
AIMarch 5, 2025Researchers from FutureHouse and ScienceMachine Introduce BixBench: A Benchmark Designed to Evaluate AI Agents on Real-World Bioinformatics Task
AISeptember 17, 2024Unlocking AI Potential: Allen Institute Introduces SUPER Benchmark to Test LLMs in Research Experimentation!