performance metrics Archives - Futurex Solutions – All Things Finance

SWE-Bench Performance Reaches 50.8% Without Tool Use: A Case for Monolithic State-in-Context Agents

AI | April 30, 2025

Reinforcement Learning for Email Agents: OpenPipe’s ART·E Outperforms o3 in Accuracy, Latency, and Cost

AI | April 9, 2025

OpenAI Introduces the Evals API: Streamlined Model Evaluation for Developers

AI | April 2, 2025

OpenAI Releases PaperBench: A Challenging Benchmark for Assessing AI Agents’ Abilities to Replicate Cutting-Edge Machine Learning Research

AI | March 27, 2025

This AI Paper Introduces the Kolmogorov-Test: A Compression-as-Intelligence Benchmark for Evaluating Code-Generating Language Models

AI | March 14, 2025

Optimizing Test-Time Compute for LLMs: A Meta-Reinforcement Learning Approach with Cumulative Regret Minimization