AIMay 20, 2025Omni-R1: Advancing Audio Question Answering with Text-Driven Reinforcement Learning and Auto-Generated Data
AIMay 13, 2025RL^V: Unifying Reasoning and Verification in Language Models through Value-Free Reinforcement Learning
AIMay 2, 2025DeepSeek-AI Released DeepSeek-Prover-V2: An Open-Source Large Language Model Designed for Formal Theorem, Proving through Subgoal Decomposition and Reinforcement Learning
AIApril 30, 2025Reinforcement Learning for Email Agents: OpenPipe’s ART·E Outperforms o3 in Accuracy, Latency, and Cost
AIApril 16, 2025SQL-R1: A Reinforcement Learning-based NL2SQL Model that Outperforms Larger Systems in Complex Queries with Transparent and Accurate SQL Generation
AIApril 11, 2025Balancing Accuracy and Efficiency in Language Models: A Two-Phase RL Post-Training Approach for Concise Reasoning