AIJune 21, 2025PoE-World + Planner Outperforms Reinforcement Learning RL Baselines in Montezuma’s Revenge with Minimal Demonstration Data
AIJune 12, 2025CURE: A Reinforcement Learning Framework for Co-Evolving Code and Unit Test Generation in LLMs
AIJune 11, 2025ether0: A 24B LLM Trained with Reinforcement Learning RL for Advanced Chemical Reasoning Tasks
AIJune 9, 2025High-Entropy Token Selection in Reinforcement Learning with Verifiable Rewards (RLVR) Improves Accuracy and Reduces Training Cost for LLMs
AIMay 29, 2025This AI Paper Introduces WEB-SHEPHERD: A Process Reward Model for Web Agents with 40K Dataset and 10× Cost Efficiency
AIMay 27, 2025Qwen Researchers Proposes QwenLong-L1: A Reinforcement Learning Framework for Long-Context Reasoning in Large Language Models