AI | April 11, 2025 | Balancing Accuracy and Efficiency in Language Models: A Two-Phase RL Post-Training Approach for Concise Reasoning
AI | March 13, 2025 | Alibaba Researchers Introduce R1-Omni: An Application of Reinforcement Learning with Verifiable Reward (RLVR) to an Omni-Multimodal Large Language Model
AI | March 7, 2025 | Researchers from AMLab and CuspAI Introduced Erwin: A Tree-based Hierarchical Transformer for Large-scale Physical Systems
AI | March 5, 2025 | Beyond Monte Carlo Tree Search: Unleashing Implicit Chess Strategies with Discrete Diffusion
AI | February 11, 2025 | Shanghai AI Lab Releases OREAL-7B and OREAL-32B: Advancing Mathematical Reasoning with Outcome Reward-Based Reinforcement Learning