AIMarch 13, 2025Alibaba Researchers Introduce R1-Omni: An Application of Reinforcement Learning with Verifiable Reward (RLVR) to an Omni-Multimodal Large Language Model
AIMarch 5, 2025Beyond Monte Carlo Tree Search: Unleashing Implicit Chess Strategies with Discrete Diffusion
AIFebruary 11, 2025Shanghai AI Lab Releases OREAL-7B and OREAL-32B: Advancing Mathematical Reasoning with Outcome Reward-Based Reinforcement Learning
AIJanuary 26, 2025Google DeepMind Introduces MONA: A Novel Machine Learning Framework to Mitigate Multi-Step Reward Hacking in Reinforcement Learning