AI | February 11, 2025 | Shanghai AI Lab Releases OREAL-7B and OREAL-32B: Advancing Mathematical Reasoning with Outcome Reward-Based Reinforcement Learning
AI | January 26, 2025 | Google DeepMind Introduces MONA: A Novel Machine Learning Framework to Mitigate Multi-Step Reward Hacking in Reinforcement Learning
AI | January 24, 2025 | Researchers at Stanford Propose a Unified Regression-based Machine Learning Framework for Sequence Models with Associative Memory