AIFebruary 11, 2025Shanghai AI Lab Releases OREAL-7B and OREAL-32B: Advancing Mathematical Reasoning with Outcome Reward-Based Reinforcement Learning
AIFebruary 10, 2025Tutorial to Fine-Tuning Mistral 7B with QLoRA Using Axolotl for Efficient LLM Training
AIFebruary 8, 2025IBM AI Releases Granite-Vision-3.1-2B: A Small Vision Language Model with Super Impressive Performance on Various Tasks
AIJanuary 26, 2025Google DeepMind Introduces MONA: A Novel Machine Learning Framework to Mitigate Multi-Step Reward Hacking in Reinforcement Learning