7B Model and 8K Examples: Emerging Reasoning with Reinforcement Learning is Both Effective and Efficient

Published in Preprint, 2025

We demonstrate that reasoning ability can emerge through reinforcement learning both effectively and efficiently, achieving strong results with just a 7B model and 8K examples.

Authors: Weihao Zeng, Yuzhen Huang, Wei Liu, Keqing He, Qian Liu, Zejun Ma, Junxian He

Project Page | Twitter | GitHub

Recommended citation: Weihao Zeng*, Yuzhen Huang*, Wei Liu, Keqing He, Qian Liu, Zejun Ma, Junxian He. (2025). "7B Model and 8K Examples: Emerging Reasoning with Reinforcement Learning is Both Effective and Efficient." Preprint.