7B Model and 8K Examples: Emerging Reasoning with Reinforcement Learning is Both Effective and Efficient
Preprint, 2025
We demonstrate that reinforcement learning can elicit emergent reasoning both effectively and efficiently, achieving strong results with just a 7B model and 8K examples.
Authors: Weihao Zeng, Yuzhen Huang, Wei Liu, Keqing He, Qian Liu, Zejun Ma, Junxian He
| Project Page | GitHub |
Recommended citation: Weihao Zeng*, Yuzhen Huang*, Wei Liu, Keqing He, Qian Liu, Zejun Ma, Junxian He. (2025). "7B Model and 8K Examples: Emerging Reasoning with Reinforcement Learning is Both Effective and Efficient." Preprint.
