B-STAR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners

Published in ICLR 2025

We propose B-STAR, a method for monitoring and balancing exploration and exploitation in self-taught reasoners.

Authors: Weihao Zeng, Yuzhen Huang, Lulu Zhao, Yijun Wang, Zifei Shan, Junxian He

Recommended citation: Weihao Zeng*, Yuzhen Huang*, Lulu Zhao, Yijun Wang, Zifei Shan, Junxian He. (2025). "B-STAR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners." ICLR 2025.