B-STAR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners
Published in ICLR 2025
We propose B-STAR, a method for monitoring and balancing exploration and exploitation in self-taught reasoners.
Authors: Weihao Zeng, Yuzhen Huang, Lulu Zhao, Yijun Wang, Zifei Shan, Junxian He
Recommended citation: Weihao Zeng*, Yuzhen Huang*, Lulu Zhao, Yijun Wang, Zifei Shan, Junxian He. (2025). "B-STAR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners." ICLR 2025.
