About Me

I am Weihao Zeng, a PhD student in the Department of Computer Science and Engineering at the Hong Kong University of Science and Technology, supervised by Prof. Junxian He, starting in the fall of 2025.

Research Interests

My main focus is the post-training of LLMs, specifically:

Benchmarking and improving models for long-horizon, realistic agentic tasks (Toolathlon)
Improving model reasoning capabilities using reinforcement learning (RL) / self-evolution techniques (SimpleRL, B-STaR)
Exploring efficient data engineering methods for post-training (Deita, Auto Evol-Instruct)

Feel free to email me about any form of academic collaboration: wzengak@connect.ust.hk

News

2026-01: Two papers have been accepted by ICLR 2026!
2025-10: We introduce Toolathlon, a benchmark for language agents offering diverse applications and tools, realistic environment setup, and reliable execution-based evaluation! Toolathlon Leaderboard
2025-03: We introduce SimpleRL-Zoo, a deep investigation of zero RL training across diverse model families and sizes! SimpleRL-Zoo Twitter
2025-01: We announce our latest effort on O/R-1-style models and scalable reinforcement learning for LLM reasoning! SimpleRL Twitter
2025-01: Our B-STaR has been accepted by ICLR 2025!
2024-09: Our Auto Evol-Instruct has been accepted by EMNLP 2024!
2024-01: Our Deita has been accepted by ICLR 2024!

Publications

SimpleRL-Zoo: Investigating and Taming Zero Reinforcement Learning for Open Base Models in the Wild
Weihao Zeng*, Yuzhen Huang*, Qian Liu, Wei Liu, Keqing He, Zejun Ma, Junxian He
COLM 2025 | Paper | GitHub
The Tool Decathlon: Benchmarking Language Agents for Diverse, Realistic, and Long-Horizon Task Execution
Junlong Li*, Wenshuo Zhao*, Jian Zhao*, Weihao Zeng*, Haoze Wu*, Xiaochen Wang, Rui Ge, Yuxuan Cao, Yuzhen Huang, Wei Liu, Junteng Liu, Zhaochen Su, Yiyang Guo, Fan Zhou, Lueyang Zhang, Juan Michelini, Xingyao Wang, Xiang Yue, Shuyan Zhou, Graham Neubig, Junxian He
ICLR 2026 | Paper | GitHub | Leaderboard
Pushing Test-Time Scaling Limits of Deep Search with Asymmetric Verification
Weihao Zeng, Keqing He, Chuqiao Kuang, Xiaoguang Li, Junxian He
ICLR 2026 | Paper | GitHub
7B Model and 8K Examples: Emerging Reasoning with Reinforcement Learning is Both Effective and Efficient
Weihao Zeng*, Yuzhen Huang*, Wei Liu, Keqing He, Qian Liu, Zejun Ma, Junxian He
HKUST-NLP Blog | Project | GitHub
B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners
Weihao Zeng*, Yuzhen Huang*, Lulu Zhao, Yijun Wang, Zifei Shan, Junxian He
ICLR 2025 | Paper
Automatic Instruction Evolving for Large Language Models
Weihao Zeng, Can Xu, Yingxiu Zhao, Jian-Guang Lou, Weizhu Chen
EMNLP 2024 | Paper
What Makes Good Data for Alignment? A Comprehensive Study of Automatic Data Selection in Instruction Tuning
Wei Liu*, Weihao Zeng*, Keqing He, Yong Jiang, Junxian He
ICLR 2024 | Paper

Invited Talks

2025-04: SimpleRL-Zoo and B-STaR @ Qingke Talk (Online)
2025-03: SimpleRL-Zoo @ Westlake University (Hangzhou, China)
2025-02: SimpleRL @ Huawei Noah’s Ark Lab (Online)
2025-02: SimpleRL @ TikTok (Online)
2025-02: SimpleRL @ Northwestern University (Evanston, IL, USA)

Competitions and Awards

National Scholarship in China (2019/2023)
2022-09: Received the 1st Award in Track 2 of the SereTOD Challenge 2022 at EMNLP 2022!
2021-08: Received the 4th Award in the SMP 2021 Conversational AI Challenge!
2021-09: Achieved 8th place in the CCIR 2021 Intelligent NLU Challenge!