Publications

You can also find my articles on my Google Scholar profile.

Journal Articles

SimpleRL-Zoo: Investigating and Taming Zero Reinforcement Learning for Open Base Models in the Wild

Preprint, 2025

A deep investigation of zero RL training across diverse model families and sizes.

Recommended citation: Weihao Zeng*, Yuzhen Huang*, Qian Liu, Wei Liu, Keqing He, Zejun Ma, Junxian He. (2025). "SimpleRL-Zoo: Investigating and Taming Zero Reinforcement Learning for Open Base Models in the Wild." Preprint.
Download Paper

7B Model and 8K Examples: Emerging Reasoning with Reinforcement Learning is Both Effective and Efficient

Preprint, 2025

Demonstrating, with a 7B model and 8K examples, that emergent reasoning via reinforcement learning is both effective and efficient.

Recommended citation: Weihao Zeng*, Yuzhen Huang*, Wei Liu, Keqing He, Qian Liu, Zejun Ma, Junxian He. (2025). "7B Model and 8K Examples: Emerging Reasoning with Reinforcement Learning is Both Effective and Efficient." Preprint.
Download Paper

Conference Papers

B-STAR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners

Published in ICLR 2025, 2025

A method for monitoring and balancing exploration and exploitation in self-taught reasoners.

Recommended citation: Weihao Zeng*, Yuzhen Huang*, Lulu Zhao, Yijun Wang, Zifei Shan, Junxian He. (2025). "B-STAR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners." ICLR 2025.
Download Paper

Automatic Instruction Evolving for Large Language Models

Published in EMNLP 2024, 2024

A method for automatically evolving instructions for large language models.

Recommended citation: Weihao Zeng, Can Xu, Yingxiu Zhao, Jian-Guang Lou, Weizhu Chen. (2024). "Automatic Instruction Evolving for Large Language Models." EMNLP 2024.
Download Paper

What Makes Good Data for Alignment? A Comprehensive Study of Automatic Data Selection in Instruction Tuning

Published in ICLR 2024, 2024

A comprehensive study of automatic data selection in instruction tuning, introducing the Deita framework.

Recommended citation: Wei Liu*, Weihao Zeng*, Keqing He, Yong Jiang, Junxian He. (2024). "What Makes Good Data for Alignment? A Comprehensive Study of Automatic Data Selection in Instruction Tuning." ICLR 2024.
Download Paper

Seen to Unseen: Exploring Compositional Generalization of Multi-Attribute Controllable Dialogue Generation

Published in ACL 2023 Main Conference, 2023

Exploring compositional generalization of multi-attribute controllable dialogue generation.

Recommended citation: Weihao Zeng, Lulu Zhao, Keqing He, Ruotong Geng, Jingang Wang, Wei Wu, Weiran Xu. (2023). "Seen to Unseen: Exploring Compositional Generalization of Multi-Attribute Controllable Dialogue Generation." ACL 2023 Main Conference.
Download Paper

FutureTOD: Teaching Future Knowledge to Pre-trained Language Model for Task-Oriented Dialogue

Published in ACL 2023 Main Conference, 2023

A method for teaching future knowledge to pre-trained language models for task-oriented dialogue.

Recommended citation: Weihao Zeng, Keqing He, Yejie Wang, Chen Zeng, Jingang Wang, Yunsen Xian, Weiran Xu. (2023). "FutureTOD: Teaching Future Knowledge to Pre-trained Language Model for Task-Oriented Dialogue." ACL 2023 Main Conference.
Download Paper