• Reasoning Models Know When They're Right: Probing Hidden States for Self-Verification.
    Anqi Zhang, Yulin Chen, Jane Pan, Chen Zhao, Aurojit Panda, Jinyang Li and He He. arXiv preprint arXiv:2504.05419, 2025. [bib]
    @article{zhang2025reasoning,
            author={Anqi Zhang and Yulin Chen and Jane Pan and Chen Zhao and Aurojit Panda and Jinyang Li and He He},
            title={Reasoning Models Know When They're Right: Probing Hidden States for Self-Verification},
            journal={arXiv:2504.05419},
            year={2025}
    }
    
  • Transformers Struggle to Learn to Search.
    Abulhair Saparov, Srushti Pawar, Shreyas Pimpalgaonkar, Nitish Joshi, Richard Yuanzhe Pang, Vishakh Padmakumar, Seyed Mehran Kazemi, Najoung Kim* and He He*. International Conference on Learning Representations (ICLR), 2025. [bib]
    @inproceedings{saparov2025search,
            author={Abulhair Saparov and Srushti Pawar and Shreyas Pimpalgaonkar and Nitish Joshi and Richard Yuanzhe Pang and Vishakh Padmakumar and Seyed Mehran Kazemi and Najoung Kim and He He},
            title={Transformers Struggle to Learn to Search},
            booktitle={International Conference on Learning Representations (ICLR)},
            year={2025}
    }
    
  • Language Models Learn to Mislead Humans via RLHF.
    Jiaxin Wen, Ruiqi Zhong, Akbir Khan, Ethan Perez, Jacob Steinhardt, Minlie Huang, Sam Bowman, He He and Shi Feng. International Conference on Learning Representations (ICLR), 2025. [bib]
    @inproceedings{wen2025language,
            author={Jiaxin Wen and Ruiqi Zhong and Akbir Khan and Ethan Perez and Jacob Steinhardt and Minlie Huang and Sam Bowman and He He and Shi Feng},
            title={Language Models Learn to Mislead Humans via RLHF},
            booktitle={International Conference on Learning Representations (ICLR)},
            year={2025}
    }
    
  • Parallel Structures in Pre-training Data Yield In-Context Learning.
    Yanda Chen, Chen Zhao, Zhou Yu, Kathleen McKeown and He He. Association for Computational Linguistics (ACL), 2024. [bib] [code]
    @inproceedings{chen2024parallel,
            author={Yanda Chen and Chen Zhao and Zhou Yu and Kathleen McKeown and He He},
            title={Parallel Structures in Pre-training Data Yield In-Context Learning},
            booktitle={Association for Computational Linguistics (ACL)},
            year={2024}
    }
    
  • Solving Olympiad Geometry without Human Demonstrations.
    Trieu Trinh, Yuhuai Wu, Quoc V Le, He He and Thang Luong. Nature, 2024. [bib]
    @article{trinh2024geometry,
            author={Trieu Trinh and Yuhuai Wu and Quoc V Le and He He and Thang Luong},
            title={Solving Olympiad Geometry without Human Demonstrations},
            journal={Nature},
            volume={625},
            pages={476--482},
            year={2024}
    }
    
  • Personas as a Way to Model Truthfulness in Language Models.
    Nitish Joshi*, Javier Rando*, Abulhair Saparov, Najoung Kim and He He. Empirical Methods in Natural Language Processing (EMNLP), 2024. [bib]
    @inproceedings{joshi2024persona,
            author={Nitish Joshi and Javier Rando and Abulhair Saparov and Najoung Kim and He He},
            title={Personas as a Way to Model Truthfulness in Language Models},
            booktitle={Empirical Methods in Natural Language Processing (EMNLP)},
            year={2024}
    }
    
  • Does Writing with Language Models Reduce Content Diversity?
    Vishakh Padmakumar and He He. International Conference on Learning Representations (ICLR), 2024. [bib] [code]
    @inproceedings{padmakumar2024writing,
            author={Vishakh Padmakumar and He He},
            title={Does Writing with Language Models Reduce Content Diversity?},
            booktitle={International Conference on Learning Representations (ICLR)},
            year={2024}
    }
    
  • Do Models Explain Themselves? Counterfactual Simulatability of Natural Language Explanations.
    Yanda Chen, Ruiqi Zhong, Narutatsu Ri, Chen Zhao, He He, Jacob Steinhardt, Zhou Yu and Kathleen McKeown. International Conference on Machine Learning (ICML), 2024. Spotlight [bib] [code]
    @inproceedings{chen2024do,
            author={Yanda Chen and Ruiqi Zhong and Narutatsu Ri and Chen Zhao and He He and Jacob Steinhardt and Zhou Yu and Kathleen McKeown},
            title={Do Models Explain Themselves? Counterfactual Simulatability of Natural Language Explanations},
            booktitle={International Conference on Machine Learning (ICML)},
            year={2024}
    }
    
  • Testing the General Deductive Reasoning Capacity of Large Language Models Using OOD Examples.
    Abulhair Saparov, Richard Yuanzhe Pang, Vishakh Padmakumar, Nitish Joshi, Seyed Mehran Kazemi, Najoung Kim* and He He*. Neural Information Processing Systems (NeurIPS), 2023. [bib] [code]
    @inproceedings{saparov2023testing,
            author={Abulhair Saparov and Richard Yuanzhe Pang and Vishakh Padmakumar and Nitish Joshi and Seyed Mehran Kazemi and Najoung Kim and He He},
            title={Testing the General Deductive Reasoning Capacity of Large Language Models Using OOD Examples},
            booktitle={Neural Information Processing Systems (NeurIPS)},
            year={2023}
    }
    
  • Reward Gaming in Conditional Text Generation.
    Richard Yuanzhe Pang, Vishakh Padmakumar, Thibault Sellam, Ankur P Parikh and He He. Association for Computational Linguistics (ACL), 2023. [bib] [talk]
    @inproceedings{pang2023reward,
            author={Richard Yuanzhe Pang and Vishakh Padmakumar and Thibault Sellam and Ankur P Parikh and He He},
            title={Reward Gaming in Conditional Text Generation},
            booktitle={Association for Computational Linguistics (ACL)},
            year={2023}
    }
    
  • Extrapolative Controlled Sequence Generation via Iterative Refinement.
    Vishakh Padmakumar, Richard Yuanzhe Pang, He He and Ankur P Parikh. International Conference on Machine Learning (ICML), 2023. [bib] [code]
    @inproceedings{padmakumar2023extrapolative,
            author={Vishakh Padmakumar and Richard Yuanzhe Pang and He He and Ankur P Parikh},
            title={Extrapolative Controlled Sequence Generation via Iterative Refinement},
            booktitle={International Conference on Machine Learning (ICML)},
            year={2023}
    }
    
  • Language Models are Greedy Reasoners: A Systematic Formal Analysis of Chain-of-Thought.
    Abulhair Saparov and He He. International Conference on Learning Representations (ICLR), 2023. [bib] [code] [talk]
    @inproceedings{saparov2023language,
            author={Abulhair Saparov and He He},
            title={Language Models are Greedy Reasoners: A Systematic Formal Analysis of Chain-of-Thought},
            booktitle={International Conference on Learning Representations (ICLR)},
            year={2023}
    }
    
  • Are All Spurious Features in Natural Language Alike? An Analysis through a Causal Lens.
    Nitish Joshi, Xiang Pan and He He. Empirical Methods in Natural Language Processing (EMNLP), 2022. [bib] [code] [talk]
    @inproceedings{joshi2022all,
            author={Nitish Joshi and Xiang Pan and He He},
            title={Are All Spurious Features in Natural Language Alike? An Analysis through a Causal Lens},
            booktitle={Empirical Methods in Natural Language Processing (EMNLP)},
            year={2022}
    }
    
  • Meta-learning via Language Model In-context Tuning.
    Yanda Chen, Ruiqi Zhong, Sheng Zha, George Karypis and He He. Association for Computational Linguistics (ACL), 2022. [bib] [code]
    @inproceedings{chen2022meta,
            author={Yanda Chen and Ruiqi Zhong and Sheng Zha and George Karypis and He He},
            title={Meta-learning via Language Model In-context Tuning},
            booktitle={Association for Computational Linguistics (ACL)},
            year={2022}
    }
    
  • An Investigation of the (In)effectiveness of Counterfactually Augmented Data.
    Nitish Joshi and He He. Association for Computational Linguistics (ACL), 2022. [bib] [code]
    @inproceedings{joshi2022investigation,
            author={Nitish Joshi and He He},
            title={An Investigation of the (In)effectiveness of Counterfactually Augmented Data},
            booktitle={Association for Computational Linguistics (ACL)},
            year={2022}
    }
    
  • Text Generation by Learning from Demonstrations.
    Richard Yuanzhe Pang and He He. International Conference on Learning Representations (ICLR), 2021. [bib] [code] [talk]
    @inproceedings{pang2021text,
            author={Richard Yuanzhe Pang and He He},
            title={Text Generation by Learning from Demonstrations},
            booktitle={International Conference on Learning Representations (ICLR)},
            year={2021}
    }
    
  • An Empirical Study on Robustness to Spurious Correlations using Pre-trained Language Models.
    Lifu Tu, Garima Lalwani, Spandana Gella and He He. Transactions of the Association for Computational Linguistics (TACL), 2020. [bib] [code]
    @article{tu2020empirical,
            author={Lifu Tu and Garima Lalwani and Spandana Gella and He He},
            title={An Empirical Study on Robustness to Spurious Correlations using Pre-trained Language Models},
            journal={Transactions of the Association for Computational Linguistics (TACL)},
            volume={8},
            year={2020}
    }
    
  • FEQA: A Question Answering Evaluation Framework for Faithfulness Assessment in Abstractive Summarization.
    Esin Durmus, He He and Mona Diab. Association for Computational Linguistics (ACL), 2020. [bib] [code] [talk]
    @inproceedings{durmus2020feqa,
            author={Esin Durmus and He He and Mona Diab},
            title={FEQA: A Question Answering Evaluation Framework for Faithfulness Assessment in Abstractive Summarization},
            booktitle={Association for Computational Linguistics (ACL)},
            year={2020}
    }
    
  • Unlearn Dataset Bias for Natural Language Inference by Fitting the Residual.
    He He, Sheng Zha and Haohan Wang. EMNLP Workshop on Deep Learning Approaches for Low-Resource NLP (DeepLo), 2019. [bib] [code] [poster]
    @inproceedings{he2019unlearn,
            author={He He and Sheng Zha and Haohan Wang},
            title={Unlearn Dataset Bias for Natural Language Inference by Fitting the Residual},
            booktitle={EMNLP Workshop on Deep Learning Approaches for Low-Resource NLP (DeepLo)},
            year={2019}
    }
    
  • Pun Generation with Surprise.
    He He*, Nanyun Peng* and Percy Liang. North American Chapter of the Association for Computational Linguistics (NAACL), 2019. [bib] [code] [codalab]
    @inproceedings{he2019pun,
            author={He He and Nanyun Peng and Percy Liang},
            title={Pun Generation with Surprise},
            booktitle={North American Chapter of the Association for Computational Linguistics (NAACL)},
            year={2019}
    }
    
  • Decoupling Strategy and Generation in Negotiation Dialogues.
    He He, Derek Chen, Anusha Balakrishnan and Percy Liang. Empirical Methods in Natural Language Processing (EMNLP), 2018. [bib] [project]
    @inproceedings{he2018decouple,
            author={He He and Derek Chen and Anusha Balakrishnan and Percy Liang},
            title={Decoupling Strategy and Generation in Negotiation Dialogues},
            booktitle={Empirical Methods in Natural Language Processing (EMNLP)},
            year={2018}
    }
    
  • QuAC: Question Answering in Context.
    Eunsol Choi*, He He*, Mohit Iyyer*, Mark Yatskar*, Wen-tau Yih, Yejin Choi, Percy Liang and Luke Zettlemoyer. Empirical Methods in Natural Language Processing (EMNLP), 2018. [bib] [project]
    @inproceedings{choi2018quac,
            author={Eunsol Choi and He He and Mohit Iyyer and Mark Yatskar and Wen-tau Yih and Yejin Choi and Percy Liang and Luke Zettlemoyer},
            title={QuAC: Question Answering in Context},
            booktitle={Empirical Methods in Natural Language Processing (EMNLP)},
            year={2018}
    }
    
  • Sharp Nearby, Fuzzy Far Away: How Neural Language Models Use Context.
    Urvashi Khandelwal, He He, Peng Qi and Dan Jurafsky. Association for Computational Linguistics (ACL), 2018. [bib] [code]
    @inproceedings{khandelwal2018lm,
            author={Urvashi Khandelwal and He He and Peng Qi and Dan Jurafsky},
            title={Sharp Nearby, Fuzzy Far Away: How Neural Language Models Use Context},
            booktitle={Association for Computational Linguistics (ACL)},
            year={2018}
    }
    
  • Delete, Retrieve, Generate: a Simple Approach to Sentiment and Style Transfer.
    Juncen Li, Robin Jia, He He and Percy Liang. North American Chapter of the Association for Computational Linguistics (NAACL), 2018. [bib] [code]
    @inproceedings{li2018style,
            author={Juncen Li and Robin Jia and He He and Percy Liang},
            title={Delete, Retrieve, Generate: a Simple Approach to Sentiment and Style Transfer},
            booktitle={North American Chapter of the Association for Computational Linguistics (NAACL)},
            year={2018}
    }
    
  • Learning Symmetric Collaborative Dialogue Agents with Dynamic Knowledge Graph Embeddings.
    He He, Anusha Balakrishnan, Mihail Eric and Percy Liang. Association for Computational Linguistics (ACL), 2017. [bib] [project]
    @inproceedings{he2017symmetric,
            author={He He and Anusha Balakrishnan and Mihail Eric and Percy Liang},
            title={Learning Symmetric Collaborative Dialogue Agents with Dynamic Knowledge Graph Embeddings},
            booktitle={Association for Computational Linguistics (ACL)},
            year={2017}
    }
    
  • Opponent Modeling in Deep Reinforcement Learning.
    He He, Jordan Boyd-Graber, Kevin Kwok and Hal Daumé III. International Conference on Machine Learning (ICML), 2016. [bib] [code] [data]
    @inproceedings{he2016opponent,
            author={He He and Jordan Boyd-Graber and Kevin Kwok and Hal {Daum\'{e} III}},
            title={Opponent Modeling in Deep Reinforcement Learning},
            booktitle={International Conference on Machine Learning (ICML)},
            year={2016}
    }
    
  • Learning to Search in Branch and Bound Algorithms.
    He He, Hal Daumé III and Jason Eisner. Neural Information Processing Systems (NeurIPS), 2014. [bib] [code] [poster]
    @inproceedings{he2014bb,
            author={He He and Hal {Daum\'{e} III} and Jason Eisner},
            title={Learning to Search in Branch and Bound Algorithms},
            booktitle={Neural Information Processing Systems (NeurIPS)},
            year={2014}
    }
    
  • Dynamic Feature Selection for Dependency Parsing.
    He He, Hal Daumé III and Jason Eisner. Empirical Methods in Natural Language Processing (EMNLP), 2013. [bib] [talk]
    @inproceedings{he2013dep,
            author={He He and Hal {Daum\'{e} III} and Jason Eisner},
            title={Dynamic Feature Selection for Dependency Parsing},
            booktitle={Empirical Methods in Natural Language Processing (EMNLP)},
            year={2013}
    }
    
  • Imitation Learning by Coaching.
    He He, Hal Daumé III and Jason Eisner. Neural Information Processing Systems (NeurIPS), 2012. [bib] [poster]
    @inproceedings{he2012coaching,
            author={He He and Hal {Daum\'{e} III} and Jason Eisner},
            title={Imitation Learning by Coaching},
            booktitle={Neural Information Processing Systems (NeurIPS)},
            year={2012}
    }
    
  • Single Image Super-resolution using Gaussian Process Regression.
    He He and Wan-Chi Siu. Computer Vision and Pattern Recognition (CVPR), 2011. [bib] [code] [talk]
    @inproceedings{he2011superres,
            author={He He and Wan-Chi Siu},
            title={Single Image Super-resolution using Gaussian Process Regression},
            booktitle={Computer Vision and Pattern Recognition (CVPR)},
            year={2011}
    }