Publications

To learn about my latest work, you may want to check Google Scholar. To learn more about my research style, take a look at my representative work.

2025

Monitoring Decomposition Attacks in LLMs with Lightweight Sequential Monitors.
Chen Yueh-Han, Nitish Joshi, Yulin Chen, Maksym Andriushchenko, Rico Angell* and He He*. arXiv:2506.10949 preprint, 2025. [bib]

Unsupervised Elicitation of Language Models.
Jiaxin Wen, Zachary Ankner, Arushi Somani, Peter Hase, Samuel Marks, Jacob Goldman-Wetzler, Linda Petrini, Henry Sleight, Collin Burns, He He, Shi Feng, Ethan Perez and Jan Leike. arXiv:2506.10139 preprint, 2025. [bib]

Predicting Empirical AI Research Outcomes with Language Models.
Jiaxin Wen, Chenglei Si, Yueh-han Chen, He He and Shi Feng. arXiv:2506.00794 preprint, 2025. [bib]

Beyond Memorization: Mapping the Originality-Quality Frontier of Language Models.
Vishakh Padmakumar, Chen Yueh-Han, Jane Pan, Valerie Chen and He He. arXiv:2504.09389 preprint, 2025. [bib]

Reasoning Models Know When They're Right: Probing Hidden States for Self-Verification.
Anqi Zhang, Yulin Chen, Jane Pan, Chen Zhao, Aurojit Panda, Jinyang Li and He He. Conference on Language Modeling (COLM), 2025. [bib]

When Benchmarks Talk: Re-Evaluating Code LLMs with Interactive Feedback.
Jane Pan*, Ryan Shar*, Jacob Pfau, Ameet Talwalkar, He He and Valerie Chen. Findings of the Association for Computational Linguistics (ACL Findings), 2025. [bib]

Transformers Struggle to Learn to Search.
Abulhair Saparov, Srushti Pawar, Shreyas Pimpalgaonkar, Nitish Joshi, Richard Yuanzhe Pang, Vishakh Padmakumar, Seyed Mehran Kazemi, Najoung Kim* and He He*. International Conference on Learning Representations (ICLR), 2025. [bib]

Adaptive Deployment of Untrusted LLMs Reduces Distributed Threats.
Jiaxin Wen*, Vivek Hebbar*, Caleb Larson*, Aryan Bhatt, Ansh Radhakrishnan, Mrinank Sharma, Henry Sleight, Shi Feng, He He, Ethan Perez, Buck Shlegeris and Akbir Khan. International Conference on Learning Representations (ICLR), 2025. [bib]

Language Models Learn to Mislead Humans via RLHF.
Jiaxin Wen, Ruiqi Zhong, Akbir Khan, Ethan Perez, Jacob Steinhardt, Minlie Huang, Sam Bowman, He He and Shi Feng. International Conference on Learning Representations (ICLR), 2025. [bib]

2024

Beyond the Binary: Capturing Diverse Preferences With Reward Regularization.
Vishakh Padmakumar*, Chuanyang Jin*, Hannah Rose Kirk* and He He. arXiv:2412.03822 preprint, 2024. [bib]

Spontaneous Reward Hacking in Iterative Self-Refinement.
Jane Pan, He He, Sam Bowman and Shi Feng. arXiv:2407.04549 preprint, 2024. [bib]

LLMs Are Prone to Fallacies in Causal Inference.
Nitish Joshi, Abulhair Saparov, Yixin Wang and He He. Empirical Methods in Natural Language Processing (EMNLP), 2024. [bib]

Iterative Reasoning Preference Optimization.
Richard Yuanzhe Pang, Weizhe Yuan, Kyunghyun Cho, He He, Sainbayar Sukhbaatar and Jason Weston. Neural Information Processing Systems (NeurIPS), 2024. [bib]

The PRISM Alignment Dataset: What Participatory, Representative and Individualised Human Feedback Reveals About the Subjective and Multicultural Alignment of Large Language Models.
Hannah Rose Kirk, Alexander Whitefield, Paul Röttger, Andrew Bean, Katerina Margatina, Juan Ciro, Rafael Mosquera, Max Bartolo, Adina Williams, He He, Bertie Vidgen and Scott A Hale. Neural Information Processing Systems (NeurIPS), 2024. Oral [bib]

Foundational Challenges in Assuring Alignment and Safety of Large Language Models.
Usman Anwar, Abulhair Saparov, Javier Rando, Daniel Paleka, Miles Turpin, Peter Hase, Ekdeep Singh Lubana, Erik Jenner, Stephen Casper, Oliver Sourbut, Benjamin L Edelman, Zhaowei Zhang, Mario Günther, Anton Korinek, Jose Hernandez-Orallo, Lewis Hammond, Eric Bigelow, Alexander Pan, Lauro Langosco, Tomasz Korbak, Heidi Zhang, Ruiqi Zhong, Sean O hEigeartaigh, Gabriel Recchia, Giulio Corsi, Alan Chan, Markus Anderljung, Lilian Edwards, Yoshua Bengio, Danqi Chen, Samuel Albanie, Tegan Maharaj, Jakob Foerster, Florian Tramer, He He, Atoosa Kasirzadeh, Yejin Choi and David Krueger. Transactions on Machine Learning Research (TMLR), 2024. [bib]

Your Co-Workers Matter: Evaluating Collaborative Capabilities of Language Models in Blocks World.
Guande Wu, Chen Zhao, Claudio Silva and He He. Findings of the Association for Computational Linguistics (ACL Findings), 2024. [bib] [code]

Parallel Structures in Pre-training Data Yield In-Context Learning.
Yanda Chen, Chen Zhao, Zhou Yu, Kathleen McKeown and He He. Association for Computational Linguistics (ACL), 2024. [bib] [code]

Towards Consistent Natural-Language Explanations via Explanation-Consistency Finetuning.
Yanda Chen, Chandan Singh, Xiaodong Liu, Simiao Zuo, Bin Yu, He He and Jianfeng Gao. arXiv:2401.13986 preprint, 2024. [bib] [code]

Solving Olympiad Geometry without Human Demonstrations.
Trieu Trinh, Yuhuai Wu, Quoc V Le, He He and Thang Luong. Nature, 2024. [bib]

Improving Multi-Hop Reasoning in LLMs by Learning from Rich Human Feedback.
Nitish Joshi, Koushik Kalyanaraman, Zhiting Hu, Kumar Chellapilla, He He and Li Erran Li. AAAI Workshop on Neuro-Symbolic Learning and Reasoning in the era of Large Language Models, 2024. [bib]

Show Your Work with Confidence: Confidence Bands for Tuning Curves.
Nicholas Lourie, Kyunghyun Cho and He He. North American Chapter of the Association for Computational Linguistics (NAACL), 2024. [bib] [code]

Personas as a Way to Model Truthfulness in Language Models.
Nitish Joshi*, Javier Rando*, Abulhair Saparov, Najoung Kim and He He. Empirical Methods in Natural Language Processing (EMNLP), 2024. [bib]

Does Writing with Language Models Reduce Content Diversity?
Vishakh Padmakumar and He He. International Conference on Learning Representations (ICLR), 2024. [bib] [code]

Leveraging Implicit Feedback from Deployment Data in Dialogue.
Richard Yuanzhe Pang, Stephen Roller, Kyunghyun Cho, He He and Jason Weston. The European Chapter of the Association for Computational Linguistics (EACL), 2024. [bib]

Do Models Explain Themselves? Counterfactual Simulatability of Natural Language Explanations.
Yanda Chen, Ruiqi Zhong, Narutatsu Ri, Chen Zhao, He He, Jacob Steinhardt, Zhou Yu and Kathleen McKeown. International Conference on Machine Learning (ICML), 2024. Spotlight [bib] [code]

Nuisances via Negativa: Adjusting for Spurious Correlations via Data Augmentation.
Aahlad Puli, Nitish Joshi, He He and Rajesh Ranganath. Transactions on Machine Learning Research (TMLR), 2024. [bib]

2023

ARGUS: Visualization of AI-Assisted Task Guidance in AR.
Sonia Castelo, Joao Rulff, Erin McGowan, Bea Steers, Guande Wu, Shaoyu Chen, Iran Roman, Roque Lopez, Ethan Brewer, Chen Zhao, Jing Qian, Kyunghyun Cho, He He, Qi Sun, Huy Vo, Juan Bello, Michael Krone and Claudio Silva. IEEE Transactions on Visualization and Computer Graphics (IEEE Vis), 2023. [bib]

Pragmatic Radiology Report Generation.
Dang Nguyen, Chacha Chen, He He and Chenhao Tan. Machine Learning for Health (ML4H), 2023. [bib]

Testing the General Deductive Reasoning Capacity of Large Language Models Using OOD Examples.
Abulhair Saparov, Richard Yuanzhe Pang, Vishakh Padmakumar, Nitish Joshi, Seyed Mehran Kazemi, Najoung Kim* and He He*. Neural Information Processing Systems (NeurIPS), 2023. [bib] [code]

Measuring Inductive Biases of In-Context Learning with Underspecified Demonstrations.
Chenglei Si*, Dan Friedman*, Nitish Joshi, Shi Feng, Danqi Chen and He He. Association for Computational Linguistics (ACL), 2023. [bib] [code]

Efficient Shapley Values Estimation by Amortization for Text Classification.
Chenghao Yang, Fan Yin, He He, Kai-Wei Chang, Xiaofei Ma and Bing Xiang. Association for Computational Linguistics (ACL), 2023. [bib]

Reward Gaming in Conditional Text Generation.
Richard Yuanzhe Pang, Vishakh Padmakumar, Thibault Sellam, Ankur P Parikh and He He. Association for Computational Linguistics (ACL), 2023. [bib] [talk]

Extrapolative Controlled Sequence Generation via Iterative Refinement.
Vishakh Padmakumar, Richard Yuanzhe Pang, He He and Ankur P Parikh. International Conference on Machine Learning (ICML), 2023. [bib] [code]

Robustification of Multilingual Language Models to Real-world Noise in Crosslingual Zero-shot Settings with Robust Contrastive Pretraining.
Asa Cooper Stickland*, Sailik Sengupta*, Jason Krone, He He and Saab Mansour. The European Chapter of the Association for Computational Linguistics (EACL), 2023. [bib] [code]

How do decoding algorithms distribute information in dialogue responses?
Saranya Venkatraman, He He and David Reitter. Findings of the European Chapter of the Association for Computational Linguistics (EACL Findings), 2023. [bib]

Language Models are Greedy Reasoners: A Systematic Formal Analysis of Chain-of-Thought.
Abulhair Saparov and He He. International Conference on Learning Representations (ICLR), 2023. [bib] [code] [talk]

On the Relation between Sensitivity and Accuracy in In-context Learning.
Yanda Chen, Chen Zhao, Zhou Yu, Kathleen McKeown and He He. Findings of the Empirical Methods in Natural Language Processing (EMNLP Findings), 2023. [bib]

2022

Are All Spurious Features in Natural Language Alike? An Analysis through a Causal Lens.
Nitish Joshi, Xiang Pan and He He. Empirical Methods in Natural Language Processing (EMNLP), 2022. [bib] [code] [talk]

Help me write a poem: Instruction Tuning as a Vehicle for Collaborative Poetry Writing.
Tuhin Chakrabarty, Vishakh Padmakumar and He He. Empirical Methods in Natural Language Processing (EMNLP), 2022. [bib] [code] [project]

Improving Faithfulness by Augmenting Negative Summaries from Fake Documents.
Tianshu Wang, Faisal Ladhak, Esin Durmus and He He. Empirical Methods in Natural Language Processing (EMNLP), 2022. [bib] [code]

SeqPATE: Differentially Private Text Generation via Knowledge Distillation.
Zhiliang Tian, Yingxiu Zhao, Ziyue Huang, Yu-Xiang Wang, Nevin Zhang and He He. Neural Information Processing Systems (NeurIPS), 2022. [bib]

Amortized Noisy Channel Neural Machine Translation.
Richard Yuanzhe Pang, He He and Kyunghyun Cho. International Natural Language Generation Conference (INLG), 2022. [bib]

QuALITY: Question Answering with Long Input Texts, Yes!
Richard Yuanzhe Pang, Alicia Parrish, Nitish Joshi, Nikita Nangia, Jason Phang, Angelica Chen, Vishakh Padmakumar, Johnny Ma, Jana Thompson, He He and Sam Bowman. North American Chapter of the Association for Computational Linguistics (NAACL), 2022. [bib] [code]

Exploring the Role of Task Transferability in Large-Scale Multi-Task Learning.
Vishakh Padmakumar, Leonard Lausen, Miguel Ballesteros, Sheng Zha, He He and George Karypis. North American Chapter of the Association for Computational Linguistics (NAACL), 2022. [bib]

Machine-in-the-Loop Rewriting for Creative Image Captioning.
Vishakh Padmakumar and He He. North American Chapter of the Association for Computational Linguistics (NAACL), 2022. [bib] [code]

Meta-learning via Language Model In-context Tuning.
Yanda Chen, Ruiqi Zhong, Sheng Zha, George Karypis and He He. Association for Computational Linguistics (ACL), 2022. [bib] [code]

Faithful or Extractive? On Mitigating the Faithfulness-Abstractiveness Trade-off in Abstractive Summarization.
Faisal Ladhak, Esin Durmus, He He, Claire Cardie and Kathleen McKeown. Association for Computational Linguistics (ACL), 2022. [bib]

An Investigation of the (In)effectiveness of Counterfactually Augmented Data.
Nitish Joshi and He He. Association for Computational Linguistics (ACL), 2022. [bib] [code]

2021

IRM - When It Works and When It Doesn't: A Test Case of Natural Language Inference.
Yana Dranker, He He and Yonatan Belinkov. Neural Information Processing Systems (NeurIPS), 2021. [bib] [code]

Types of Out-of-Distribution Texts and How to Detect Them.
Udit Arora, William Huang and He He. Empirical Methods in Natural Language Processing (EMNLP), 2021. [bib] [code]

Unsupervised Extractive Summarization with Pointwise Mutual Information.
Vishakh Padmakumar and He He. The European Chapter of the Association for Computational Linguistics (EACL), 2021. [bib] [code]

Text Generation by Learning from Demonstrations.
Richard Yuanzhe Pang and He He. International Conference on Learning Representations (ICLR), 2021. [bib] [code] [talk]

2020

An Empirical Study on Robustness to Spurious Correlations using Pre-trained Language Models.
Lifu Tu, Garima Lalwani, Spandana Gella and He He. Transactions of the Association for Computational Linguistics (TACL), 2020. [bib] [code]

FEQA: A Question Answering Evaluation Framework for Faithfulness Assessment in Abstractive Summarization.
Esin Durmus, He He and Mona Diab. Association for Computational Linguistics (ACL), 2020. [bib] [code] [talk]

GluonCV and GluonNLP: Deep Learning in Computer Vision and Natural Language Processing.
Jian Guo, He He, Tong He, Leonard Lausen, Mu Li, Haibin Lin, Xingjian Shi, Chenguang Wang, Junyuan Xie, Sheng Zha, Aston Zhang, Hang Zhang, Zhi Zhang, Zhongyue Zhang, Shuai Zheng and Yi Zhu. Journal of Machine Learning Research (JMLR), 2020. [bib] [project]

2019

Unlearn Dataset Bias in Natural Language Inference by Fitting the Residual.
He He, Sheng Zha and Haohan Wang. EMNLP Workshop on DeepLo, 2019. [bib] [code] [poster]

Pun Generation with Surprise.
He He*, Nanyun Peng* and Percy Liang. North American Chapter of the Association for Computational Linguistics (NAACL), 2019. [bib] [code] [codalab]

Quizbowl: The Case for Incremental Question Answering.
Pedro Rodriguez, Shi Feng, Mohit Iyyer, He He and Jordan Boyd-Graber. arXiv:1904.04792 preprint, 2019. [bib]

A Dynamic Strategy Coach for Effective Negotiation.
Yiheng Zhou, He He, Alan Black and Yulia Tsvetkov. Special Interest Group on Discourse and Dialogue (SIGDIAL), 2019. [bib] [code]

2018

Decoupling Strategy and Generation in Negotiation Dialogues.
He He, Derek Chen, Anusha Balakrishnan and Percy Liang. Empirical Methods in Natural Language Processing (EMNLP), 2018. [bib] [project]

QuAC: Question Answering in Context.
Eunsol Choi*, He He*, Mohit Iyyer*, Mark Yatskar*, Wen-tau Yih, Yejin Choi, Percy Liang and Luke Zettlemoyer. Empirical Methods in Natural Language Processing (EMNLP), 2018. [bib] [project]

Sharp Nearby, Fuzzy Far Away: How Neural Language Models Use Context.
Urvashi Khandelwal, He He, Peng Qi and Dan Jurafsky. Association for Computational Linguistics (ACL), 2018. [bib] [code]

Delete, Retrieve, Generate: a Simple Approach to Sentiment and Style Transfer.
Juncen Li, Robin Jia, He He and Percy Liang. North American Chapter of the Association for Computational Linguistics (NAACL), 2018. [bib] [code]

2017

Learning Symmetric Collaborative Dialogue Agents with Dynamic Knowledge Graph Embeddings.
He He, Anusha Balakrishnan, Mihail Eric and Percy Liang. Association for Computational Linguistics (ACL), 2017. [bib] [project]

2016

Credit Assignment Compiler for Joint Prediction.
Kai-Wei Chang, He He, Hal Daume III, John Langford and Stéphane Ross. Neural Information Processing Systems (NeurIPS), 2016. [bib] [code]

Opponent Modeling in Deep Reinforcement Learning.
He He, Jordan Boyd-Graber, Kevin Kwok and Hal Daume III. International Conference on Machine Learning (ICML), 2016. [bib] [code] [data]

Interpretese vs. Translationese: The Uniqueness of Human Strategies in Simultaneous Interpretation.
He He, Jordan Boyd-Graber and Hal Daume III. North American Chapter of the Association for Computational Linguistics (NAACL), 2016. [bib] [code]

Object Detection in 20 Questions.
Xi Chen, He He and Larry Davis. Winter Conference on Applications of Computer Vision (WACV), 2016. [bib]

2015

Active Information Acquisition.
He He, Paul Mineiro and Nikos Karampatziakis. ICML Workshop on Machine Learning From and For Adaptive User Technologies: From Active Learning & Experimentation to Optimization & Personalization, 2015. [bib] [poster]

Interactive Incremental Question Answering.
Jordan Boyd-Graber, Mohit Iyyer, He He and Hal Daume III. Neural Information Processing Systems (NeurIPS) demo, 2015. Outstanding Demonstration Award

Syntax-based Rewriting for Simultaneous Machine Translation.
He He, Alvin Grissom II, John Morgan, Jordan Boyd-Graber and Hal Daume III. Empirical Methods in Natural Language Processing (EMNLP), 2015. [bib] [code] [talk]

Learning to Search for Dependencies.
Kai-Wei Chang, He He, Hal Daume III and John Langford. arXiv:1503.05615 preprint, 2015. [bib] [code]

Crowdsourcing with Multi-Dimensional Trust.
Xiangyang Liu, He He and John Baras. International Conference on Information Fusion (Fusion), 2015. [bib]

Trust-Aware Optimal Crowdsourcing With Budget Constraint.
Xiangyang Liu, He He and John Baras. International Conference on Communications (ICC), 2015. [bib]

2014

Temporal Supervised Learning for Inferring a Dialog Policy from Example Conversations.
Lihong Li, He He and Jason D. Williams. Spoken Language Technology Workshop (SLT), 2014. [bib]

Learning to Search in Branch and Bound Algorithms.
He He, Hal Daume III and Jason Eisner. Neural Information Processing Systems (NeurIPS), 2014. [bib] [code] [poster]

Don't Until the Final Verb Wait: Reinforcement Learning for Simultaneous Machine Translation.
Alvin Grissom II, He He, John Morgan, Jordan Boyd-Graber and Hal Daume III. Empirical Methods in Natural Language Processing (EMNLP), 2014. [bib] [talk]

2013

Dynamic Feature Selection for Dependency Parsing.
He He, Hal Daume III and Jason Eisner. Empirical Methods in Natural Language Processing (EMNLP), 2013. [bib] [talk]

2012

Imitation Learning by Coaching.
He He, Hal Daume III and Jason Eisner. Neural Information Processing Systems (NeurIPS), 2012. [bib] [poster]

Besting the Quiz Master: Crowdsourcing Incremental Classification Games.
Jordan Boyd-Graber, Brianna Satinoff, He He and Hal Daume III. Empirical Methods in Natural Language Processing (EMNLP), 2012. [bib]

Cost-sensitive dynamic feature selection.
He He, Hal Daume III and Jason Eisner. ICML Workshop on Inferning, 2012. [bib] [poster] [talk]

2011

Single Image Super-resolution using Gaussian Process Regression.
He He and Wan-Chi Siu. Computer Vision and Pattern Recognition (CVPR), 2011. [bib] [code] [talk]

2010

Rare Class classification with SVM.
He He and Ali Ghodsi. International Conference on Pattern Recognition (ICPR), 2010. [bib] [code] [poster]
