Selected Publications

SAE-V: Interpreting Multimodal Models for Enhanced Alignment
arXiv, 2025
TL;DR:
We introduce SAE-V, a mechanistic interpretability framework that extends the sparse autoencoder (SAE) paradigm to multimodal large language models (MLLMs), enabling interpretation of both the models themselves and the multimodal alignment process. Building on SAE-V, we develop data filtering methods that enhance multimodal alignment.

Align Anything: Training All-Modality Models to Follow Instructions with Language Feedback
arXiv, 2024
TL;DR:
We introduce Align Anything, an end-to-end framework that uses language feedback to enhance the data, training, and evaluation of all-modality models. Its GitHub repository has gained 2.6K+ stars and offers comprehensive support for multimodal alignment, including the first full-parameter fine-tuning of DeepSeek-R1 671B, an extensible architecture that requires minimal changes to support new models, and benchmarking across 30+ evaluation standards.

Stream Aligner: Efficient Sentence-Level Alignment via Distribution Induction
Poster, AI Alignment Track, The 39th Annual AAAI Conference on Artificial Intelligence, 2025
TL;DR:
We introduce the Streaming Distribution Induce Aligner (Stream Aligner), a novel alignment paradigm that performs sentence-level alignment iteratively throughout the generation process, combining efficiency with improved performance across a variety of tasks.

Aligner: Efficient Alignment by Learning to Correct
Oral Presentation, The 38th Annual Conference on Neural Information Processing Systems, 2024
TL;DR:
We introduce Aligner, a simple, novel alignment paradigm that learns the correctional residuals between preferred and dispreferred answers using a small model.