Selected Publications

SAE-V

SAE-V: Interpreting Multimodal Models for Enhanced Alignment

Hantao Lou*, Changye Li*, Jiaming Ji, Yaodong Yang
arXiv, 2025
TL;DR: We introduce SAE-V, a mechanistic interpretability framework that extends the sparse autoencoder (SAE) paradigm to multimodal large language models (MLLMs) in order to interpret both the models and the multimodal alignment process. Building on SAE-V, we develop data filtering methods that enhance multimodal alignment.
Align Anything

Align Anything: Training All-Modality Models to Follow Instructions with Language Feedback

Jiaming Ji*, Jiayi Zhou*, Hantao Lou*, Boyuan Chen*, Donghai Hong*, Xuyao Wang, Wenqi Chen, Kaile Wang, Rui Pan, Jiahao Li, Mohan Wang, Josef Dai, Tianyi Qiu, Hua Xu, Dong Li, Weipeng Chen, Jun Song, Bo Zheng, Yaodong Yang
arXiv, 2024
TL;DR: We introduce Align Anything, an end-to-end framework that uses language feedback to improve the data, training, and evaluation of all-modality models. Our GitHub repo has gained 2.6K+ stars and offers comprehensive support for multimodal alignment, including the first full-parameter fine-tuning of DeepSeek-R1 671B, an extensible architecture that requires minimal changes to support new models, and extensive benchmarking across 30+ evaluation standards.
Stream Aligner

Stream Aligner: Efficient Sentence-Level Alignment via Distribution Induction

Hantao Lou, Jiaming Ji, Kaile Wang, Yaodong Yang
Poster, AI Alignment Track, The 39th Annual AAAI Conference on Artificial Intelligence, 2025
TL;DR: We introduce the Streaming Distribution Induce Aligner (Stream Aligner), a novel alignment paradigm that combines efficiency with enhanced performance across various tasks throughout the generation process.
Aligner

Aligner: Efficient Alignment by Learning to Correct

Jiaming Ji*, Boyuan Chen*, Hantao Lou, Donghai Hong, Borong Zhang, Xuehai Pan, Juntao Dai, Tianyi Qiu, Yaodong Yang
Oral Presentation, The 38th Annual Conference on Neural Information Processing Systems, 2024
TL;DR: We introduce Aligner, a novel and simple alignment paradigm that learns the correctional residuals between preferred and dispreferred answers using a small model.
AI Alignment Survey

AI Alignment: A Comprehensive Survey

Jiaming Ji*, Tianyi Qiu*, Boyuan Chen*, Borong Zhang*, Hantao Lou, Kaile Wang, Yawen Duan, Zhonghao He, Jiayi Zhou, Zhaowei Zhang, Fanzhi Zeng, Kwan Yee Ng, Juntao Dai, Xuehai Pan, Aidan O'Gara, Yingshan Lei, Hua Xu, Brian Tse, Jie Fu, Stephen McAleer, Yaodong Yang, Yizhou Wang, Song-Chun Zhu, Yike Guo, Wen Gao
arXiv, 2023
TL;DR: We provide a comprehensive survey of the AI alignment field, covering the latest research, techniques, and applications.