Steer2Edit: From Activation Steering to Component-Level Editing [arXiv and code coming soon]
Chung-En Sun, Ge Yan, Zimo Wang, Tsui-Wei Weng.
Preprint 2026.
How to Make LLMs Safer? Detecting and Editing Key Heads in LLMs
Kuan-Lin Chu, Chung-En Sun, Tsui-Wei Weng.
NeurIPS Lock-LLM Workshop 2025.
Concept Bottleneck Large Language Models [code]
Chung-En Sun, Tuomas Oikarinen, Berk Ustun, Tsui-Wei Weng.
ICLR 2025.
Iterative Self-Tuning LLMs for Enhanced Jailbreaking Capabilities [code] [featured @ Microsoft Research Blog]
Chung-En Sun, Xiaodong Liu, Weiwei Yang, Tsui-Wei Weng, Hao Cheng, Aidan San, Michel Galley, Jianfeng Gao.
NAACL 2025 Main Oral.
Interpretable Generative Models through Post-hoc Concept Bottlenecks
Akshay Kulkarni, Ge Yan, Chung-En Sun, Tuomas Oikarinen, Tsui-Wei Weng.
CVPR 2025.
Crafting Large Language Models for Enhanced Interpretability
Chung-En Sun, Tuomas Oikarinen, Tsui-Wei Weng.
ICML MI Workshop 2024.
Fooling GPT with Adversarial In-Context Examples for Text Classification
Sudhanshu Ranjan, Chung-En Sun, Linbo Liu, Tsui-Wei Weng.
NeurIPS R0-FoMo Workshop 2023.
Melody harmonization using orderless NADE, chord balancing, and blocked Gibbs sampling
Chung-En Sun, Yi-Wei Chen, Hung-Shin Lee, Yen-Hsing Chen, Hsin-Min Wang.
ICASSP 2021.
NTIRE 2020 Challenge on NonHomogeneous Dehazing
Codruta O. Ancuti, Cosmin Ancuti, ..., Chung-En Sun, ..., Murari Mandal.
CVPR Workshop 2020.