Information Sciences

Jun Zhu

Dr. Jun Zhu is a Bosch AI Professor in the Department of Computer Science and Technology at Tsinghua University and the Deputy Director of the Institute for AI, Tsinghua University. He is an IEEE Fellow and an AAAI Fellow. He was an Adjunct Faculty member in the Machine Learning Department at Carnegie Mellon University (CMU) from 2015 to 2018. Dr. Zhu received his B.E. and Ph.D. in Computer Science from Tsinghua in 2005 and 2009, respectively. Before joining Tsinghua in 2011, he did post-doctoral research at CMU. His research interests lie in machine learning theory, algorithms, and applications. Dr. Zhu has published over 100 papers in prestigious conferences and journals, with over 29k citations. He is an Associate Editor-in-Chief of IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) and an editorial board member of Artificial Intelligence. He has served as a senior area chair or paper award committee member at ICML, NeurIPS, ICLR, IJCAI, and AAAI more than 20 times, and was a local co-chair of ICML 2014. He is a recipient of several awards, including the ICLR Outstanding Paper Award, the IEEE CoG Best Paper Award, the XPlorer Prize, the IEEE Intelligent Systems "AI's 10 to Watch" Award, MIT TR35 China, the CCF Young Scientist Award, and the CCF first-class Natural Science Award. His team has won several first-place awards in international competitions, including all three tasks of the NeurIPS 2017 adversarial attack and defense competition and the intelligent decision task of the ViZDoom 2018 competition.

Efficient Inference and Large-scale Training of Multimodal Diffusion Models

Diffusion models provide the theoretical foundation for many generative AI systems, including Sora and Stable Diffusion, and they outperform other deep generative models at generating multimodal (visual, auditory, and other) data. However, diffusion models are extremely slow at generating samples. Taking image generation as an example, a diffusion model typically needs 50-100 denoising steps to turn an initial sample drawn from a standard Gaussian into a clean image, and each step requires a full forward pass of the denoising network; correspondingly, the overall generation time is 50-100 times that of other deep generative models. Such inefficiency seriously hinders the wide deployment of diffusion models.
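
For intuition, the following is a minimal sketch of this step-by-step (ancestral) sampling loop; the model interface, schedule variables, and shapes are illustrative assumptions, not the project's code. Because each iteration makes one network call, 50-100 steps translate directly into a 50-100x slowdown.

    import torch

    @torch.no_grad()
    def ddpm_sample(model, betas, shape):
        # Ancestral sampling for a DDPM-style model (illustrative sketch).
        # model(x, t) is assumed to predict the noise added at step t;
        # betas is the forward-process variance schedule (a 1-D tensor).
        alphas = 1.0 - betas
        alpha_bars = torch.cumprod(alphas, dim=0)
        x = torch.randn(shape)                 # start from a standard Gaussian
        for t in reversed(range(len(betas))):  # one network call per step
            eps = model(x, t)
            # posterior mean of x_{t-1} given x_t (DDPM parameterization)
            mean = (x - betas[t] / torch.sqrt(1.0 - alpha_bars[t]) * eps) \
                / torch.sqrt(alphas[t])
            noise = torch.randn_like(x) if t > 0 else 0.0  # no noise at t = 0
            x = mean + torch.sqrt(betas[t]) * noise
        return x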

This project has produced a series of original results on the basic theory, efficient algorithms, backbone networks, and distributed training of diffusion models. The main results are as follows.

(1) We developed the Analytic-DPM algorithm, which addresses a critical limitation of diffusion models: the inaccurate estimation of the noise variance in the reverse diffusion process. We first gave a thorough theoretical analysis showing that both the optimal reverse variance and the corresponding optimal KL divergence have analytic forms, and then used this result to build the training-free Analytic-DPM framework, which improves sampling efficiency by 20-80 times.

(2) We developed DPM-Solver, an efficient solver for diffusion ordinary differential equations (ODEs) that exploits their semi-linear structure. DPM-Solver is training-free, further improves generation efficiency by more than 2 times, and is the first algorithm that can generate high-quality samples in only 10-15 steps (a sketch of its first-order update is given below).

(3) We developed U-ViT, the first architecture to combine diffusion models with Transformer networks, three months before the similarly principled DiT architecture later used by the Sora team (a structural sketch is given below). Moreover, we open-sourced UniDiffuser, the first large-scale pretrained diffusion model based on the diffusion-Transformer architecture, one year before Stable Diffusion 3.

These results received an Outstanding Paper Award at ICLR 2022, and the papers have accumulated over 2,000 citations within two years of publication. The algorithms have been adopted by leading companies, including Huawei, OpenAI, Apple, and Stability AI (the developer of Stable Diffusion), in their text-to-image generation systems.
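
As referenced in result (2), here is a hedged sketch of the first-order DPM-Solver update. The semi-linear structure it exploits is that the diffusion ODE splits into a linear drift term plus a learned noise-prediction term; the linear part can be integrated exactly in closed form, so only the network term needs numerical approximation. The names model, alpha, and sigma are assumptions for illustration.

    import torch

    @torch.no_grad()
    def dpm_solver_1_step(model, x_s, s, t, alpha, sigma):
        # One first-order DPM-Solver step from time s to an earlier time t.
        # alpha(t) and sigma(t) are the noise-schedule coefficients in
        # x_t = alpha_t * x_0 + sigma_t * eps (names are illustrative).
        lam_s = torch.log(alpha(s) / sigma(s))  # log signal-to-noise ratio
        lam_t = torch.log(alpha(t) / sigma(t))
        h = lam_t - lam_s                       # step size in log-SNR time
        eps = model(x_s, s)                     # a single network evaluation
        # The linear part of the ODE is solved exactly (the alpha ratio);
        # only the noise-prediction integral is approximated, which is why
        # a handful of such steps suffice where ancestral sampling needs 50-100.
        return (alpha(t) / alpha(s)) * x_s - sigma(t) * torch.expm1(h) * eps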
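
As referenced in result (3), the skeleton below illustrates the two ideas behind a U-ViT-style backbone: every input (image patches, the diffusion timestep, any condition) is embedded as a token of one plain Transformer, and U-Net-style long skip connections link shallow and deep blocks. All layer types and sizes here are simplifying assumptions, not the released U-ViT code.

    import torch
    import torch.nn as nn

    class UViTSketch(nn.Module):
        # Simplified U-ViT-style backbone: image patches, the diffusion
        # timestep, and any condition are all treated as tokens, and long
        # skip connections link shallow and deep Transformer blocks.
        def __init__(self, dim=512, depth=12, heads=8):
            super().__init__()
            block = lambda: nn.TransformerEncoderLayer(
                dim, heads, 4 * dim, batch_first=True, norm_first=True)
            self.in_blocks = nn.ModuleList(block() for _ in range(depth // 2))
            self.mid_block = block()
            self.out_blocks = nn.ModuleList(block() for _ in range(depth // 2))
            # each long skip is fused by concatenation + linear projection
            self.skip_proj = nn.ModuleList(
                nn.Linear(2 * dim, dim) for _ in range(depth // 2))

        def forward(self, patch_tokens, time_token, cond_tokens):
            # everything is a token: [time | condition | image patches]
            x = torch.cat([time_token, cond_tokens, patch_tokens], dim=1)
            skips = []
            for blk in self.in_blocks:
                x = blk(x)
                skips.append(x)
            x = self.mid_block(x)
            for blk, proj in zip(self.out_blocks, self.skip_proj):
                x = proj(torch.cat([x, skips.pop()], dim=-1))  # long skip
                x = blk(x)
            return x

Fusing each skip by concatenation followed by a linear projection, rather than by addition, follows the design choice reported for ViT-based diffusion backbones of this kind.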