About Me

I am a Ph.D. student at the University of Texas at Austin (UT Austin). I am honored to be advised by Prof. Qiang Liu.
I will be graduating in 2026 and am actively seeking full-time positions in industry.
I am interested in developing fundamental yet computationally feasible algorithms for the basic learning, inference, and optimization problems that underpin cutting-edge AI/ML/statistical technologies. These days, I am mostly drawn to the training efficiency of large-scale models.

If you share similar interests, are interested in a potential collaboration, or simply want to connect for a chat, feel free to contact me. I'm always open to conversation :)

Education

Ph.D. student in Computer Science, The University of Texas at Austin (Aug 2022 - present)
Advisor: Prof. Qiang Liu.

Highlights

  • Core contributor to DeMo, the optimizer foundation behind Nous Research's distributed-training stack, used to train the Hermes model series.
  • Set a world record on Slowrun, the leaderboard for language modeling in the fixed-data, unlimited-compute regime; my submission has held the best validation loss on the benchmark for almost a month.
  • Andrej Karpathy on using Cautious Weight Decay in nanochat: "Worked great out of the box on nanochat too, beat standard weight decay in a solid sweep."
  • Hugging Face timm adopted Cautious Optimizers. Ross Wightman: "One of the last minute papers I added support for that delayed this release was 'Cautious Optimizers'... Consider me impressed, this boost appears more consistent than some of the new optimizers." (See the sketch after this list for the core masking idea.)
  • Contributing to google-deepmind/simply, DeepMind's minimal and scalable JAX codebase for frontier LLM research.

Invited Talks & Lectures

  • Meta AI: "Communication Efficient Distributed Training with Distributed Lion" (August 2024). Host: Raghu Krishnamoorthi.

Selected Publications & Preprints

φ-Balancing for Mixture-of-Experts Training
Lizhang Chen, Jonathan Li, Qi Wang, Runlong Liao, Shuozhe Li, Chen Liang, Ni Lao, Qiang Liu
ICML 2026

Cautious Weight Decay
Lizhang Chen, Jonathan Li, Kaizhao Liang, Baiyu Su, Cong Xie, Nuo Wang Pierse, Chen Liang, Ni Lao, Qiang Liu
ICLR 2026

Cautious Optimizers: Improving Training with One Line of Code
Kaizhao Liang*, Lizhang Chen*, Bo Liu, Qiang Liu
ICLR 2026

Bowen Peng, Lizhang Chen, Baiyu Su, Jeffrey Quesnelle, Diederik P. Kingma, Qiang Liu
ICLR 2026

Lizhang Chen, Jonathan Li, Qiang Liu
TMLR; Oral, OPT@NeurIPS 2025

Lion Secretly Solves Constrained Optimization: As Lyapunov Predicts
Lizhang Chen*, Bo Liu*, Kaizhao Liang*, Qiang Liu
Spotlight, ICLR 2024

Communication Efficient Distributed Training with Distributed Lion
Lizhang Chen*, Bo Liu*, Lemeng Wu*, Kaizhao Liang, Jiaxu Zhu, Chen Liang, Raghuraman Krishnamoorthi, Qiang Liu
NeurIPS 2024

Memory-Efficient LLM Training with Online Subspace Descent
Kaizhao Liang, Bo Liu, Lizhang Chen, Qiang Liu
NeurIPS 2024




All rights reserved. Last updated April 2024.
