I am a Ph.D. student at the University of Texas at Austin (UT Austin). I am honored to be advised by Prof. Qiang Liu.
I will be graduating in 2026 and am actively seeking full-time positions in industry.
I am interested in developing fundamental yet computationally feasible algorithms for the basic learning, inference,
and optimization problems that underpin cutting-edge AI/ML/statistical technologies.
These days, I am mostly drawn to the training efficiency of large-scale models.
If you share common interests, are interested in potential collaboration, or simply want to connect for a chat, feel free to contact me. I'm always open to conversation :)
Core contributor to DeMo, the optimizer foundation behind Nous Research's distributed-training stack, used to train the Hermes model series.
Set a world record on Slowrun, the leaderboard for language modeling in the fixed-data, unlimited-compute regime; my submission has held the best validation loss on the benchmark for almost a month.
Andrej Karpathy on using Cautious Weight Decay in nanochat:
"Worked great out of the box on nanochat too, beat standard weight decay in a solid sweep."
Hugging Face timm adopted Cautious Optimizers. Ross Wightman:
"One of the last minute papers I added support for that delayed this release was 'Cautious Optimizers'... Consider me impressed, this boost appears more consistent than some of the new optimizers."
Contributing to google-deepmind/simply, DeepMind's minimal and scalable JAX research codebase for frontier LLM research.
Invited Talks & Lectures
Meta AI: "Communication Efficient Distributed Training with Distributed Lion" (August 2024). Host: Raghu Krishnamoorthi.