Tyler Zhu
I am a second-year PhD student in Computer Science at Princeton, advised by Olga Russakovsky.
I received my B.S. in EECS from Berkeley in 2022 and my M.S. in 2023, advised by Jitendra Malik.
I am a recipient of the Princeton President's Fellowship.
I am broadly interested in creating computer vision systems which can learn from and interpret visual data as humans do.
While at Berkeley, I had the great fortune to collaborate with and be mentored by a number of wonderful people, including Karttikeya Mangalam, Alvin Wan, and Dan Hendrycks.
I was also heavily involved in teaching and outreach, serving on CS 70 course staff multiple times and previously leading Machine Learning @ Berkeley. You can find out more from my main website here.
If you are interested in collaborating, or just want to chat about research or advice, feel free to reach out to me at [first][last][at]cs[dot]princeton[dot]edu.
CV / Google Scholar / Twitter / Github
News
- [Mar 2024] Our preprint on large image modeling, xT, is now available on arXiv (update, May 2024: accepted to ICML)! I will also be co-organizing the Transformers for Vision Workshop @ CVPR 2024 after the great experience I had attending last year.
- [May 2023] Our paper on fast reversible transformers, PaReprop, was accepted as a spotlight at the Transformers for Vision Workshop @ CVPR 2023!
- [Apr 2023] I am starting my PhD at Princeton in Fall 2023, advised by Professor Olga Russakovsky!
Research
I am broadly interested in using computer vision to create visual systems that can effectively reason about and interact with the real world.
Currently, I am most interested in the intersection of video and language.
I believe that promoting video as a fundamental unit of vision can be key to unlocking the next generation of visual systems.
Real-world interaction is also governed by language; thus, I am interested in bridging the gap between the two modalities.
This includes understanding how to learn from videos efficiently, how to reason about them, and how to represent them.
I am also interested in large models, particularly vision models, and in making them more efficient and broadly useful.
Much of my previous work has been on general-purpose, memory-efficient techniques toward this goal.
Unifying Specialized Visual Encoders for Video Language Models
Jihoon Chung*, Tyler Zhu*, Max Gonzalez Saez-Diez, Juan Carlos Niebles, Honglu Zhou, Olga Russakovsky
arXiv 2024
code / arXiv
We propose a framework that unifies multiple specialized visual encoders, spanning broad visual capabilities such as action recognition and spatial understanding, into a single visual encoder for video LLMs.
This direction is exciting because visual processing could scale with the number of GPUs: with one expert sharded per device, all encoders run in parallel while retaining a runtime similar to a single expert's.
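As a rough illustration of the one-expert-per-device idea, here is a minimal PyTorch sketch; the encoder modules, feature dimension, and fusion-by-concatenation are placeholder assumptions, not the setup used in the paper.

```python
import torch
import torch.nn as nn

# Placeholder experts standing in for specialized encoders
# (e.g., an action-recognition model and a spatial one).
# Assumes at least two GPUs are available.
experts = [nn.Linear(768, 768).to(f"cuda:{i}") for i in range(2)]

def encode(frames: torch.Tensor) -> torch.Tensor:
    """Run every expert on the same clip, one expert per GPU."""
    # CUDA kernel launches are asynchronous, so the per-device
    # calls overlap instead of running one after another.
    outs = [exp(frames.to(exp.weight.device, non_blocking=True))
            for exp in experts]
    # Gather the features onto one device and fuse them (here by
    # simple concatenation) before handing them to the video LLM.
    return torch.cat([o.to("cuda:0") for o in outs], dim=-1)
```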
xT: Nested Tokenization for Larger Context in Large Images
Ritwik Gupta*, Shufan Li*, Tyler Zhu*, Jitendra Malik, Trevor Darrell, Karttikeya Mangalam
ICML 2024
code / arXiv / tweet
A simple yet effective framework for adapting vision models trained on small, 224x224 images to much larger images, using an LLM-style sequence encoder to integrate context over far larger regions than the backbone could otherwise see.
We also propose a set of benchmarks that reflect these improvements on large images.
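A minimal sketch of the nested-tokenization idea follows; the modules below are stand-ins, not the released xT code. Tile the large image into backbone-sized crops, encode each crop independently, then let a sequence encoder attend across crop tokens.

```python
import torch
import torch.nn as nn

def tile(image: torch.Tensor, crop: int = 224) -> torch.Tensor:
    """Split a (B, C, H, W) image into crop x crop tiles (H, W divisible by crop)."""
    B, C, H, W = image.shape
    patches = image.unfold(2, crop, crop).unfold(3, crop, crop)
    return patches.permute(0, 2, 3, 1, 4, 5).reshape(-1, C, crop, crop)

# Stand-ins: a real setup would use a pretrained ViT/Swin as the
# region encoder and a long-context model as the context encoder.
region_encoder = nn.Sequential(nn.Flatten(), nn.LazyLinear(256))
context_encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=256, nhead=8, batch_first=True),
    num_layers=2,
)

image = torch.randn(1, 3, 896, 896)           # a "large" image: 4x4 grid of 224px crops
tokens = region_encoder(tile(image))          # (16, 256): one token per crop
feats = context_encoder(tokens.unsqueeze(0))  # attention integrates cross-crop context
```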
PaReprop: Fast Parallelized Reversible Backpropagation
Tyler Zhu*, Karttikeya Mangalam*
Transformers for Vision Workshop @ CVPR 2023 (Spotlight Paper)
code / arXiv / tweet
We overcome the recomputation overhead of reversible transformers by parallelizing the backward pass using CUDA streams.
This speeds up training for models in both vision and language, making them nearly as fast as their base models while keeping the substantial memory savings.
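The scheduling idea, roughly: while one stream computes gradients for block i, a second stream can already reconstruct the inputs of block i-1, hiding recomputation behind backprop. Below is a simplified sketch under assumptions, with reversible blocks exposing a hypothetical inverse() method; a full implementation would also accumulate parameter gradients and handle cross-stream memory safety.

```python
import torch

def parallel_reversible_backward(blocks, y, grad_y):
    """Sketch of stream-parallel reversible backprop (requires a CUDA device)."""
    recompute = torch.cuda.Stream()  # reconstructs activations
    backprop = torch.cuda.Stream()   # computes gradients
    for block in reversed(blocks):
        with torch.cuda.stream(recompute):
            # Reversibility: recover this block's input from its output
            # instead of caching activations during the forward pass.
            x = block.inverse(y).detach().requires_grad_(True)
        backprop.wait_stream(recompute)  # gradients need the recovered input
        with torch.cuda.stream(backprop):
            out = block(x)  # local forward to rebuild the autograd graph
            (grad_y,) = torch.autograd.grad(out, x, grad_y)
        # The next inverse depends only on the recompute stream, so it is
        # launched while this block's gradient kernels are still running.
        y = x
    return grad_y
```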
The Many Faces of Robustness: A Critical Analysis of Out-of-Distribution Generalization
Dan Hendrycks, Steven Basart, Norman Mu, Saurav Kadavath, Frank Wang, Evan Dorundo, Rahul Desai, Tyler Zhu, Samyak Parajuli, Mike Guo, Dawn Song, Jacob Steinhardt, Justin Gilmer
ICCV 2021
code / arXiv
Four new datasets measuring real-world distribution shifts, the best known of which is ImageNet-R(enditions), as well as a new state-of-the-art data augmentation method that outperforms models pretrained with 1000x more labeled data.
Making Reversible Transformers Accurate, Efficient, and Fast
Tyler Zhu
Master's Thesis
In this work, we present an in-depth analysis of reversible transformers and demonstrate that they can be more accurate, more efficient, and faster than their vanilla counterparts. We introduce a new method of reversible backpropagation that is faster and scales better with memory than previous techniques, and we show that reversible transformers transfer better to downstream visual tasks.
Guided Resource and Education Program: High School Workshop Initiative
Advisor (behind the scenes)
Machine Learning at Berkeley, Fall 2023
We piloted a free two-day workshop teaching the basics of machine learning to local Bay Area high school students with little access to coding resources.
Our goal was to be inclusive and representative of all backgrounds and experiences, and we reached over 40 students, split evenly between male and female participants.
Broadening Research Collaborations Workshop
Co-organizer
NeurIPS 2022
We organized a workshop at NeurIPS 2022 to bring together researchers from different backgrounds and experiences to discuss the challenges and opportunities in non-traditional collaborations beyond the standard academic and industry models.