Hi! I’m Yifei Huang (黄逸飞). I am currently a Project Researcher (特任研究員) in the Sato Laboratory at the University of Tokyo, and I collaborate actively with Shanghai AI Lab, working with Dr. Jiangmiao Pang. I received my PhD and M.S. from the Graduate School of Information Science and Technology at the University of Tokyo, supervised by Prof. Yoichi Sato and supported by the university's Global Creative Leader program. I received my B.S. in Automation from the IEEE Honor Class at Shanghai Jiao Tong University. I have been fortunate to work with researchers including Prof. Yoichi Sato, Prof. Yusuke Sugano, Prof. Yu Qiao, Prof. Limin Wang, Prof. Kris Kitani, Prof. Kai Kunze, and Prof. Weidi Xie. My research focuses on video understanding, egocentric vision, and their applications, especially in embodied AI and VR/AR.
We have on-site intern positions in Shanghai. If you are interested in working on LVLMs for embodied AI, feel free to contact me at hyf015 at gmail dot com.
💻 Research
I have published 20+ papers at top international AI conferences, with 3000+ Google Scholar citations. My primary research interests are:
- First-person (egocentric) videos, egocentric gaze, and gaze-guided interaction systems.
- Large vision-language models for embodied AI.
- Video understanding from limited labels, few-shot learning, and domain adaptation.
Please feel free to contact me by email for any suggestions, questions, or potential collaborations.
🗞️ Academic Services
- Area Chair: ICCV, CVPR.
- Reviewer: T-PAMI, IJCV, CVPR, ICCV, ECCV, ACCV, ICML, NeurIPS, ICLR, AAAI, TCSVT, ICRA, IMWUT, etc.
📝 Publications
(* denotes corresponding author)
📒 Topic: First-person (egocentric) Videos, Egocentric Gaze, and Gaze-guided Interaction Systems
- Modeling Fine-Grained Hand-Object Dynamics for Egocentric Video Representation Learning | [Code] | [Data]
  B. Pei, Y. Huang*, J. Xu, G. Chen, Y. He, Y. Yang, Y. Wang, W. Xie, Y. Qiao, F. Wu, L. Wang
  ICLR 2025
- Masked Video and Body-worn IMU Autoencoder for Egocentric Action Recognition
  M. Zhang, Y. Huang*, R. Liu, Y. Sato
  ECCV 2024
- Mutual Context Network for Jointly Estimating Egocentric Gaze and Actions | [Project & Code] | [BibTex]
  Y. Huang, M. Cai, Z. Li, F. Lu, and Y. Sato.
  IEEE TIP 2020
- An Ego-Vision System for Discovering Human Joint Attention | [Project & Code] | [BibTex]
  Y. Huang, M. Cai, and Y. Sato.
  IEEE THMS 2020
- Leveraging Human Selective Attention for Medical Image Analysis with Limited Training Data | [BibTex]
  Y. Huang, X. Li, L. Yang, L. Gu, Y. Zhu, H. Seo, Q. Meng, T. Harada, and Y. Sato.
  BMVC 2021
- Predicting Gaze in Egocentric Videos by Learning Task-Dependent Attention Transition | [Project] | [Code & Data] | [BibTex]
  Y. Huang, M. Cai, Z. Li, and Y. Sato.
  ECCV 2018 (oral presentation, acceptance rate: 2%)
- Goal-Oriented Gaze Estimation for Zero-Shot Learning | [BibTex]
  Y. Liu, L. Zhou, X. Bai, Y. Huang, L. Gu, J. Zhou, and T. Harada.
  CVPR 2021
- GazeSync: Eye Movement Transfer Using an Optical Eye Tracker and Monochrome Liquid Crystal Displays | [BibTex]
  Q. Zhang, Y. Huang, G. Chernyshov, J. Li, Y. S. Pai, and K. Kunze.
  IUI 2022
- Seeing our Blind Spots: Smart Glasses-based Simulation to Increase Design Students’ Awareness of Visual Impairment | [BibTex]
  Q. Zhang, G. Barbareschi, Y. Huang, J. Li, Y. S. Pai, J. Ward, and K. Kunze.
  UIST 2022
📒 Topic: General Video Understanding, Video Understanding with Limited Labels.
- Matching Compound Prototypes for Few-Shot Action Recognition | [Code] | [BibTex]
  Y. Huang, L. Yang, G. Chen, H. Zhang, F. Lu, and Y. Sato.
  IJCV 2024
- Weakly Supervised Temporal Sentence Grounding With Uncertainty-Guided Self-Training | [Code] | [BibTex]
  Y. Huang, L. Yang, and Y. Sato.
  CVPR 2023
- Compound Prototype Matching for Few-Shot Action Recognition | [Code] | [BibTex]
  Y. Huang, L. Yang, and Y. Sato.
  ECCV 2022
- Improving Action Segmentation via Graph-based Temporal Reasoning | [Code] | [BibTex]
  Y. Huang, Y. Sugano, and Y. Sato.
  CVPR 2020
- Interact before Align: Leveraging Cross-Modal Knowledge for Domain Adaptive Action Recognition | [BibTex]
  L. Yang, Y. Huang*, Y. Sugano, and Y. Sato.
  CVPR 2022
- Retrieval-augmented Egocentric Video Captioning | [BibTex] | [Project&Code]
  J. Xu, Y. Huang, J. Hou, G. Chen, Y. Zhang, R. Feng, and W. Xie.
  CVPR 2024
- EgoExo-Gen: Ego-centric Video Prediction by Watching Exo-centric Videos
  J. Xu, Y. Huang, B. Pei, J. Hou, Q. Li, G. Chen, Y. Zhang, R. Feng, and W. Xie.
  ICLR 2025
- Prompt-augmented Boundary Attentive Learning for Weakly Supervised Temporal Sentence Grounding
  Z. Zhu, Y. Huang*, M. Zhang, L. Ouyang, and Y. Sato
📒 Topic: Egocentric & Video Benchmarks.
- EgoExoLearn: A Dataset for Bridging Asynchronous Ego- and Exo-centric View of Procedural Activities in Real World | [Project&Code] | [BibTex]
  Y. Huang, G. Chen, J. Xu, …, Y. Qiao
  CVPR 2024
- Ego4D: Around the World in 3,000 Hours of Egocentric Video | [Project] | [BibTex]
  K. Grauman, A. Westbury, …, Y. Huang, …, J. Malik.
  CVPR 2022 (best paper finalist)
- Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives | [Project] | [BibTex]
  K. Grauman, A. Westbury, …, Y. Huang, …, J. Malik.
  CVPR 2024 (oral presentation)
- ActionVOS: Actions as Prompts for Video Object Segmentation | [Project&Code]
  L. Ouyang, R. Liu, Y. Huang*, R. Furuta, and Y. Sato.
  ECCV 2024 (oral presentation)
- CG-Bench: Clue-grounded Question Answering Benchmark for Long Video Understanding | [Leaderboard]
  G. Chen*, Y. Liu*, Y. Huang*, Y. He, B. Pei, J. Xu, Y. Wang, T. Lu, L. Wang
  ICLR 2025
🔥 News
- 4 papers accepted by ICLR 2025.
- 3 papers accepted by ECCV 2024, among which ActionVOS was accepted as an oral presentation!
- 3 papers accepted by CVPR 2024.
- Served as an Area Chair for ICCV 2023 and CVPR 2024.
- Received Special Grant for Foreign Researchers (¥11,000,000) from JSPS.
- Received Grant-in-Aid for Early-Career Scientists (¥4,550,000) from JSPS.