Hi! I’m Yifei Huang (黄逸飞). I am currently a Project Researcher (特任研究員) in the Sato Laboratory at the University of Tokyo, and I collaborate actively with Shanghai AI Lab, working with Dr. Jiangmiao Pang. I received my PhD and M.S. from the Graduate School of Information Science and Technology at the University of Tokyo, supervised by Prof. Yoichi Sato and supported by the university's Global Creative Leader program. I received my B.S. in Automation from the IEEE Honor Class at Shanghai Jiao Tong University. I have been fortunate to work with researchers including Prof. Yoichi Sato, Prof. Yusuke Sugano, Prof. Yu Qiao, Prof. Limin Wang, Prof. Kris Kitani, Prof. Kai Kunze, and Prof. Weidi Xie. My research focuses on video understanding, egocentric vision, and their applications, especially in embodied AI and VR/AR.
We have on-site intern positions in Shanghai. If you are interested in working on LVLMs for embodied AI, feel free to contact me at hyf015 at gmail dot com.
💻 Research
I have published 20+ papers at top international AI conferences, with 3000+ Google Scholar citations. My primary research interests are:
- First-person (egocentric) videos, egocentric gaze, and gaze-guided interaction systems.
- Large vision-language models for embodied AI.
- Video understanding from limited labels, few-shot learning, and domain adaptation.
Please feel free to contact me by email for any suggestions, questions, or potential collaborations.
🗞️ Academic Services
- Area Chair: ICCV, CVPR.
- Reviewer: T-PAMI, IJCV, CVPR, ICCV, ECCV, ACCV, ICML, NeurIPS, ICLR, AAAI, TCSVT, ICRA, IMWUT, etc.
📝 Publications
(* denotes corresponding author)
📒 Topic: First-person (egocentric) Videos, Egocentric Gaze, and Gaze-guided Interaction Systems
- Modeling Fine-Grained Hand-Object Dynamics for Egocentric Video Representation Learning | [Code] | [Data]
  B. Pei, Y. Huang*, J. Xu, G. Chen, Y. He, Y. Yang, Y. Wang, W. Xie, Y. Qiao, F. Wu, L. Wang
  ICLR 2025
- Masked Video and Body-worn IMU Autoencoder for Egocentric Action Recognition
  M. Zhang, Y. Huang*, R. Liu, Y. Sato
  ECCV 2024
- Mutual Context Network for Jointly Estimating Egocentric Gaze and Actions | [Project & Code] | [BibTex]
  Y. Huang, M. Cai, Z. Li, F. Lu, and Y. Sato.
  IEEE TIP 2020
- An Ego-Vision System for Discovering Human Joint Attention | [Project & Code] | [BibTex]
  Y. Huang, M. Cai, and Y. Sato.
  IEEE THMS 2020
- Leveraging Human Selective Attention for Medical Image Analysis with Limited Training Data | [BibTex]
  Y. Huang, X. Li, L. Yang, L. Gu, Y. Zhu, H. Seo, Q. Meng, T. Harada, and Y. Sato.
  BMVC 2021
- Predicting Gaze in Egocentric Videos by Learning Task-Dependent Attention Transition | [Project] | [Code & Data] | [BibTex]
  Y. Huang, M. Cai, Z. Li, and Y. Sato.
  ECCV 2018 (oral presentation, acceptance rate: 2%)
- Goal-Oriented Gaze Estimation for Zero-Shot Learning | [BibTex]
  Y. Liu, L. Zhou, X. Bai, Y. Huang, L. Gu, J. Zhou, and T. Harada.
  CVPR 2021
- GazeSync: Eye Movement Transfer Using an Optical Eye Tracker and Monochrome Liquid Crystal Displays | [BibTex]
  Q. Zhang, Y. Huang, G. Chernyshov, J. Li, Y. S. Pai, and K. Kunze.
  IUI 2022
- Seeing our Blind Spots: Smart Glasses-based Simulation to Increase Design Students’ Awareness of Visual Impairment | [BibTex]
  Q. Zhang, G. Barbareschi, Y. Huang, J. Li, Y. S. Pai, J. Ward, and K. Kunze.
  UIST 2022
📒 Topic: General Video Understanding, Video Understanding with Limited Labels.
- Matching Compound Prototypes for Few-Shot Action Recognition | [Code] | [BibTex]
  Y. Huang, L. Yang, G. Chen, H. Zhang, F. Lu, and Y. Sato.
  IJCV 2024
- Weakly Supervised Temporal Sentence Grounding With Uncertainty-Guided Self-Training | [Code] | [BibTex]
  Y. Huang, L. Yang, and Y. Sato.
  CVPR 2023
- Compound Prototype Matching for Few-Shot Action Recognition | [Code] | [BibTex]
  Y. Huang, L. Yang, and Y. Sato.
  ECCV 2022
- Improving Action Segmentation via Graph-based Temporal Reasoning | [Code] | [BibTex]
  Y. Huang, Y. Sugano, and Y. Sato.
  CVPR 2020
- Interact before Align: Leveraging Cross-Modal Knowledge for Domain Adaptive Action Recognition | [BibTex]
  L. Yang, Y. Huang*, Y. Sugano, and Y. Sato.
  CVPR 2022
- Retrieval-augmented Egocentric Video Captioning | [BibTex] | [Project&Code]
  J. Xu, Y. Huang, J. Hou, G. Chen, Y. Zhang, R. Feng, and W. Xie.
  CVPR 2024
- EgoExo-Gen: Ego-centric Video Prediction by Watching Exo-centric Videos
  J. Xu, Y. Huang, B. Pei, J. Hou, Q. Li, G. Chen, Y. Zhang, R. Feng, and W. Xie.
  ICLR 2025
- Prompt-augmented Boundary Attentive Learning for Weakly Supervised Temporal Sentence Grounding
  Z. Zhu, Y. Huang*, M. Zhang, L. Ouyang, and Y. Sato
📒 Topic: Egocentric & Video Benchmarks.
- EgoExoLearn: A Dataset for Bridging Asynchronous Ego- and Exo-centric View of Procedural Activities in Real World | [Project&Code] | [BibTex]
  Y. Huang, G. Chen, J. Xu, …, Y. Qiao
  CVPR 2024
- Ego4D: Around the World in 3,000 Hours of Egocentric Video | [Project] | [BibTex]
  K. Grauman, A. Westbury, …, Y. Huang, …, J. Malik.
  CVPR 2022 (best paper finalist)
- Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives | [Project] | [BibTex]
  K. Grauman, A. Westbury, …, Y. Huang, …, J. Malik.
  CVPR 2024 (oral presentation)
- ActionVOS: Actions as Prompts for Video Object Segmentation | [Project&Code]
  L. Ouyang, R. Liu, Y. Huang*, R. Furuta, and Y. Sato.
  ECCV 2024 (oral presentation)
- CG-Bench: Clue-grounded Question Answering Benchmark for Long Video Understanding | [Leaderboard]
  G. Chen*, Y. Liu*, Y. Huang*, Y. He, B. Pei, J. Xu, Y. Wang, T. Lu, L. Wang
  ICLR 2025
🔥 News
- 4 papers accepted by ICLR 2025.
- 3 papers accepted by ECCV 2024, among which ActionVOS was accepted as an oral presentation!
- 3 papers accepted by CVPR 2024.
- Served as an Area Chair for ICCV 2023 and CVPR 2024.
- Received Special Grant for Foreign Researchers (¥11,000,000) from JSPS.
- Received Grant-in-Aid for Early-Career Scientists (¥4,550,000) from JSPS.