Yi Liu (刘熠)

About Me

I am currently an AIGC Algorithm Engineer at ByteDance (Commercial AI-AIGC), where I lead the development of advertising creative-material Agents, spanning data construction, model training, and product deployment. Before that, I led the Multimodal Understanding & Generation Group at Honor Device Co., Ltd. from 2024 to 2025, managing a team of 10+ engineers working on on-device Vision-Language Models. I received my Ph.D. degree in 2024 from MMLab@SIAT, University of Chinese Academy of Sciences (UCAS), supervised by Prof. Yu Qiao and Prof. Yali Wang. I was also a research intern at Shanghai AI Laboratory from 2022 to 2023. I received my B.Eng. degree from Huazhong University of Science and Technology (HUST), Wuhan, China, in 2019.

Publications

MagicVL-2B: Empowering Vision-Language Models on Mobile Devices with Lightweight Visual Encoders via Curriculum Learning, arXiv, 2025 (Tech report, first author)

E-VRAG: Enhancing Long Video Understanding with Resource-Efficient Retrieval Augmented Generation, arXiv, 2025 (Tech report, first corresponding author)

VideoCap-R1: Enhancing MLLMs for Video Captioning via Structured Thinking, arXiv, 2025 (Under review, co-corresponding author)

LvBench: A Benchmark for Long-form Video Understanding with Versatile Multi-modal Question Answering, International Journal of Computer Vision, 2025 (IJCV, CAS Zone 1, IF=9.3, co-first author, listed third)

MLLM-TA: Leveraging Multimodal Large Language Models for Precise Temporal Video Grounding, IEEE Signal Processing Letters, 2024 (SPL, CAS Zone 2, IF=3.9, first author)

MVBench: A Comprehensive Multi-modal Video Understanding Benchmark, Computer Vision and Pattern Recognition, 2024 (CVPR, CCF-A conference, sixth author)

F2S-Net: Learning Frame-To-Segment Prediction for Online Action Detection, Journal of Real-Time Image Processing, 2024 (JRTIP, CAS Zone 3, IF=3.0, first author)

Dual Masked Modeling for Weakly-Supervised Temporal Boundary Discovery, IEEE Transactions on Multimedia, 2023 (TMM, CAS Zone 1, IF=9.7, co-first author, listed second)

Learning Discriminative Feature Representation for Open Set Action Recognition, ACM International Conference on Multimedia, 2023 (ACM MM, CCF-A conference, co-first author, listed second)

InternVideo: General Video Foundation Models via Generative and Discriminative Learning, arXiv, 2022 (SCIS, ninth author)

FineAction: A Fine-Grained Video Dataset for Temporal Action Localization, IEEE Transactions on Image Processing, 2022 (TIP, CAS Zone 1, IF=13.7, first author)

VideoPipe 2022 Challenge: Real-World Video Understanding for Urban Pipe Inspection, International Conference on Pattern Recognition, 2022 (ICPR, CCF-C conference, first author)

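Research on the Online Start Detection Task and Methods in Short-Video Scenarios (短视频场景在线起始检测任务及方法研究), Journal of Integration Technology (集成技术), 2021 (in Chinese; co-first author, listed second)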

Experience

Work Experience

ByteDance — Commercial AI-AIGC, China Commercialization & Advertising. Creative-Material Agent Lead (AIGC Algorithm Engineer). Nov 2025 – Present

Honor Device Co., Ltd. — LLM Capability Platform. Lead, Multimodal Understanding & Generation Group, managing a team of 10+ engineers. Apr 2024 – Oct 2025

Shanghai AI Laboratory — General Vision Group. Research Intern. Mar 2022 – Dec 2023

Honors & Awards

Professional

2025 — Shining Star Award, Honor AI Platform Department

2024 — AI Multimodal Breakthrough Team Award, Honor R&D Management

Ph.D. @ Chinese Academy of Sciences / UCAS

2023 — Outstanding Merit Student Pacesetter, UCAS

2022 — 1st Place (×2), ECCV Ego4D Episodic Memory Challenge — Looking At Me Track & Moments Queries Track

2020–2022 — President's Outstanding Award (×3), SIAT, Chinese Academy of Sciences

2020 — Merit Student, UCAS

2019 — First Prize, Huike Cup AI Application Innovation Challenge

Undergraduate @ HUST

2019 — Outstanding Graduate, HUST

2018 — National Second Prize & Southern-Region Runner-up, ABU Robocon

2018 — Provincial First Prize, National Mechanical Innovation Design Contest

2017 — Provincial Second Prize, “Challenge Cup” National Student Academic Competition

2017 — Provincial Second Prize, China Undergraduate Mathematical Contest in Modeling

2017 — Meritorious Winner (Honorable Mention), Mathematical Contest in Modeling (MCM/ICM)

Workshops & Challenges

Student organizer of ECCV 2022 DeeperAction Challenge, Track 1: Temporal Action Localization

Student organizer of ICPR 2022 VideoPipe Challenge, Track 2: Temporal Defect Localization

Student organizer of ICCV 2021 DeeperAction Challenge, Track 1: Temporal Action Localization

AIGC Algorithm Engineer at ByteDance

Email: yiliu61richard@gmail.com

Research Interests: Multimodal LLMs, Multimodal Data Synthesis, Long Video Understanding, Temporal Action Detection