Video Generation & Agentic RL
I am a research intern at Ant Group working on Video Generation and VLM RL post-training. I received my B.Eng. degree from Beijing University of Posts and Telecommunications (BUPT) in 2025.
I am deeply interested in Video Generation and Agentic RL.
ICML 2025
PhyGenBench proposes a comprehensive benchmark for evaluating physical commonsense in video generation models, a step toward building world simulators.
NeurIPS 2025
VideoREPA learns physical knowledge for video generation via relational alignment with foundation models.
Preprint
Gym-V provides a unified environment for agentic vision research, enabling systematic evaluation of vision-language models.
ICCV 2025
LangBridge proposes interpreting images as combinations of language embeddings, bridging vision and language representations.
Preprint · ⭐ 100+ Stars
WISE introduces a world knowledge-informed semantic evaluation framework for text-to-image generation.
Preprint
SRUM proposes a fine-grained self-rewarding mechanism for training unified multimodal models.
Preprint
UniSandbox investigates the relationship between understanding and generation capabilities in unified multimodal models.
Evaluates OpenClaw in multimodal scenes.
Open evaluation toolkit for text-to-image generation.