Cosmos 3: Omnimodal World Models for Physical AI

Posted: 2026-06-10

Time: 10:00-11:00, Jun 12, 2026 (Fri)

Location: Online meeting, meeting.tencent.com/dm/nDbunv9P6LYg

Abstract:

We introduce Cosmos 3, a family of omnimodal world models designed to jointly process and generate language, image, video, audio, and action sequences within a unified mixture-of-transformers architecture. By supporting highly flexible input-output configurations, Cosmos 3 seamlessly unifies critical modalities for Physical AI, effectively subsuming vision-language models, video generators, world simulators, and world-action models into a single framework. Our evaluation demonstrates that Cosmos 3 establishes a new state of the art across a diverse suite of understanding and generation tasks, highlighting omnimodal world models as scalable, general-purpose backbones for embodied agents. Our post-trained Cosmos 3 models were ranked as the best open-source text-to-image and image-to-video models by Artificial Analysis, and as the best policy model by RoboArena at the time the technical report was written. To accelerate open research and deployment in Physical AI, we make our code, model checkpoints, curated synthetic datasets, and evaluation benchmark available under the Linux Foundation’s OpenMDW-1.1 License at github.com/nvidia/cosmos and huggingface.co/collections/nvidia/cosmos3. The project website is available at research.nvidia.com/labs/cosmos-lab/cosmos3.

Speaker bio:

Max Li (李赵硕) is a Tech Lead at NVIDIA’s Cosmos Lab, where he works on large-scale world foundation models for Physical AI. His research focuses on spatial intelligence: enabling AI systems to perceive, understand, simulate, and interact with the physical world. His work has contributed to NVIDIA Cosmos and NVIDIA Edify, and his research has appeared in leading venues including CVPR, ICCV, ECCV, NeurIPS, ICLR, and SIGGRAPH. His work has received recognition including TIME’s Best Inventions of 2023 and CES 2025 Best AI & Best Overall. He received his Ph.D. in Computer Science from Johns Hopkins University and his B.Eng. in Mechatronics Engineering from the University of British Columbia.