Wan 2.2
Wan2.2 is a state-of-the-art open-source video generative model released by Alibaba's Tongyi Lab (Wan AI Team), designed to democratize high-quality video creation.

Major Upgrades
Wan2.2 uses a Mixture-of-Experts (MoE) architecture that splits the denoising process into specialized stages, increasing model capacity and performance without increasing computational cost.
It was trained on a substantially larger dataset than Wan2.1 (65.6% more images and 83.2% more videos) with rich aesthetic labels, which improves texture, lighting, and color consistency for cinematic-style output.
The introduction of Wan2.2-S2V-14B allows for high-fidelity video generation driven directly by audio inputs, achieving state-of-the-art performance in lip-sync and motion synchronization.
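The staged MoE idea described above can be illustrated with a toy router. This is a minimal sketch, not the Wan2.2 implementation: it assumes exactly two experts (one for noisy early steps, one for cleaner late steps) and a hypothetical `boundary` switch point, and the "experts" are stand-in functions rather than neural networks. It shows why per-step compute stays flat: only one expert runs at each step.

```python
from dataclasses import dataclass
from typing import Callable, List

# A "denoiser" here is any function mapping (latent, noise_level) -> latent.
Denoiser = Callable[[List[float], float], List[float]]


@dataclass
class TwoStageMoE:
    """Routes each denoising step to one of two experts by noise level."""
    high_noise_expert: Denoiser   # early, noisy steps (assumed: coarse layout)
    low_noise_expert: Denoiser    # late, cleaner steps (assumed: fine detail)
    boundary: float = 0.5         # hypothetical switch point in [0, 1]

    def pick(self, noise_level: float) -> Denoiser:
        # Exactly one expert is selected per step, so the cost of a step
        # matches a single model even though two experts exist.
        if noise_level >= self.boundary:
            return self.high_noise_expert
        return self.low_noise_expert

    def denoise(self, latent: List[float], steps: int) -> List[float]:
        for i in range(steps):
            noise_level = 1.0 - i / steps   # starts at 1.0 and decreases
            latent = self.pick(noise_level)(latent, noise_level)
        return latent


def toy_expert(scale: float) -> Denoiser:
    # Stand-in for a real network: shrinks the latent by a fixed factor.
    return lambda latent, t: [x * scale for x in latent]


moe = TwoStageMoE(toy_expert(0.5), toy_expert(0.9))
result = moe.denoise([1.0, -2.0], steps=4)
```

With four steps and the default boundary, the first three steps go to the high-noise expert and the last step goes to the low-noise expert.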
Model Details
| Publisher | Wan AI (Alibaba) |
|---|---|
| Open Status | Open Source (Apache 2.0) |
| Model Parameters | 5B (Hybrid), 14B (MoE) |
| Multimodal | T2V, I2V, S2V, V2V |
| Included Models | Wan2.2-T2V-A14B, Wan2.2-I2V-A14B, Wan2.2-S2V-14B, Wan2.2-TI2V-5B |
| Output Aspect Ratio | 16:9, 9:16, 1:1, 4:3 |
| Output Resolution | 480p, 720p, 1080p |
| Output Duration | 5s (optimized), up to 10s |
| Output Frame Rate | 16fps, 24fps |
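
To make the output settings in the table concrete, the sketch below converts a resolution label and aspect ratio into pixel dimensions and a duration into a frame count. It rests on two assumptions that are not stated in the table: that "480p", "720p", and "1080p" name the shorter side of the frame, and that frame count is simply duration times frame rate. A real pipeline may round dimensions or frame counts to model-specific multiples.

```python
from fractions import Fraction

# Assumption: the "p" label names the shorter side in pixels.
SHORT_SIDE = {"480p": 480, "720p": 720, "1080p": 1080}


def frame_size(resolution: str, aspect: str) -> tuple[int, int]:
    """Return (width, height) for a resolution label and a 'W:H' aspect ratio."""
    w, h = (int(part) for part in aspect.split(":"))
    ratio = Fraction(w, h)
    short = SHORT_SIDE[resolution]
    if ratio >= 1:                        # landscape or square: height is short
        return int(short * ratio), short
    return short, int(short / ratio)      # portrait: width is short


def frame_count(seconds: int, fps: int) -> int:
    """Number of frames in a clip, ignoring any model-specific rounding."""
    return seconds * fps


print(frame_size("720p", "16:9"))   # (1280, 720)
print(frame_size("720p", "9:16"))   # (720, 1280)
print(frame_count(5, 16))           # 80
print(frame_count(5, 24))           # 120
```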
Summary
Wan2.2 stands out as a powerful open-source contender, leveraging a Mixture-of-Experts architecture to deliver high-fidelity, cinematic video generation. Its unique "Last Frame" control and visual text generation capabilities offer creators unprecedented precision, while its efficient 5B variant democratizes access to high-quality video synthesis. Although its audio-driven metrics (PSNR/SSIM) show room for improvement in specific tasks, its overall visual aesthetic and motion smoothness are top-tier.
Key Features
"Last Frame" control: lets users specify the final frame of the video, giving precise control over transitions and ending states.
Visual text generation: produces coherent Chinese and English text within the video content, and is billed as the first video model able to do so.
Video Showcases
Dogs are the players at The World Series Of Poker and they are drinking big bowls of water very sloppily and splashing water on the cards and on the felt of the poker table, one dog poker player is tilting their head sideways in confusion.
A low-angle shot of a dancer leaping gracefully into the air, making their movement appear even more dynamic and powerful.
A giant humanoid, made of fluffy blue cotton candy, stomping on the ground, and roaring to the sky, clear blue sky behind them.
A drone camera circles around a beautiful historic church built on a rocky outcropping along the Amalfi Coast, the view showcases historic and magnificent architectural details and tiered pathways and patios, waves are seen crashing against the rocks below as the view overlooks the horizon of the coastal waters and hilly landscapes of the Amalfi Coast Italy, several distant people are seen walking and enjoying vistas on patios of the dramatic ocean views, the warm glow of the afternoon sun creates a magical and romantic feeling to the scene, the view is stunning captured with beautiful photography.
Performance Metrics
Wan 2.2 Model Capability Assessment (Dec 20, 2025)
[Figure: Wan 2.2 metrics bar charts by dimension. Panels: Visual Quality, Temporal Consistency, Semantic Alignment, Subject Consistency, Aesthetic & Image Quality, Dynamics & Motion. Axis: score (normalized).]
Service Providers
Hugging Face
Hosts the official model weights and inference spaces for Wan2.2, allowing users to try the model directly in the browser.
API Providers
Fal.ai
Offers optimized API endpoints for Wan2.2, including the 5B and 14B variants, suitable for enterprise-grade integration.
Replicate
Provides scalable API access to Wan2.2 models, including speed-optimized versions for rapid generation.
People Also Ask
Wan 2.2 AI is an open-source, large-scale video generative model that uses a Mixture-of-Experts diffusion architecture to produce high-quality 720p videos from text, images, speech, or combinations of these inputs. It supports text-to-video, image-to-video, text-image-to-video, speech-to-video, and character animation modes, and is designed to run both in research or production backends and on high-end consumer GPUs such as the RTX 4090.
The core Wan 2.2 model weights are released under Apache 2.0 and do not include built-in content filters, so technically the model can be prompted to generate NSFW content when run locally or in third-party tools that do not add extra safety layers. However, the official Wan services and most commercial platforms require users to comply with their usage policies and applicable laws, which typically prohibit illegal or harmful content even if the model itself is not hard-censored.
To install Wan 2.2 locally, you generally clone the official GitHub repository, install the Python dependencies, and then download one or more model checkpoints (e.g., T2V-A14B, I2V-A14B, TI2V-5B, S2V-14B, Animate-14B) from Hugging Face or ModelScope. A typical setup involves running git clone on the Wan2.2 repo, then pip install -r requirements.txt (plus optional extras such as the speech-to-video requirements), and then using the provided scripts or the Diffusers/ComfyUI integrations to load the downloaded weights.
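The checkpoint download step can be scripted. The following is a sketch under stated assumptions: it uses `snapshot_download` from the `huggingface_hub` package, and it assumes the weights live under a `Wan-AI` organization with repository names of the form `Wan2.2-<checkpoint>`. Check the actual repository names on Hugging Face before relying on it. The `./checkpoints` directory is an arbitrary choice.

```python
from pathlib import Path

# Checkpoint names as listed in the paragraph above.
CHECKPOINTS = ("T2V-A14B", "I2V-A14B", "TI2V-5B", "S2V-14B", "Animate-14B")


def repo_id(checkpoint: str, org: str = "Wan-AI") -> str:
    """Build a Hugging Face repo id; the org name is an assumption."""
    if checkpoint not in CHECKPOINTS:
        raise ValueError(f"unknown checkpoint: {checkpoint}")
    return f"{org}/Wan2.2-{checkpoint}"


def download(checkpoint: str, root: Path = Path("./checkpoints")) -> Path:
    """Download one checkpoint into root/Wan2.2-<checkpoint> and return the path."""
    # Imported lazily so repo_id() works without the package installed.
    from huggingface_hub import snapshot_download

    target = root / f"Wan2.2-{checkpoint}"
    snapshot_download(repo_id=repo_id(checkpoint), local_dir=str(target))
    return target
```

Calling `download("TI2V-5B")` would fetch the smaller 5B checkpoint, which is a reasonable first choice on a single consumer GPU. The 14B checkpoints are much larger downloads.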
Wan 2.2 can be used through several interfaces: the official wan.video website, native Python/CLI scripts (generate.py for different tasks), and integrations with frameworks like Diffusers and ComfyUI. In practice, you choose a task (such as text-to-video or image-to-video), specify resolution and checkpoints, provide a prompt and optional reference media, and then run generation either locally on your GPU or via supported online interfaces.
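A local run through the CLI script might be wrapped as shown below. This is a sketch, not the documented interface: the flag names (`--task`, `--ckpt_dir`, `--prompt`, `--size`, `--image`), the task string `t2v-A14B`, and the `1280*720` size format are assumptions about `generate.py`, so confirm them with `python generate.py --help` in the cloned repository. The helper only assembles the command; nothing is executed until `run` is called.

```python
import subprocess
import sys
from typing import List, Optional


def build_command(task: str, ckpt_dir: str, prompt: str,
                  size: Optional[str] = None,
                  image: Optional[str] = None) -> List[str]:
    """Assemble an argv list for generate.py (flag names are assumptions)."""
    cmd = [sys.executable, "generate.py",
           "--task", task, "--ckpt_dir", ckpt_dir, "--prompt", prompt]
    if size is not None:
        cmd += ["--size", size]
    if image is not None:          # optional reference image for I2V-style tasks
        cmd += ["--image", image]
    return cmd


def run(cmd: List[str]) -> None:
    # Must be executed from the cloned repository root, with weights downloaded.
    subprocess.run(cmd, check=True)


cmd = build_command(
    task="t2v-A14B",                      # assumed task identifier
    ckpt_dir="./checkpoints/Wan2.2-T2V-A14B",
    prompt="A low-angle shot of a dancer leaping gracefully into the air",
    size="1280*720",                      # assumed size format
)
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with long prompts.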
Wan 2.2 is free to use: the model weights and code are released as open source under the Apache 2.0 license, so you can download, modify, and use them (including commercially) without paying licensing fees to the authors, subject to the license terms. Some hosted services and cloud providers that expose Wan 2.2 (for example, web UIs or GPU rental platforms) may charge for compute, storage, or premium features, even though the underlying model itself is free.
From a compliance perspective, the official project states that users are responsible for ensuring their generated content does not violate laws or cause harm, and the models are distributed under Apache 2.0, with user responsibilities and usage restrictions set out in the license and usage policy. From a security and privacy perspective, running Wan 2.2 locally keeps your data on your own hardware, while third-party Wan 2.2 services, whether NSFW-oriented or general-purpose, typically emphasize private storage of outputs but vary in their safety, moderation, and logging practices, which you should review individually.
