Wan 2.2

Wan2.2 is a state-of-the-art open-source video generative model released by Alibaba's Tongyi Lab (Wan AI Team), designed to democratize high-quality video creation.

Major Upgrades

Mixture-of-Experts (MoE) Architecture

Wan2.2 uses a Mixture-of-Experts (MoE) architecture that separates the denoising process into specialized stages, increasing model capacity and performance without increasing computational cost.
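As a rough illustration of the staged-denoising idea, the toy sketch below routes each denoising step to exactly one expert according to the noise level. The two-expert split, the `boundary` value of 0.5, the scaling factors, and all function names are assumptions made for illustration; none of them are taken from the Wan2.2 code. Because only one expert runs per step, the per-step cost matches that of a single expert even though the combined parameter count is larger.

```python
from typing import Callable, List, Tuple

# An "expert" here is any function mapping (latent, timestep) -> new latent.
Expert = Callable[[float, float], float]


def high_noise_expert(x: float, t: float) -> float:
    """Toy stand-in for the expert that handles early, noisy steps."""
    return x * 0.5


def low_noise_expert(x: float, t: float) -> float:
    """Toy stand-in for the expert that refines detail in late steps."""
    return x * 0.9


def select_expert(t: float, boundary: float = 0.5) -> Expert:
    """Pick one expert per step; t runs from 1.0 (pure noise) to 0.0 (clean).

    The boundary is a hypothetical switching point, not a Wan2.2 setting.
    """
    return high_noise_expert if t >= boundary else low_noise_expert


def denoise(x: float, timesteps: List[float],
            boundary: float = 0.5) -> Tuple[float, List[str]]:
    """Run the toy loop and record which expert handled each step."""
    used: List[str] = []
    for t in timesteps:
        expert = select_expert(t, boundary)
        x = expert(x, t)  # only one expert is evaluated per step
        used.append(expert.__name__)
    return x, used


if __name__ == "__main__":
    _, used = denoise(1.0, [1.0, 0.75, 0.5, 0.25, 0.0])
    print(used)
```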

Enhanced Cinematic Quality & Data Scale

Wan2.2 is trained on a substantially larger dataset than Wan2.1 (65.6% more images and 83.2% more videos), annotated with rich aesthetic labels. The result is improved texture, lighting, and color consistency, suitable for cinematic production.

Audio-Driven Capabilities

Wan2.2-S2V-14B adds high-fidelity video generation driven directly by audio input, with state-of-the-art lip-sync and motion synchronization.

Model Details

Publisher	Wan AI (Alibaba)
Open Status	Open Source (Apache 2.0)
Model Parameters	5B (Hybrid), 14B (MoE)
Modalities	T2V, I2V, S2V, V2V
Included Models	Wan2.2-T2V-A14B, Wan2.2-I2V-A14B, Wan2.2-S2V-14B, Wan2.2-TI2V-5B
Output Aspect Ratio	16:9, 9:16, 1:1, 4:3
Output Resolution	480p, 720p, 1080p
Output Duration	5s (optimized), up to 10s
Output Frame Rate	16fps, 24fps
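
The duration and frame-rate rows together determine how many frames a clip contains. The sketch below does that arithmetic. The optional rounding up to a frame count of the form 4n + 1 is an assumption about how some video diffusion pipelines constrain clip length; it is not stated in the table above, and the function name is hypothetical.

```python
import math


def frame_count(duration_s: float, fps: int, snap_4n_plus_1: bool = False) -> int:
    """Return the number of frames for a clip of the given duration and rate.

    With snap_4n_plus_1=True the result is rounded up to the next value of
    the form 4n + 1 (an assumed pipeline constraint, see the note above).
    """
    frames = round(duration_s * fps)
    if snap_4n_plus_1:
        frames = 4 * math.ceil((frames - 1) / 4) + 1
    return frames


if __name__ == "__main__":
    for fps in (16, 24):
        print(fps, frame_count(5, fps), frame_count(5, fps, snap_4n_plus_1=True))
```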

Summary

Wan2.2 is a strong open-source contender that uses a Mixture-of-Experts architecture to deliver high-fidelity, cinematic video generation. Its "Last Frame" control and visual text generation give creators fine-grained control over the output, while its efficient 5B variant makes high-quality video synthesis more widely accessible. Although its audio-driven metrics (PSNR/SSIM) show room for improvement in specific tasks, its overall visual aesthetic and motion smoothness are top-tier.

Key Features

"Last Frame" Conditioning

Users can specify the final frame of the video, which gives precise control over transitions and ending states.

Visual Text Generation

Wan2.2 is described as the first video model capable of generating coherent Chinese and English text within video content.

Video Showcases

animal, unusual activity

Dogs are the players at The World Series Of Poker and they are drinking big bowls of water very sloppily and splashing water on the cards and on the felt of the poker table, one dog poker player is tilting their head sideways in confusion.

camera motion, human - activity

A low-angle shot of a dancer leaping gracefully into the air, making their movement appear even more dynamic and powerful.

unusual subject, high motion level

A giant humanoid, made of fluffy blue cotton candy, stomping on the ground, and roaring to the sky, clear blue sky behind them.

scene, camera motion

A drone camera circles around a beautiful historic church built on a rocky outcropping along the Amalfi Coast, the view showcases historic and magnificent architectural details and tiered pathways and patios, waves are seen crashing against the rocks below as the view overlooks the horizon of the coastal waters and hilly landscapes of the Amalfi Coast Italy, several distant people are seen walking and enjoying vistas on patios of the dramatic ocean views, the warm glow of the afternoon sun creates a magical and romantic feeling to the scene, the view is stunning captured with beautiful photography.

Performance Metrics

Wan 2.2 Model Capability Assessment (Dec 20, 2025)

Wan 2.2 Metrics by Dimension (bar charts; normalized score, axis 0 to 100)

Visual Quality Metrics

PSNR	45.0
SSIM	47.3
LPIPS	36.0
FVD	18.4
Inception Score (IS)	36.0

Temporal Consistency Metrics

Temporal Warping Error	42.0
Optical Flow Consistency	54.0
Temporal Flicker Score	48.0
Long-term Consistency Tracking	43.2
Motion Smoothness	48.0

Semantic Alignment Metrics

CLIP Score	42.0
Tag2Text / UMT / GRiT	36.0
Semantic Accuracy	45.0

Subject Consistency Metrics

DINO Feature Similarity	40.5
Object Identity Tracking	42.0
Multiple Object Consistency	49.5

Aesthetic & Image Quality Metrics

LAION Aesthetic Predictor	36.0
MUSIQ Score	54.0
Color/Texture Consistency	66.0
Human-Opinion MOS	54.0

Dynamics & Motion Metrics

Action Recognition Accuracy	36.0
Dynamics Controllability	48.0
Motion Diversity Score	54.0
Physical Realism Score	32.4

Service Providers

Hugging Face

Hosts the official model weights and inference spaces for Wan2.2, allowing users to try the model directly in the browser.

API Providers

Fal.ai

Offers optimized API endpoints for Wan2.2, including the 5B and 14B variants, suitable for enterprise-grade integration.

Replicate

Provides scalable API access to Wan2.2 models, including speed-optimized versions for rapid generation.

People Also Ask

Wan 2.2 AI is an open-source, large-scale video generative model that uses a Mixture-of-Experts diffusion architecture to produce high-quality 720p videos from text, images, speech, or combinations of these inputs. It supports text-to-video, image-to-video, text-image-to-video, speech-to-video, and character animation modes, and is designed to run both in research or production backends and on high-end consumer GPUs such as the RTX 4090.

The core Wan 2.2 model weights are released under Apache 2.0 and do not include built-in content filters, so technically the model can be prompted to generate NSFW content when run locally or in third-party tools that do not add extra safety layers. However, the official Wan services and most commercial platforms require users to comply with their usage policies and applicable laws, which typically prohibit illegal or harmful content even if the model itself is not hard-censored.

To install Wan 2.2 locally, you generally clone the official GitHub repository, install the Python dependencies, and then download one or more model checkpoints (e.g., T2V-A14B, I2V-A14B, TI2V-5B, S2V-14B, Animate-14B) from Hugging Face or ModelScope. A typical setup involves git clone of the Wan2.2 repo, pip install -r requirements.txt (plus optional extras like speech-to-video requirements), and then using the provided scripts or Diffusers/ComfyUI integrations to load the downloaded weights.
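
The setup steps above can be sketched as a short script. The repository URL, the Hugging Face repo id `Wan-AI/Wan2.2-TI2V-5B`, and the local directory names are assumptions that should be checked against the official project pages. The script only prints the commands unless `run=True` is passed.

```python
import subprocess
from typing import List

# Assumed locations; verify both against the official project pages.
REPO_URL = "https://github.com/Wan-Video/Wan2.2.git"
HF_REPO_ID = "Wan-AI/Wan2.2-TI2V-5B"


def install_commands(repo_url: str = REPO_URL,
                     hf_repo_id: str = HF_REPO_ID,
                     workdir: str = "Wan2.2") -> List[List[str]]:
    """Return the clone, dependency-install, and checkpoint-download commands."""
    ckpt_dir = hf_repo_id.split("/")[-1]  # e.g. "Wan2.2-TI2V-5B"
    return [
        ["git", "clone", repo_url, workdir],
        ["pip", "install", "-r", f"{workdir}/requirements.txt"],
        ["huggingface-cli", "download", hf_repo_id, "--local-dir", f"./{ckpt_dir}"],
    ]


def install(run: bool = False) -> None:
    """Print each command; execute it only when run=True."""
    for cmd in install_commands():
        print(" ".join(cmd))
        if run:
            subprocess.run(cmd, check=True)  # stop on the first failure


if __name__ == "__main__":
    install(run=False)
```

Keeping the commands as argument lists rather than one shell string avoids quoting problems and makes it easy to swap in a different checkpoint.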

Wan 2.2 can be used through several interfaces: the official wan.video website, native Python/CLI scripts (generate.py for different tasks), and integrations with frameworks like Diffusers and ComfyUI. In practice, you choose a task (such as text-to-video or image-to-video), specify resolution and checkpoints, provide a prompt and optional reference media, and then run generation either locally on your GPU or via supported online interfaces.
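
The workflow described above (choose a task, set resolution and checkpoint, supply a prompt and optional reference media) might translate into a `generate.py` call along the following lines. The flag names `--task`, `--size`, `--ckpt_dir`, `--prompt`, and `--image`, and the example values `ti2v-5B` and `1280*704`, are assumptions and should be confirmed with `python generate.py --help` before use. The helper name is hypothetical.

```python
import shlex
from typing import List, Optional


def build_generate_command(task: str, size: str, ckpt_dir: str, prompt: str,
                           image: Optional[str] = None) -> List[str]:
    """Assemble an argument list for generate.py (flag names are assumed)."""
    cmd = [
        "python", "generate.py",
        "--task", task,
        "--size", size,
        "--ckpt_dir", ckpt_dir,
        "--prompt", prompt,
    ]
    if image is not None:
        # Optional reference image for image-conditioned tasks.
        cmd += ["--image", image]
    return cmd


if __name__ == "__main__":
    cmd = build_generate_command(
        task="ti2v-5B",               # assumed task name
        size="1280*704",              # assumed resolution string
        ckpt_dir="./Wan2.2-TI2V-5B",  # local checkpoint directory
        prompt="A dancer leaping gracefully into the air",
    )
    # shlex.join quotes the prompt so the line can be pasted into a shell.
    print(shlex.join(cmd))
```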

Wan 2.2 is free to use: the model weights and code are released as open source under the Apache 2.0 license, so you can download, modify, and use them (including commercially) without paying licensing fees to the authors, subject to the license terms. Some hosted services and cloud providers that expose Wan 2.2 (for example, web UIs or GPU rental platforms) may charge for compute, storage, or premium features, even though the underlying model itself is free.

From a compliance perspective, the official project states that users are responsible for ensuring their generated content does not violate laws or cause harm, and the models are distributed under Apache 2.0 with explicit responsibility and usage restrictions in the license and usage policy. From a security and privacy perspective, running Wan 2.2 locally keeps your data on your own hardware, while third-party NSFW or general-purpose Wan 2.2 services typically emphasize private storage of outputs but have varying safety, moderation, and logging practices that you should review individually.
