Wan 2.5 preview

Wan 2.5-preview is the latest video generation model from Alibaba's Wan AI, representing a significant leap in multimodal capabilities.

Major Upgrades

A massive leap from Wan 2.2, Wan 2.5-preview introduces synchronized audio generation (voice, SFX, BGM) and realistic lip-syncing for characters, bringing it in line with multimodal competitors like Sora 2 and Veo 3.

Supports video generation of up to 10 seconds, double the typical duration of Wan 2.2, with significantly improved motion dynamics and reduced flickering.

Native support for 1080p output, offering a crisp visual upgrade over the standard 720p of Wan 2.2.

Model Details

Publisher: Wan AI (Alibaba)
Open Status: Closed Source
Model Parameters: Not Disclosed
Multimodal: T2V, I2V, T2VA, I2VA, V2V
Included Models: Wan 2.5-preview
Output Aspect Ratio: 16:9, 9:16, 1:1
Output Resolution: 480p, 720p, 1080p
Output Duration: Up to 10s
Output Frame Rate: 24fps
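
The limits listed in the model details can be checked before a generation request is submitted. The following is a minimal sketch, not an official client: the `GenerationRequest` class and the `validate` and `frame_count` helpers are hypothetical names introduced here for illustration, and only the allowed values (modes, aspect ratios, resolutions, 10-second cap, 24fps) come from the details above.

```python
from dataclasses import dataclass

# Values copied from the model details; everything else is illustrative.
SUPPORTED_MODES = {"T2V", "I2V", "T2VA", "I2VA", "V2V"}
SUPPORTED_ASPECT_RATIOS = {"16:9", "9:16", "1:1"}
SUPPORTED_RESOLUTIONS = {"480p", "720p", "1080p"}
MAX_DURATION_S = 10
FRAME_RATE_FPS = 24


@dataclass(frozen=True)
class GenerationRequest:
    """Hypothetical request shape, not an actual API schema."""
    mode: str
    aspect_ratio: str
    resolution: str
    duration_s: float


def validate(req: GenerationRequest) -> list[str]:
    """Return a list of human-readable problems; empty means the request fits the listed limits."""
    errors = []
    if req.mode not in SUPPORTED_MODES:
        errors.append(f"unsupported mode: {req.mode}")
    if req.aspect_ratio not in SUPPORTED_ASPECT_RATIOS:
        errors.append(f"unsupported aspect ratio: {req.aspect_ratio}")
    if req.resolution not in SUPPORTED_RESOLUTIONS:
        errors.append(f"unsupported resolution: {req.resolution}")
    if not 0 < req.duration_s <= MAX_DURATION_S:
        errors.append(f"duration must be in (0, {MAX_DURATION_S}] seconds")
    return errors


def frame_count(req: GenerationRequest) -> int:
    """Number of frames implied by the duration at the listed 24fps output rate."""
    return round(req.duration_s * FRAME_RATE_FPS)


if __name__ == "__main__":
    req = GenerationRequest("T2VA", "16:9", "1080p", 10)
    print(validate(req), frame_count(req))
```

At the maximum duration, a clip works out to 10 s × 24 fps = 240 frames.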

Summary

Wan 2.5-preview represents a significant evolution for Alibaba's video AI, bridging the gap with top-tier competitors by integrating native audio and lip-sync capabilities. With improved resolution (1080p), longer clip durations, and enhanced motion stability, it offers a compelling, cost-effective alternative to Sora 2 and Veo 3. While it may still trail slightly in absolute physical realism compared to Sora 2, its open-extension philosophy and rapid progress make it a formidable player.

Key Features

Advanced capabilities to animate static images with depth and motion, perfect for storytelling and social media content.

Flexible input options allowing combinations of text, image, and audio to guide video generation.

Positioned as a more accessible and affordable alternative to proprietary giants, with competitive pricing models on supported platforms.

Video Showcases

Tags: animal, unusual activity

Dogs are the players at The World Series Of Poker and they are drinking big bowls of water very sloppily and splashing water on the cards and on the felt of the poker table, one dog poker player is tilting their head sideways in confusion.

Tags: camera motion, human activity

A low-angle shot of a dancer leaping gracefully into the air, making their movement appear even more dynamic and powerful.

Tags: unusual subject, high motion level

A giant humanoid, made of fluffy blue cotton candy, stomping on the ground, and roaring to the sky, clear blue sky behind them.

Tags: scene, camera motion

A drone camera circles around a beautiful historic church built on a rocky outcropping along the Amalfi Coast, the view showcases historic and magnificent architectural details and tiered pathways and patios, waves are seen crashing against the rocks below as the view overlooks the horizon of the coastal waters and hilly landscapes of the Amalfi Coast Italy, several distant people are seen walking and enjoying vistas on patios of the dramatic ocean views, the warm glow of the afternoon sun creates a magical and romantic feeling to the scene, the view is stunning captured with beautiful photography.

Performance Metrics

Wan 2.5 preview Model Capability Assessment (Dec 20, 2025)

[Radar chart: Wan 2.5 preview performance on a 0-100 scale across Subject Consistency, Temporal Consistency, Aesthetic & Image Quality, Dynamics & Motion, Visual Quality, and Semantic Alignment]

Wan 2.5 preview Metrics Bar Charts by Dimension

Visual Quality

Visual Quality Metrics (normalized score, 0-100)

PSNR: 55.6
SSIM: 58.5
LPIPS: 45.0
FVD: 52.2
Inception Score (IS): 47.3

Temporal Consistency

Temporal Consistency Metrics (normalized score, 0-100)

Temporal Warping Error: 42.0
Optical Flow Consistency: 63.0
Temporal Flicker Score: 58.2
Long-term Consistency Tracking: 55.8
Motion Smoothness: 60.0

Semantic Alignment

Semantic Alignment Metrics (normalized score, 0-100)

CLIP Score: 45.0
Tag2Text / UMT / GRiT: 45.0
Semantic Accuracy: 54.0

Subject Consistency

Subject Consistency Metrics (normalized score, 0-100)

DINO Feature Similarity: 49.5
Object Identity Tracking: 48.0
Multiple Object Consistency: 72.9

Aesthetic & Image Quality

Aesthetic & Image Quality Metrics (normalized score, 0-100)

LAION Aesthetic Predictor: 48.0
MUSIQ Score: 60.0
Color/Texture Consistency: 72.0
Human-Opinion MOS: 69.0

Dynamics & Motion

Dynamics & Motion Metrics (normalized score, 0-100)

Action Recognition Accuracy: 57.0
Dynamics Controllability: 60.0
Motion Diversity Score: 66.0
Physical Realism Score: 64.8
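
One simple way to compare the six dimensions is to average the normalized scores within each. The sketch below does that with the values listed above. It assumes an unweighted mean, which is a choice made here for illustration; the page does not state how the radar chart values were aggregated, so the results may not match it.

```python
from statistics import mean

# Normalized scores (0-100) copied from the per-dimension listings above.
SCORES = {
    "Visual Quality": {
        "PSNR": 55.6, "SSIM": 58.5, "LPIPS": 45.0, "FVD": 52.2,
        "Inception Score (IS)": 47.3,
    },
    "Temporal Consistency": {
        "Temporal Warping Error": 42.0, "Optical Flow Consistency": 63.0,
        "Temporal Flicker Score": 58.2, "Long-term Consistency Tracking": 55.8,
        "Motion Smoothness": 60.0,
    },
    "Semantic Alignment": {
        "CLIP Score": 45.0, "Tag2Text / UMT / GRiT": 45.0, "Semantic Accuracy": 54.0,
    },
    "Subject Consistency": {
        "DINO Feature Similarity": 49.5, "Object Identity Tracking": 48.0,
        "Multiple Object Consistency": 72.9,
    },
    "Aesthetic & Image Quality": {
        "LAION Aesthetic Predictor": 48.0, "MUSIQ Score": 60.0,
        "Color/Texture Consistency": 72.0, "Human-Opinion MOS": 69.0,
    },
    "Dynamics & Motion": {
        "Action Recognition Accuracy": 57.0, "Dynamics Controllability": 60.0,
        "Motion Diversity Score": 66.0, "Physical Realism Score": 64.8,
    },
}


def dimension_means(scores: dict[str, dict[str, float]]) -> dict[str, float]:
    """Unweighted mean of the metric scores in each dimension, rounded to 2 decimals."""
    return {dim: round(mean(metrics.values()), 2) for dim, metrics in scores.items()}


if __name__ == "__main__":
    # Print dimensions from highest to lowest average score.
    for dim, avg in sorted(dimension_means(SCORES).items(), key=lambda kv: -kv[1]):
        print(f"{dim}: {avg}")
```

Under this unweighted averaging, Aesthetic & Image Quality comes out highest and Semantic Alignment lowest.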

Service Providers

Wan.video

The official web interface for Wan, offering a suite of creation and editing tools for users.

API Providers

Replicate

Hosts Wan 2.5 models for scalable cloud inference.
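
For readers calling a hosted model from code, the sketch below shows the general shape of a request through the Replicate Python client. It is a rough illustration under stated assumptions: `replicate.run(ref, input=...)` is the client's standard entry point, but the model reference is left as a placeholder to look up on Replicate, and the input field names (`prompt`, `resolution`, `duration`, `aspect_ratio`) are hypothetical and must be checked against the schema published on the model's page.

```python
def build_input(prompt: str, resolution: str = "720p",
                duration_s: int = 5, aspect_ratio: str = "16:9") -> dict:
    """Assemble an input payload. Field names are assumptions, not a confirmed schema."""
    return {
        "prompt": prompt,
        "resolution": resolution,      # page lists 480p, 720p, 1080p
        "duration": duration_s,        # page lists up to 10 seconds
        "aspect_ratio": aspect_ratio,  # page lists 16:9, 9:16, 1:1
    }


def run_on_replicate(model_ref: str, payload: dict):
    """Send the payload to a hosted model.

    Requires `pip install replicate` and the REPLICATE_API_TOKEN environment
    variable. `model_ref` is the "owner/name" identifier shown on the model's
    Replicate page; no identifier is assumed here.
    """
    import replicate  # imported lazily so build_input works without the package
    return replicate.run(model_ref, input=payload)


if __name__ == "__main__":
    payload = build_input("A low-angle shot of a dancer leaping gracefully into the air",
                          resolution="1080p", duration_s=10)
    print(payload)
    # output = run_on_replicate("<owner>/<model-name>", payload)  # placeholder reference
```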

People Also Ask

Wan 2.5 AI is Alibaba's next-generation multimodal video generation model that produces high-quality videos up to 1080p resolution and 10 seconds in duration with native audio synchronization, supporting text-to-video, image-to-video, and speech-to-video generation. It features one-pass audio-visual synchronization, enhanced prompt adherence, improved physics simulation, and professional cinematic camera controls compared to Wan 2.2.

The Wan 2.5 API provided through Alibaba Cloud's DashScope platform is censored and filters NSFW content, unlike the earlier open-source Wan 2.2 weights. While Wan 2.5 has less restrictive content policies than mainstream generators like Sora or Kling, the commercial API enforces content guidelines to comply with legal and ethical requirements.

Wan 2.5 is currently accessible primarily through cloud-based platforms and APIs rather than local installation, as the full open-source weights have not been publicly released in the same manner as Wan 2.2. Users can access Wan 2.5 through platforms like GoEnhance AI, Higgsfield, or API services that integrate with Alibaba Cloud's DashScope, without requiring local setup or model downloads.

You can use Wan 2.5 through web-based platforms by entering a text prompt or uploading an image, selecting the desired resolution (480p, 720p, or 1080p), and optionally configuring camera movements and audio preferences. The model generates videos of up to 10 seconds with synchronized audio; generation takes from roughly 10 seconds to a few minutes depending on resolution and complexity. Access is available through browser interfaces or API integrations.

No, Wan 2.5 is not currently available as an open-source release with publicly downloadable model weights like Wan 2.2 was. It is primarily offered as a commercial service through Alibaba Cloud's DashScope platform and third-party integrations, though components like the Wan2.5-VAE have been made available on Hugging Face.

No, Wan 2.5 is not uncensored: the API version implements content filtering and moderation policies to comply with platform guidelines and legal requirements. Unlike the open-source Wan 2.2 model, which had no built-in filters, Wan 2.5's commercial deployment includes censorship mechanisms, though it reportedly has more flexible content policies than some competing platforms like Sora or Kling.