Sora 2

Sora 2 is the second generation of OpenAI's groundbreaking text-to-video model, designed to simulate the physical world with unprecedented realism.

Major Upgrades

Sora 2 now generates high-fidelity audio, including dialogue, sound effects, and background music, that is perfectly synchronized with the visual action.

Advances in physical simulation allow far more realistic gravity, collisions, and fluid dynamics, drastically reducing the "hallucinations" and impossible motion seen in earlier versions.

Improved temporal consistency enables longer, more complex narratives where characters and environments remain stable across multiple shots, addressing the "morphing" issues of its predecessor.

Model Details

Publisher: OpenAI
Open Status: Closed Source
Model Parameters: Not Disclosed
Multimodal: T2V, I2V, T2VA, I2VA, V2V
Included Models: Sora 2, Sora 2 Pro
Output Aspect Ratio: 16:9, 9:16, 1:1
Output Resolution: 720p, 1080p (Pro)
Output Duration: 15s (Standard), 25s (Pro)
Output Frame Rate: 24fps, 30fps
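
The output limits above can be expressed as a small lookup that checks a planned generation before it is submitted. This is only a sketch: the names `SPECS`, `VideoRequest`, and `validate` are hypothetical and not part of any OpenAI SDK, and it assumes that "1080p (Pro)" means 1080p is available only on Sora 2 Pro and that the listed durations are per-model maximums.

```python
from dataclasses import dataclass

# Values copied from the Model Details listing. The layout and every name
# here are illustrative assumptions, not an official API.
SPECS = {
    "Sora 2": {"resolutions": {"720p"}, "max_seconds": 15},
    "Sora 2 Pro": {"resolutions": {"720p", "1080p"}, "max_seconds": 25},
}
ASPECT_RATIOS = {"16:9", "9:16", "1:1"}
FRAME_RATES = {24, 30}


@dataclass
class VideoRequest:
    model: str
    aspect_ratio: str
    resolution: str
    seconds: int
    fps: int


def validate(req: VideoRequest) -> list[str]:
    """Return a list of problems; an empty list means the request fits the listing."""
    spec = SPECS.get(req.model)
    if spec is None:
        return [f"unknown model: {req.model}"]
    problems = []
    if req.aspect_ratio not in ASPECT_RATIOS:
        problems.append(f"unsupported aspect ratio: {req.aspect_ratio}")
    if req.resolution not in spec["resolutions"]:
        problems.append(f"{req.resolution} is not listed for {req.model}")
    if not 0 < req.seconds <= spec["max_seconds"]:
        problems.append(f"duration must be 1 to {spec['max_seconds']} seconds")
    if req.fps not in FRAME_RATES:
        problems.append(f"unsupported frame rate: {req.fps}")
    return problems


if __name__ == "__main__":
    # A 20-second 1080p clip exceeds both Standard limits under these assumptions.
    for problem in validate(VideoRequest("Sora 2", "16:9", "1080p", 20, 24)):
        print(problem)
```

Returning a list of problems rather than raising on the first one lets a caller report every mismatch at once.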

Summary

OpenAI's Sora 2 redefines the boundaries of AI video with its "unreal" realism and seamless audio-visual integration. The introduction of the "Cameo" feature and robust storytelling tools positions it as a versatile platform for creators seeking both high-fidelity output and personalized narrative control. While its clips are shorter than those of some competitors, its mastery of physics and motion consistency makes it a benchmark for quality in the field.

Key Features

Cameo: a personalization tool that lets users securely insert their own likeness and voice into AI-generated scenes to create custom avatars and narratives.

Native tools for stitching together multiple generated clips into a cohesive story, complete with storyboard planning for precise narrative control.

Integrated editing capabilities that allow users to refine specific segments, change styles, or adjust pacing without regenerating the entire video.

Support for generating videos up to 25 seconds (Pro users) with maintained quality, a significant step up from the previous 10-second standard.

Video Showcases

Tags: animal, unusual activity

Dogs are the players at The World Series Of Poker and they are drinking big bowls of water very sloppily and splashing water on the cards and on the felt of the poker table, one dog poker player is tilting their head sideways in confusion.

Tags: camera motion, human - activity

A low-angle shot of a dancer leaping gracefully into the air, making their movement appear even more dynamic and powerful.

Tags: unusual subject, high motion level

A giant humanoid, made of fluffy blue cotton candy, stomping on the ground, and roaring to the sky, clear blue sky behind them.

Tags: scene, camera motion

A drone camera circles around a beautiful historic church built on a rocky outcropping along the Amalfi Coast, the view showcases historic and magnificent architectural details and tiered pathways and patios, waves are seen crashing against the rocks below as the view overlooks the horizon of the coastal waters and hilly landscapes of the Amalfi Coast Italy, several distant people are seen walking and enjoying vistas on patios of the dramatic ocean views, the warm glow of the afternoon sun creates a magical and romantic feeling to the scene, the view is stunning captured with beautiful photography.

Performance Metrics

Sora 2 Model Capability Assessment (Dec 20, 2025)

[Radar chart: Sora 2 performance on a 0 to 100 scale across Subject Consistency, Temporal Consistency, Aesthetic & Image Quality, Dynamics & Motion Fidelity, Visual Quality, and Semantic Alignment]

Sora 2 Metrics Bar Charts by Dimension

Visual Quality

Visual Quality Metrics (normalized score, 0 to 100)

PSNR: 65.5
SSIM: 67.5
LPIPS: 54.0
FVD: 54.9
Inception Score (IS): 45.0

Temporal Consistency

Temporal Consistency Metrics (normalized score, 0 to 100)

Temporal Warping Error: 54.0
Optical Flow Consistency: 72.0
Temporal Flicker Score: 51.0
Long-term Consistency Tracking: 54.0
Motion Smoothness: 64.2

Semantic Alignment

Semantic Alignment Metrics (normalized score, 0 to 100)

CLIP Score: 42.0
Tag2Text / UMT / GRiT: 54.0
Semantic Accuracy: 49.5

Subject Consistency

Subject Consistency Metrics (normalized score, 0 to 100)

DINO Feature Similarity: 49.5
Object Identity Tracking: 54.0
Multiple Object Consistency: 72.0

Aesthetic & Image Quality

Aesthetic & Image Quality Metrics (normalized score, 0 to 100)

LAION Aesthetic Predictor: 48.0
MUSIQ Score: 66.0
Color/Texture Consistency: 75.0
Human-Opinion MOS: 72.0

Dynamics & Motion

Dynamics & Motion Metrics (normalized score, 0 to 100)

Action Recognition Accuracy: 63.0
Dynamics Controllability: 66.0
Motion Diversity Score: 76.8
Physical Realism Score: 66.6
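
To compare the six dimensions at a glance, the per-metric scores listed above can be averaged within each dimension. The page does not say how its radar chart values were derived, so the unweighted mean used here is an assumption, as is the reading that every normalized score is higher-is-better; the names `SCORES` and `dimension_means` are illustrative.

```python
from statistics import mean

# Normalized scores transcribed from the bar charts on this page.
SCORES = {
    "Visual Quality": [65.5, 67.5, 54.0, 54.9, 45.0],
    "Temporal Consistency": [54.0, 72.0, 51.0, 54.0, 64.2],
    "Semantic Alignment": [42.0, 54.0, 49.5],
    "Subject Consistency": [49.5, 54.0, 72.0],
    "Aesthetic & Image Quality": [48.0, 66.0, 75.0, 72.0],
    "Dynamics & Motion": [63.0, 66.0, 76.8, 66.6],
}


def dimension_means(scores: dict[str, list[float]]) -> dict[str, float]:
    """Unweighted mean per dimension, rounded to two decimals (an assumed aggregation)."""
    return {dim: round(mean(vals), 2) for dim, vals in scores.items()}


if __name__ == "__main__":
    means = dimension_means(SCORES)
    # Print dimensions from strongest to weakest under this aggregation.
    for dim, value in sorted(means.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{dim}: {value}")
```

Under this simple aggregation, Dynamics & Motion comes out highest (about 68.1) and Semantic Alignment lowest (48.5); a weighted scheme could rank them differently.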

Service Providers

Sora.com

The official web interface for Sora 2, offering a suite of creation and editing tools for users.

API Providers

OpenAI API (Azure)

Enterprise-grade API access via Azure OpenAI Service, providing secure and scalable integration for businesses.

Fal.ai

Offers optimized API endpoints for Sora 2, focusing on speed and ease of integration for developers.

People Also Ask

Sora is a generative AI model that creates videos from text descriptions, still images, or existing clips. It is a diffusion model built on a transformer architecture similar to that of GPT models. It can render detailed scenes, maintain character consistency, and simulate physics across several seconds of footage for creative and professional use.

Sora 2 is accessed through OpenAI's Sora web portal or mobile app for invited users, and via API through OpenAI and selected providers for developers. After getting access, you enter a text prompt, optionally upload reference assets, choose quality or style presets, then submit and download or refine the generated video.

Sora 2 currently offers free usage for a limited group of invited users, typically through the Sora app or portal, with usage caps based on available compute. Higher-quality "Pro" Sora 2 access is bundled with paid ChatGPT Pro subscriptions and may move toward additional paid tiers or usage-based pricing over time.

Effective Sora prompts clearly specify subject, action, environment, camera style (e.g., close-up, wide shot, tracking), lighting, and mood in one concise description. Mention video traits such as duration, resolution, and cinematic style, and refine iteratively, adjusting details after each generation, to steer Sora toward the desired look and motion.
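
The prompt structure described above can be kept consistent with a small helper that assembles the listed elements into one description. This is a sketch of one possible convention, not an official prompt format: the `ShotPrompt` class, its field order, and the comma-separated layout are all assumptions made for illustration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ShotPrompt:
    """Hypothetical container for the prompt elements named in the text."""
    subject: str
    action: str
    environment: str
    camera: str = ""
    lighting: str = ""
    mood: str = ""

    def render(self) -> str:
        # Camera style first, then who does what and where, then lighting and mood.
        core = " ".join(
            part.strip()
            for part in (self.subject, self.action, self.environment)
            if part.strip()
        )
        parts = [self.camera, core, self.lighting, self.mood]
        return ", ".join(p.strip() for p in parts if p.strip()) + "."


if __name__ == "__main__":
    first = ShotPrompt(
        subject="a dancer",
        action="leaping gracefully into the air",
        environment="on an empty stage",
        camera="low-angle shot",
        lighting="warm spotlight",
        mood="dynamic and powerful",
    )
    print(first.render())
    # Iterative refinement: change one detail and keep everything else fixed.
    second = replace(first, lighting="cold moonlight")
    print(second.render())
```

Changing a single field between generations makes it easier to see which detail caused a difference in the resulting video.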
