Kling O1 is supposedly the world's first "unified" multimodal video model.
But unlike previous versions that focused mostly on visual quality, the O1 model is all about giving creators deep control over their shots.
Built on the concept of Multi-modal Visual Language (MVL), it basically lets you "input everything." Text-to-video, image-to-video, and advanced editing are all rolled into one workflow.
The model also claims to nail physical consistency and narrative logic—meaning every frame should actually make sense in the context of the scene.
Plus, it supports native 2K resolution and promises to keep characters looking consistent throughout.
But how does it actually perform? Is it really as good as the hype suggests?
I’m going to do a deep dive based on my hands-on experience to see if Kling O1 truly delivers.
Kling O1: The Core Capabilities
Multi-Reference Video Generation: Seamlessly blend multiple reference images or elements into a single cohesive clip.
Text-Based Video Editing: Use simple text prompts to modify, add, or remove elements in your footage—no manual masking required.
Text-to-Video: Generate high-fidelity visuals from scratch using only natural language descriptions.
Start & End Frame Transitions: Create smooth, logical movement by defining exactly how a shot begins and ends.
Stylization & Video Expansion: Effortlessly swap art styles or extend the duration and camera path of your existing clips.
Kling O1 Core Features Review
Multi-Reference Video Generation
The Kling O1 model allows you to upload anywhere from 1 to 7 reference images or specific elements—like characters, objects, outfits, or scenes—all at once.
You just write your prompt and use the "@" symbol to tag those uploaded elements, and the AI blends them together into a single video.
For the best results, I recommend structuring your prompts like this: [Detailed description of elements] + [Interactions/Actions between elements] + [Environment/Background] + [Lighting/Style cues].
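To make the structure concrete, here is a minimal sketch that assembles a prompt in the recommended order. The "@" tag names (@man, @jacket, @umbrella) are hypothetical placeholders for elements you would have uploaded, not anything Kling defines:

```python
# Assemble a multi-reference prompt in the recommended order:
# [Elements] + [Interactions/Actions] + [Environment] + [Lighting/Style].
# The @-tags below are illustrative placeholders for uploaded references.

def build_prompt(elements: str, actions: str, environment: str, style: str) -> str:
    """Join the four prompt sections into one sentence-separated prompt."""
    return ". ".join([elements, actions, environment, style]) + "."

prompt = build_prompt(
    elements="@man wearing @jacket, holding @umbrella",
    actions="@man walks slowly toward the camera and opens @umbrella",
    environment="a rainy neon-lit city street at night",
    style="cinematic lighting, shallow depth of field",
)
print(prompt)
```

In my tests, keeping the element descriptions first and the style cues last gave the model the best chance of honoring every tag.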
In terms of output, it supports clips up to 10 seconds in a 16:9 aspect ratio.
You can choose between "Standard" and "Professional" modes—essentially "Draft" vs. "Final Render."
Here is the video outcome:
Issues I ran into during testing:
- Proportional issues: In one clip, the man’s head looked unnervingly small compared to his body.
- Prompt adherence: After running several tests, I noticed that the model occasionally ignores parts of the prompt entirely.
Natural Language Video Editing
Another killer feature of Kling O1 is Semantic Editing.
It lets you edit generated videos using simple text commands, replacing tedious manual tasks like masking, rotoscoping, and frame-by-frame retouching.
By combining text prompts with reference images or elements, you can do it all in one go: add, modify, or delete subjects and backgrounds, swap styles, change the lighting, or even adjust the camera angle and composition.
We’re talking about everything from basic content removal and shot resizing to complex tasks like re-coloring, changing the weather, green-screen keying, and adding VFX—like adding fire, freezing the environment, or giving a character glowing red eyes.
In the past, doing this in software like After Effects meant manually setting keyframes and parameters for every single movement. It was an absolute time-sink.
Now, you can get it done with a single sentence. This hits the biggest pain point in traditional video editing:
Keeping the character consistent while keeping the motion natural.
The AI’s output is honestly on par with professional manual retouching.
But I also ran into some problems when using prompt-based editing.
Kling O1’s Prompt Editing covers a wide range of use cases:
1. Relighting and Atmosphere (e.g., changing day to dusk, or adding a cinematic look)
The Issue:
- I ran into an issue where a video seems to allow only one "swap." Once you change it, there’s no easy way to revert or layer another change on top.
2. Recomposing and Camera Angles:
You can take the content of a single video and re-imagine it from a different perspective—like flipping a front-facing shot into an aerial drone view.
3. Precise Object Replacement and Smart Removal:
The Issues:
- The model sometimes struggles to keep the character in the video looking exactly like the reference image.
- If there’s a camera cut in the original clip, the replaced object often fails to stay consistent across the transition.
4. Recolor and Style Transfer:
For example, turning a live-action clip into an anime-style animation.
The Issue:
- You can clearly see glitches in the eyes and other fine details.
5. Spatial/Temporal Expansion and Keyframe Control: This includes "Start Frame + End Frame" transitions. You can also customize the duration between 3 to 10 seconds and fine-tune the pacing and mood of the clip.
The Kling O1 Workflow: From Input to Final Export
The standard workflow is broken down into three simple steps:
- STEP 1: "Input Anything" — Upload up to 7 reference images, a video clip, or just plain text.
- STEP 2: "Write the Prompt" — Describe your scene and what you need in plain English.
- STEP 3: "Generate with Kling" — Quickly render a high-fidelity video that allows for seamless layering and iterations.
Kling O1 (Use Cases)
Filmmaking: Locking in specific characters and props to generate consistent footage across multiple scenes.
Advertising: Uploading a product, model, or background to quickly churn out professional-looking shots.
Fashion: Creating virtual runways and dynamic digital lookbooks.
Post-Production: Using natural language instead of tedious tracking and masking for pixel-level adjustments.
Testing Kling O1’s Grasp of Real-World Physics
I ran several tests to push the O1 model’s limits on physical motion, lighting, and overall realism. Here’s how it handled the challenge:
1. "Physics-Heavy Scenes" and Natural Dynamics
The Goal: To see if objects follow gravity, inertia, and collision physics, rather than just "sliding" like a 2D image.
- Case: Complex Collisions and Deformation
Prompt: A high-speed close-up of a heavy bowling ball dropping into a large bowl filled with colorful jelly cubes. The jelly squashes and vibrates upon impact, with some cubes flying out. The lighting reflects off the wobbling jelly surfaces.
- Case: Liquid-Solid Interaction
Prompt: A cinematic shot of thick golden honey being poured over a rotating pile of blueberries on a glass plate. The honey should flow realistically, sticking to the berries and filling the gaps, while the camera orbits the plate.
2. Integrated Lighting, Camera, and Motion
The Goal: To verify if light and shadows shift in real-time based on physical position, rather than looking like a static "overlay."
- Case: Dynamic Light Tracking
Prompt: A sleek chrome robot walking through a dark room with flickering neon signs. As the robot moves, the neon reflections on its curved metallic body shift and distort accurately according to its movement and the camera’s low-angle tracking shot.
- Case: Light Penetration (God Rays)
Prompt: A handheld camera following a person walking through a dense forest at sunset. Sunlight breaks through the swaying leaves, creating "god rays" and moving dappled shadows that realistically track across the person's face and clothes as they move.
3. High-Speed Motion: Solid and Natural
The Goal: To see if extreme movement causes "jello effects" or object disintegration.
- Case: Extreme Drifting
Prompt: A low-angle FPV drone shot chasing a sports car drifting around a sharp corner on a wet asphalt road. Water splashes from the tires in a realistic arc. The camera shakes slightly, and the motion blur feels natural, with the car maintaining its solid geometry throughout the turn.
In physics-heavy scenes—like a bouncing ball or a drifting car—Kling O1 feels solid and natural.
The O1 model treats motion dynamics, lighting, and camera angles as a unified system rather than separate layers.
Even when you're swapping objects or restyling a scene, the natural flow of the video stays intact. For creators who have long struggled with the "jittery" mess of AI video, this is a massive leap forward.
Just a few months ago, this kind of movement would have been a flickering disaster full of artifacts. Now, object permanence and lighting consistency hold up beautifully across the entire clip.
The Reality Check: Kling O1’s Flaws and Controversies
Kling O1: The Good and the Bad
I’ve had a lot of fun testing Kling O1, but it’s far from perfect. If you look at all those amazing demos online, you’ll notice one thing: the camera never moves. The shots are all still. As soon as you try to edit a video with lots of movement or camera cuts, things start to get messy.
Problems with Mixing Elements
Kling O1 lets you mix different images and ideas together like making a soup. But if you add too many "ingredients," the video starts looking bad.
Also, when there’s too much going on, the AI just stops listening to your instructions (the prompt).
Problems with Video Editing
Kling O1 tries to let you change specific things in a video, but it has three big flaws:
1. You can’t do multiple edits at once
You can't change the scene step-by-step in one go. For example, if I change the weather to night, I can’t tell it to turn back to morning a few seconds later. You have to do one edit at a time, which is a bit of a pain.
2. It breaks when the camera moves
If your video has fast dancing or lots of quick camera cuts—like a K-pop music video—the AI fails. When I tried to turn a dance video into an anime style, the whole thing just fell apart because the movement was too fast.
It basically only works for "still" shots.
3. Blurry shots and weird glitches
Sometimes the video just comes out blurry, or you see weird "ghost" marks (artifacts) after you change the style.
My Honest Opinion
This is the biggest weakness of Kling O1 right now. If you look at their official videos, the camera is always still. This tells me that Kling O1 only works well when the camera doesn't move.
Start & End Frame Transitions
Kling O1 is supposed to be great at filling in the gaps between two images. They say it can create super smooth animations while keeping the characters and lighting exactly the same—even for 10-second clips.
But in real-world testing, here’s what I actually found:
In the resulting video, there was no logical transition between the first and last frame.
Super Strict Censorship
A lot of users are complaining that Kling O1 has become way too strict. Even if you aren't trying to make anything "spicy" or inappropriate, the AI often blocks your prompts for no clear reason. It feels like it's flagging things way more than it used to.
Will Kling O1 actually kill the movie industry?
I think people are worrying way too much about this. Some say Kling O1 will replace Hollywood special effects, but that’s a huge exaggeration.
Sure, it looks okay when the camera is standing still. But as soon as you have fast-moving action or quick camera cuts, the video gets blurry and full of mistakes. It’s honestly unusable for professional work right now. We are still a long way off from this actually "disrupting" the movie industry.
Kling O1: Pricing
Kling O1 uses a "Credit" system. Since it’s their top-tier model, it’s much more expensive to run than the standard versions.
Subscription Plans
If you pay yearly, you can save about 34%. Here’s the monthly breakdown:
| Plan | Price (USD/mo) | Monthly Credits | Key Perks |
|---|---|---|---|
| Standard | ~$10 | 660 | 1080p, no watermark |
| Pro | ~$37 | 3,000 | 4K, faster rendering |
| Premier | ~$92 | 8,000 | Priority access to O1 |
| Ultra | ~$180 | 26,000 | For pros and businesses |
What does it cost to generate?
Because O1 is doing complex editing, the "price per click" is high:
- Photo Editing: ~5 credits per edit.
- Image-to-Video (5s): ~90 credits. (10s costs double: 180 credits.)
- Video-to-Video (5s): ~140 credits. (10s costs 280 credits.)
- AI Semantic Editing: About 280 credits per clip.
Free vs. Add-ons:
- Daily Freebies: You get 66 free credits a day, but videos are only 720p and have a watermark.
- Add-on Packs: If you run out, you can buy more (starting at $5 for 330 credits). These usually last longer than subscription credits, which expire every month.
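To see what those credit prices mean in practice, here is a rough back-of-the-envelope calculator. The numbers are the approximate credit costs from my testing above, not official Kuaishou pricing:

```python
# Approximate per-generation credit costs (estimates from this review,
# not official pricing; treat them as ballpark figures).
CREDIT_COST = {
    "photo_edit": 5,
    "image_to_video_5s": 90,
    "image_to_video_10s": 180,   # 10s costs double the 5s price
    "video_to_video_5s": 140,
    "video_to_video_10s": 280,
    "semantic_edit": 280,
}

def clips_per_month(monthly_credits: int, task: str) -> int:
    """How many generations of a given task a plan's credit allowance buys."""
    return monthly_credits // CREDIT_COST[task]

# On the ~$10 Standard plan (660 credits/month):
print(clips_per_month(660, "image_to_video_5s"))  # 7 five-second clips
print(clips_per_month(660, "semantic_edit"))      # only 2 semantic edits
```

The takeaway: on the entry-level plan you get only a handful of O1 generations per month, which is why the per-clip failure rate matters so much for the real cost.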
What’s next?
In the future, I’d love to see Kling O1 automatically detect objects in a video. Imagine just clicking an object and using the "@" symbol to edit it directly in the prompt. That would give us the "deep control" we’re actually looking for.
The Final Verdict
Kling O1’s ability to mix elements and edit video with text is impressive—but only for specific types of shots. It really only shines with static, fixed camera angles. Even then, you’ll still see glitches and ignored prompts.
The moment there’s movement or camera cuts, the quality takes a nosedive. That’s why almost every "amazing" review you see online only shows still shots.
We are still a long way from this being useful for professional, industry-level work—unless you’re willing to generate every single shot one by one and edit them all manually.
To be honest, it feels more like a gimmick than a revolution. Given how expensive it is, my advice is to give the free trial a spin, but don't waste your hard-earned money on a subscription yet.