Gemini Omni Video: Google's multimodal video model lands on MakeClipAI
2026/05/22

Google's Gemini Omni Video is now available on MakeClipAI. It's the first multimodal model on the platform: it accepts images, video clips, character references, and audio in a single request.

A couple of weeks ago, I wrote about how choosing the right model depends on what stage your video is at. The models I covered — Kling, Seedance, Veo 3, Hailuo — all work roughly the same way: give them a prompt, maybe a reference image, and they generate a clip.

That just changed.

Google's Gemini Omni Video just landed on MakeClipAI via the kie.ai marketplace, and it's the first model on the platform that works across several modalities at once. You're no longer limited to text-to-video. You can feed it images, video clips, character IDs, and audio, all in the same request, and it weaves them into a coherent output.

I've been testing it for a few days. Here's what it actually changes about how I think about AI video prompts.

What makes "Omni" different

Most AI video models treat your prompt as a description. You write "a futuristic city at night with neon lights," and the model interprets that and generates something from scratch.

Gemini Omni doesn't work that way. It's trained to fuse multiple inputs simultaneously:

  • Text prompt: The core description, same as any model
  • Image URLs (up to 7): Reference images for character appearance, scene style, or storyboard frames
  • Video clips (up to 1, ≤30s): A source video to remix, extend, or restyle
  • Character IDs (up to 3): Character references from the gemini-omni-character API — keep a character consistent across generations
  • Audio IDs (up to 3): Narration, dialogue, or sound design generated via gemini-omni-audio

The key difference: it can compose all of these together. For example, a single request can combine an image reference for the character, a video clip for the background motion, an audio track for narration, and a text prompt for the overall mood. That's not something the previous generation of models could do in a single pass.
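To make the input limits above concrete, here is a minimal sketch of a request builder that checks them before anything is sent. The limits come from the list above; the class name, field names, and payload keys are all hypothetical and are not taken from the MakeClipAI or kie.ai API.

```python
from dataclasses import dataclass, field

# Per-input limits as listed in the post. All field and key names are hypothetical.
MAX_IMAGES = 7
MAX_VIDEOS = 1
MAX_VIDEO_SECONDS = 30
MAX_CHARACTER_IDS = 3
MAX_AUDIO_IDS = 3


@dataclass
class OmniRequest:
    prompt: str
    image_urls: list[str] = field(default_factory=list)
    video_clips: list[tuple[str, float]] = field(default_factory=list)  # (url, length in seconds)
    character_ids: list[str] = field(default_factory=list)
    audio_ids: list[str] = field(default_factory=list)

    def problems(self) -> list[str]:
        """Return readable limit violations; an empty list means every per-input limit holds."""
        found = []
        if not self.prompt.strip():
            found.append("prompt is empty")
        if len(self.image_urls) > MAX_IMAGES:
            found.append(f"too many images: {len(self.image_urls)} > {MAX_IMAGES}")
        if len(self.video_clips) > MAX_VIDEOS:
            found.append(f"too many videos: {len(self.video_clips)} > {MAX_VIDEOS}")
        for url, seconds in self.video_clips:
            if seconds > MAX_VIDEO_SECONDS:
                found.append(f"video {url} is {seconds}s, limit is {MAX_VIDEO_SECONDS}s")
        if len(self.character_ids) > MAX_CHARACTER_IDS:
            found.append(f"too many character IDs: {len(self.character_ids)} > {MAX_CHARACTER_IDS}")
        if len(self.audio_ids) > MAX_AUDIO_IDS:
            found.append(f"too many audio IDs: {len(self.audio_ids)} > {MAX_AUDIO_IDS}")
        return found

    def to_payload(self) -> dict:
        """Build a plain dict for the request body, refusing to build an invalid one."""
        found = self.problems()
        if found:
            raise ValueError("; ".join(found))
        return {
            "prompt": self.prompt,
            "image_urls": list(self.image_urls),
            "video_urls": [url for url, _ in self.video_clips],
            "character_ids": list(self.character_ids),
            "audio_ids": list(self.audio_ids),
        }
```

This only checks each input type against its own cap. It does not check how the inputs add up when combined.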

The quota system is worth understanding

Because the model processes multiple inputs at once, the API uses a simple quota system. Think of it as having 7 slots:

  • Each image consumes 1 slot
  • Each video consumes 2 slots
  • Each character ID consumes 1 slot

Formula: (Images × 1) + (Videos × 2) + (Character IDs × 1) ≤ 7

Audio IDs don't appear in the formula, so they seem to be limited only by their own cap of 3.

In practice, that allows:

  • 7 images and nothing else
  • 1 video + 3 character IDs + 2 images
  • 5 images + 2 character IDs
  • Or any other combination that fits within 7

This is actually pretty generous. Most use cases won't need more than 1-2 images anyway.
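The slot arithmetic is easy to check in code. Here is a minimal sketch of the formula above; the function and constant names are made up for illustration, and the weights and the budget of 7 are the ones stated in this post.

```python
# Slot weights and budget as described above. Names are illustrative only.
SLOT_BUDGET = 7
IMAGE_COST = 1
VIDEO_COST = 2
CHARACTER_ID_COST = 1


def slots_used(images: int = 0, videos: int = 0, character_ids: int = 0) -> int:
    """Apply the formula: images x 1 + videos x 2 + character IDs x 1."""
    if min(images, videos, character_ids) < 0:
        raise ValueError("counts cannot be negative")
    return images * IMAGE_COST + videos * VIDEO_COST + character_ids * CHARACTER_ID_COST


def fits_quota(images: int = 0, videos: int = 0, character_ids: int = 0) -> bool:
    """True when the combination stays within the 7-slot budget."""
    return slots_used(images, videos, character_ids) <= SLOT_BUDGET


# The three combinations listed above each use exactly 7 slots.
print(slots_used(images=7))                             # 7
print(slots_used(images=2, videos=1, character_ids=3))  # 7
print(slots_used(images=5, character_ids=2))            # 7
print(fits_quota(images=6, videos=1))                   # False (8 slots)
```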

Where it shines

Character consistency is the biggest win. If you've used other AI video models, you know the pain of getting the "same" character to look the same across multiple shots. With Gemini Omni, you can pass a character reference via the character API, and it respects that reference across generations. This is huge for narrative work — multi-scene storytelling where the protagonist needs to be recognizably the same person.

Style transfer from video is another impressive use case. Feed it a 10-second clip of the visual style you want (specific lighting, camera movement, color grading), and it can generate new content that matches that style. The source video doesn't need to be high production value — even rough phone footage works as a reference.

Audio-guided generation is still early, but promising. You can generate dialogue or narration via the gemini-omni-audio endpoint and pass it in as an audio ID. The video output will sync reasonably well to the audio, which saves a lot of post-production lip-sync or voiceover alignment work.

Where it's not the best fit

Let me be honest about the tradeoffs.

If you're just doing simple text-to-video — "a cat playing piano" — Gemini Omni is overkill. You're paying for multimodal processing you don't use. Models like Seedance 1.5 or Kling 2.6 handle simple prompts faster and cheaper.

The same goes for rapid ad testing. If you're trying to churn through 20 hook variations in an afternoon, the quota system adds friction. You're better off iterating on Seedance or Kling and using Gemini Omni only for the final polished version.

Duration is also limited. The maximum output is 10 seconds. For longer scenes, you'll still want Director mode with Seedance 1.5 or multi-scene generation with Kling 3.0.
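The tradeoffs in this section boil down to a short decision rule. Here is a rough sketch of that rule as a function. It is a rule of thumb drawn from this post, not a platform feature, and the function name, parameter names, and returned labels are only illustrative.

```python
OMNI_MAX_SECONDS = 10  # maximum Gemini Omni output length stated above


def suggest_model(
    needs_multimodal_inputs: bool,
    duration_seconds: float,
    rapid_iteration: bool = False,
) -> str:
    """Rule-of-thumb model suggestion based on the tradeoffs in this post."""
    # Scenes longer than the 10-second cap need a different workflow.
    if duration_seconds > OMNI_MAX_SECONDS:
        return "Seedance 1.5 (Director mode) or Kling 3.0 (multi-scene)"
    # Simple prompts and fast hook testing are quicker and cheaper elsewhere.
    if rapid_iteration or not needs_multimodal_inputs:
        return "Seedance 1.5 or Kling 2.6"
    # Character refs, audio, or a source video in one request.
    return "Gemini Omni Video"


print(suggest_model(needs_multimodal_inputs=False, duration_seconds=6))
print(suggest_model(needs_multimodal_inputs=True, duration_seconds=8))
print(suggest_model(needs_multimodal_inputs=True, duration_seconds=20))
```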

What this means for MakeClipAI users

Gemini Omni Video is available now in the model picker. You'll find it alongside Veo 3, Kling 3.0, Seedance, and Hailuo — same one-click generation workflow.

The pricing is comparable to premium models:

Duration   Credits
4s         65
6s         90
8s         115
10s        140
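For budgeting, the table above can be turned into a small lookup. This is a minimal sketch with made-up helper names. It covers only the four listed durations, since the post doesn't say whether others are offered.

```python
# Credits per clip, copied from the pricing table above.
CREDITS_BY_DURATION = {4: 65, 6: 90, 8: 115, 10: 140}


def credits_for(duration_seconds: int) -> int:
    """Look up the credit cost for one clip of a listed duration."""
    try:
        return CREDITS_BY_DURATION[duration_seconds]
    except KeyError:
        listed = ", ".join(f"{d}s" for d in sorted(CREDITS_BY_DURATION))
        raise ValueError(f"no listed price for {duration_seconds}s (listed: {listed})") from None


def batch_cost(durations: list[int]) -> int:
    """Total credits for a batch of clips."""
    return sum(credits_for(d) for d in durations)


def clips_affordable(balance: int, duration_seconds: int) -> int:
    """How many clips of one duration a credit balance covers."""
    return balance // credits_for(duration_seconds)


print(batch_cost([4, 10, 10]))    # 345
print(clips_affordable(500, 10))  # 3
```

In the table, each extra 2 seconds adds 25 credits, so a 10-second clip costs a little more than twice a 4-second one.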

My recommendation: use it when you need multimodal inputs (character refs + audio + video). For standard text-to-video, stay on Seedance or Kling. Think of Gemini Omni as your "compose" model — the one you reach for when a single prompt and one reference image aren't enough.

Related reading

  • Don't use the expensive model. Not yet.
  • What I learned about picking AI video models after 200+ generations
  • Stop rewriting AI video prompts from scratch every time
  • How to Choose AI Video Models and Manage Credits

Author

MakeClipAI

Categories

  • News
  • Product