Google’s Gemini Omni turns images, audio, and text into video — and that’s just the start

2:51 AM IST · May 20, 2026

When Google launched Gemini three years ago, the goal was to build a multimodal large language model: a single neural network that was trained on text, image, audio, and video and could generate content in any of those formats. Today, at its Google I/O developer conference, the company took a concrete step toward that goal with Gemini Omni, a new family of multimodal models that Google CEO Sundar Pichai says will be able to “create anything from any input.”

Omni will start with video. Users can now combine images, audio, video, and text, and rather than simply stitching those inputs together, Omni reasons across all of them to produce a consistent output. The result is high-quality videos that reflect an understanding of physics, culture, history, and science. Omni also lets users edit photos with plain text commands rather than complex editing software, similar to Google’s Nano Banana.

Google already has a dedicated video model, Veo, that lets users turn text and images into videos, and even direct and customize avatars. But Google DeepMind director of product management Nicole Brichtova says that today’s release is more than a Veo update: “It’s the next step towards the progression of combining the intelligence of Gemini with the rendering capabilities of our media models.”

One example that Koray Kavukcuoglu, DeepMind’s chief technologist, gave reporters during a media briefing on Monday: When Omni was given a simple prompt like “a claymation explainer of protein folding,” it quickly rendered a video of a stop-motion explainer with a voice-over that said, “Proteins start as chains of amino acids. They fold into patterns like the alpha helix and flat sections called beta sheets, forming a perfect three-dimensional shape.”

The long-term vision for Omni is broader: the model could eventually be used to generate images from audio, or audio from video.

“When we first announced Gemini, it was our first AI model to be natively multimodal,” Pichai said during the briefing. “We knew that training it on a combination of text, code, audio, images, and video would give it a deeper understanding of the world. With world models, AI is moving from predicting text to simulating reality. Gemini Omni is the next step in that direction.”

As part of the release, users will also be able to create videos with their own digital avatars, something OpenAI popularized on its now-defunct Sora app with Cameos. To prevent deepfakes, users will have to go through a dedicated product onboarding, which involves recording themselves and speaking out a series of numbers, per Brichtova. The avatar then gets stored for future use. Additionally, all videos created with Omni will include Google’s SynthID digital watermark, which allows users to verify if videos were generated via the Gemini products.

The first model in the family is Gemini Omni Flash, which will roll out today to the Gemini app, YouTube Shorts, and AI creative studio Flow. Flash will be capable of rendering 10 seconds of video, which Brichtova says isn’t a model limitation, but rather a decision based both on a desire to get it into more hands and an anticipation that most users won’t want to make much longer videos yet. Longer video durations are in the pipeline for the near future, though.

Google seems to be pitching Omni Flash as more of a consumer tool. The examples Brichtova and Gabe Barth-Maron, a research engineer at DeepMind, gave on a call with TechCrunch of uses for digital avatars were all personal: Making a video of yourself winning an award or going to the moon, or removing a passerby from the background of a video you took on vacation. Barth-Maron put it more simply: “They’re like personalized memes.”

“We definitely did focus on making this easy to use for consumers,” Brichtova said. “Not many video models have breached that chasm with consumers, so this is our play to do that.”

The ease of use comes with a caveat: Brichtova and Barth-Maron noted that editing prompts will need to be highly specific, otherwise Omni risks over-editing or unintentionally altering elements the user wanted to keep, a problem Nano Banana users would have run into.

Despite the near-term consumer focus, Omni’s enterprise and creative implications are obvious, and Google will make Omni available via API in the coming weeks. The avatar-generating tool, a capability that is available today on Shorts, is something Google expects content creators to pick up. But more broadly, an end-to-end multimodal workflow could be transformative for advertisers and filmmakers. Startup Luma AI is building something similar, an agentic tool that can generate an entire ad campaign based on a short brief and a product image, powered by its own “unified” model.

“We’re actually pretty proud of the model’s text-rendering capabilities, which is really useful for things like advertising,” Brichtova said. “If you want a product somewhere, or even just a slogan, it needs to be accurate … We definitely anticipate filmmakers and other kinds of creators are going to be using this model as well.”

The more professional use cases might be better served by the Omni Pro model, which should perform better across all Omni tasks. Google hasn’t said when it will release Pro yet, but Brichtova said that will happen when “we feel like we’re at a point where we have a step change above Flash.”

Latest AI News

Rocket engine startup Impulse raises $500 million to hire people, not AI

Impulse Space, a startup founded by SpaceX engine guru Tom Mueller to build highly maneuverable spacecraft, announced a $500 million Series D this week that it will use to hire as many as 200 new employees. The round, led by 137 Ventures and BANNER VC, with participation from Founders Fund, Lux Capital, and Linse Capital, reflects investor interest in space and defense tech as the U.S. government hurls cash at national security problems and SpaceX gears up for its IPO.

Impulse is focused on in-space mobility. The company has developed a highly maneuverable platform called Mira that is targeted at U.S. Space Force buyers. It’s also building Helios, a vehicle designed to carry satellites rapidly to high orbits after they are dropped off in space closer to Earth.

President and COO Eric Romo told TechCrunch that the new capital will help the company build and test more space vehicles and emphasized the company’s hiring plans at a time when aerospace talent is in high demand. While the company’s software teams are adopting AI coding tools, Romo said that when it comes to solving engineering problems in the real world, deep learning models aren’t quite ready for prime time.

As the 13th employee at SpaceX back in 2003, Romo created computer simulations of the company’s engine design to assess its performance. “I considered it success if I got within 20% of the right answer, because the simulations were just not that good,” Romo said. “They’ve improved, but they’ve not improved that much, and so there’s not really any substitute for designing the thing, analyzing the thing, building it, and then getting it on the test stand.”

Romo suspects AI tools for hardware design may be slower to arrive because the right training data is hard to find, compared to the amount of text and code available on the internet to train LLMs. “If you want to go, say, find the best designs for a turbo pump seal package in the world, you’re not going to find those online,” he points out.

Impulse started with a focus on propulsion and evolved to build spacecraft, requiring the company to add more expertise in the form of engineers who build vehicle structures and flight computers. One reason the company recently opened an office in Colorado is that aerospace talent has more options today: instead of just going to Los Angeles, engineers can find work in Seattle, Denver, or Texas.

Next up for the company is another launch of its Mira spacecraft, which made its third flight late last year. That flight wasn’t without incident: a problem with its navigation system led it to expend much of its propellant early on. Romo said the company is prepping a new Mira mission that is expected to launch before the end of the year.

3 hours ago

ZeroDrift raises $10M to protect AI models from themselves

As enterprises troubleshoot their AI systems, governance has emerged as a key challenge. Some are taking a dual approach: One model to handle incoming queries, and another to keep the first one from getting into trouble.

That’s the premise of ZeroDrift, a new AI compliance service that on Tuesday said it had raised $10 million in a seed funding round that saw investments from a16z Speedrun, Reign Ventures, PitchDrive Ventures, and U&I Ventures, among others. The company deals entirely with the second part of the system, sitting between AI models and end users to flag and replace any messages that might present a compliance problem.

It might seem strange to build an AI tool to correct other AI systems’ mistakes, but ZeroDrift says its system has a few architectural advantages over the models it will be correcting. The system is triggered by conventional programs that deterministically apply known compliance standards like SOC 2 or GDPR, and the LLM only comes into play once a message has been flagged, rewriting a compliant version of the same message. “We’re able to identify, deterministically, what are all the regulated areas, what’s the violation that’s being broken, and then we have LLMs that can do the rewrites,” CEO Kumesh Aroomoogan says.

Critically, the company says its entire system can be run with lower latency and more reliability than a conventional LLM. This is what ZeroDrift touts as its primary advantage over big labs like OpenAI and Anthropic, which are often already present in the underlying system.

The most obvious use case is for AI chatbots, which are already deployed in front of consumers, where rogue answers can have serious consequences. But Aroomoogan sees a much larger total addressable market, potentially spanning AI-generated messages that circulate only within automated systems and that humans will never see. So far, it’s a relatively small market, but it’s one that will grow as AI proliferates.

If the fundraise is any indication, there’s a lot of pent-up demand for such products. “It was probably the fastest fundraising I’ve done in my life,” Aroomoogan says, crediting Andreessen Horowitz for helping structure the seed round. “We closed within three weeks, and we will be oversubscribed by 3x on the amount.”
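
The two-stage design described above, in which deterministic checks flag a message and a language model is called only to rewrite flagged messages, can be illustrated with a minimal sketch. Nothing here reflects ZeroDrift’s actual code: the rule names, the regular expressions, and the functions `find_violations`, `stub_rewriter`, and `guard` are hypothetical, and the rewriter is a plain redaction stub standing in for an LLM call.

```python
import re
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Rule:
    """A deterministic compliance check (hypothetical format)."""
    name: str
    pattern: re.Pattern


# Illustrative rules only; real SOC 2 or GDPR checks are far more involved.
RULES = [
    Rule("email_address", re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")),
    Rule("guaranteed_returns", re.compile(r"\bguaranteed returns?\b", re.IGNORECASE)),
]


def find_violations(message: str) -> List[str]:
    """Stage 1: deterministic pass. Returns the names of the rules that match."""
    return [rule.name for rule in RULES if rule.pattern.search(message)]


def stub_rewriter(message: str, violations: List[str]) -> str:
    """Stage 2 placeholder: a real system would call an LLM here.

    This stub only redacts the spans matched by the violated rules.
    """
    for rule in RULES:
        if rule.name in violations:
            message = rule.pattern.sub("[removed]", message)
    return message


def guard(message: str,
          rewriter: Callable[[str, List[str]], str] = stub_rewriter) -> str:
    """Pass clean messages through untouched; rewrite only flagged ones."""
    violations = find_violations(message)
    if not violations:
        return message  # fast path: no model call at all
    return rewriter(message, violations)


if __name__ == "__main__":
    print(guard("Your order has shipped."))
    print(guard("Contact bob@example.com for guaranteed returns."))
```

Under these assumptions, most messages never reach the model, which is one plausible reading of how such a system could keep latency low: the expensive step runs only on the flagged minority.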

3 hours ago

Cricketer KL Rahul Partners With str8bat to Launch AI-Powered Batting Platform

The partnership brings KL Rahul’s batting philosophy to str8bat’s AI platform, allowing players to learn from professional-level insights tailored to their game.

3 hours ago

Coforge Launches Nexa Agentic Platform for Insurers

Nexa aims to automate and streamline underwriting, claims processing, product development and modernisation of legacy systems.

3 hours ago
