These are times of great excitement for artificial intelligence.
Google has just announced its new Gemini 1.5 model, far more powerful than the previous version (and the Mountain View company is already testing a variant with even greater performance). In another article we told you how OpenAI is working on a new AI-based search engine.
And Sam Altman’s company has come out with some big news of its own. OpenAI has launched Sora, an artificial intelligence capable of producing realistic (extremely realistic) videos from a text command. What do we know about the tool?
OpenAI announces Sora
A few hours after the announcement of Gemini 1.5 on the Google blog, it was OpenAI’s turn to launch Sora. The company let the world know through a long post (more like a dedicated section of its site) containing a series of truly striking videos.
Sora’s presentation is also grandiloquent: “We are teaching artificial intelligence to understand and simulate the physical world in motion, with the goal of training models that help people solve problems that require real-world interaction.
Introducing Sora, our text-to-video model. Sora can generate videos up to one minute long while maintaining great visual quality and close adherence to the user’s prompt.”
An example
OpenAI’s Sora, then, generates videos of up to one minute from a text prompt.
The section of the site dedicated to the tool shows several example videos, each accompanied by the text prompt used to produce it. The results are undoubtedly realistic. One example: we see a young woman walking through Tokyo at night, first framed full-length, then in close-up. Here is the text prompt that generated the video: “An elegant woman walks down a Tokyo street filled with warm, glowing neon and animated city signage. She is wearing a black leather jacket, a long red dress and black boots and carries a black bag. She is wearing sunglasses and red lipstick. She walks with confidence and ease. The street is damp and reflects the figures, creating a mirror effect of the colored lights. Lots of pedestrians walk about.”
Sora’s realistic videos
The videos OpenAI has shown so far position Sora as the most powerful text-to-video generation software yet.
As the prompt above suggests, the company explains that “Sora is capable of generating complex scenes with multiple characters, specific types of movement and accurate details of subjects and backgrounds. The model understands not only what the user asked for in the prompt, but also how the different elements coexist in the real world.”
It is not known when OpenAI will make Sora available to the public. In the meantime, Sam Altman has launched a shrewd promotional initiative: through his X account, he asked users to send him text prompts they would like to see turned into videos, and the OpenAI CEO will periodically publish the results on his profile.
He has already started doing so, and other videos include, for example, “an educational cooking session for homemade gnocchi led by a grandmother social media influencer, set in a rustic Tuscan kitchen with cinematic lighting.”
Quality and limits
The quality of Sora’s videos raises the bar considerably compared to what we have seen so far; deepfake images of characters with six fingers on each hand already look prehistoric. Nevertheless, OpenAI admits that “the current model has weaknesses. It may have difficulty accurately simulating the movements of a complex scene and may not understand specific instances of cause and effect. For example, a person might take a bite of a cookie, but the cookie might not show a bite mark afterwards.
The model may also confuse the spatial details of a prompt, for example mixing up left and right, and may struggle with precise descriptions of events that unfold over time, such as following a specific camera trajectory.”
The risks
Many are thinking about possible cinematic uses of Sora. But even more people are wondering whether such extraordinary visual quality will make it increasingly difficult to distinguish the real from the fake. Deepfakes and misinformation, amid the boom in generative AI, are increasingly urgent and intricate problems.
On the page dedicated to Sora, OpenAI addresses the issue directly: “We will engage policymakers, educators and artists from around the world to understand their concerns and identify positive use cases for this new technology.
Despite extensive research and testing, we cannot predict all the positive ways people will use our technology, nor all the ways they will abuse it. That’s why we believe that learning from real-world use is a critical component to building and releasing increasingly safe AI systems over time.”