In recent months it has been hard to keep up with the tech giants releasing new artificial intelligence software and putting ever more competitive models on the market.
In this area, Google is among the most active companies. Over the summer, the first rumors circulated about the Gemini software, which was said to have already passed the testing phase.
Google Gemini was then presented last December in its three versions: Nano, Pro and Ultra. And at the beginning of February 2024, much as happened with the transition (and rebranding) from Twitter to X, we had to forget about Bard: the Mountain View company’s artificial intelligence would now be called Gemini.
And now Google has already announced the new version of the software, Gemini 1.5. What do we know?
Gemini 1.5
The announcement of Gemini 1.5 appeared in a long post published on the Google website on Thursday 15 February.
The post includes statements from two top figures: CEO Sundar Pichai and Google DeepMind CEO Demis Hassabis.
We read that the improvements over the previous version are notable. Compared to 1.0 Pro (the intermediate version, between Nano and Ultra), Gemini 1.5 Pro performs better on 87% of the benchmarks used, which span text, code, image, audio and video processing. Its performance is close to that of Gemini 1.0 Ultra.
Gemini 1.5 in detail
Gemini 1.5 is based on the new Mixture-of-Experts (MoE) architecture, which improves the efficiency of the software during the training phase.
Moreover, MoE allows selective activation of the model. In concrete terms: when a given input is sent to the model, only the relevant part of the network is activated to process it, reducing compute consumption and, above all, waiting times.
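To make the idea concrete, here is a toy sketch of MoE-style routing in Python. It is purely illustrative: the expert and router weights are random, and the details of Gemini’s actual architecture and routing scheme have not been published.

```python
# Toy sketch of Mixture-of-Experts routing (illustrative only;
# not Google's actual, unpublished Gemini architecture).
import numpy as np

rng = np.random.default_rng(0)

NUM_EXPERTS, TOP_K, D_MODEL = 8, 2, 16

# Each "expert" is a small feed-forward layer; here just a weight matrix.
experts = [rng.standard_normal((D_MODEL, D_MODEL)) for _ in range(NUM_EXPERTS)]
router = rng.standard_normal((D_MODEL, NUM_EXPERTS))  # gating network

def moe_layer(x: np.ndarray) -> np.ndarray:
    """Route the input through only TOP_K of NUM_EXPERTS experts."""
    logits = x @ router                 # one routing score per expert
    top = np.argsort(logits)[-TOP_K:]   # indices of the TOP_K best experts
    weights = np.exp(logits[top] - logits[top].max())
    weights /= weights.sum()            # softmax over the chosen experts only
    # Only the selected experts run; the others are skipped entirely,
    # which is why MoE models can grow in parameter count without a
    # matching growth in compute spent per token.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

token = rng.standard_normal(D_MODEL)    # one token embedding
print(moe_layer(token).shape)           # (16,) -- only 2 of 8 experts ran
```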
The context window in which Gemini 1.5 Pro operates is 128,000 tokens, four times that of Gemini 1.0, which was 32,000.
Developers and cloud customers (who, as we will see, can already access Gemini 1.5) have a version with a context window of one million tokens available. And Google announced that it had tested a context window with as many as 10 million tokens.
Remember that, to simplify, a token is the “weight” of a word, a punctuation mark or a space between two words. It is, we could say, the linguistic unit of measurement of a Large Language Model.
The larger the context window, the more material the software can take in and process in response to a single user command.
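As a rough illustration of how text maps to tokens, the snippet below counts them with OpenAI’s open-source tiktoken encoder, used here purely as a stand-in: Gemini’s own tokenizer is not public, so its actual token counts will differ.

```python
# Rough illustration of tokenization, using the tiktoken encoder as a
# stand-in (Gemini's tokenizer is not public; counts will differ).
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
text = "Gemini 1.5 Pro has a context window of 128,000 tokens."
tokens = enc.encode(text)

print(len(tokens))         # how many tokens this sentence "costs"
print(enc.decode(tokens))  # round-trips back to the original text
```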
The performances
Let’s give some examples. Gemini 1.5 Pro, through a single user request, can process one hour of video, 11 hours of audio, over 30,000 lines of code and more than 700,000 words.
With a single prompt, Gemini 1.5 can analyze and summarize a 402-page transcript from the Apollo 11 mission, or describe the 44-minute Buster Keaton film Sherlock Jr. (known in Italy as La palla n. 13).
When will it be available
The private preview of Gemini 1.5 is accessible to developers via AI Studio and to cloud customers via Vertex AI.
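For developers with preview access, a call through the google-generativeai Python SDK looks roughly like the minimal sketch below. The model identifier is an assumption on our part: check AI Studio for the exact name your preview access exposes.

```python
# Minimal sketch of calling the Gemini 1.5 Pro preview via the
# google-generativeai Python SDK (pip install google-generativeai).
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # key issued via Google AI Studio

# Assumed model id for the preview; verify in AI Studio.
model = genai.GenerativeModel("gemini-1.5-pro-latest")

response = model.generate_content(
    "Summarize the attached transcript in five bullet points."
)
print(response.text)
```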
In the future (no dates have even leaked), various types of subscriptions will be introduced for consumers. It is not known when versions of the software with multi-million-token context windows will be made public.
Comments
Google CEO Sundar Pichai commented on the announcement of Gemini 1.5: “It shows notable improvements across a range of dimensions, and 1.5 Pro achieves comparable quality to 1.0 Ultra, while using fewer compute resources.”
And again: “Longer context windows show us the promise of what is possible. They will enable completely new features and help developers create much more useful models and applications. We are excited to offer a limited preview of this experimental feature to developers and enterprise customers.”
Demis Hassabis, CEO of Google DeepMind, added: “The first Gemini 1.5 model we will release for early testing is Gemini 1.5 Pro. It is a mid-size multimodal model, optimized for a wide range of tasks, and it performs at a similar level to 1.0 Ultra, our largest model to date. It also introduces a breakthrough experimental feature in long-context understanding.
As we implement the full 1 million token context window, we are actively working on optimizations to improve latency, reduce computational requirements, and improve user experience.”