In the summer of 2019 we all indulged ourselves with FaceApp. Suddenly each of us discovered what our own face, and those of relatives and friends, might look like in old age. Or what our features might have looked like when we were very young.
From then on, apps proliferated that let us play, more or less realistically, with our faces and with images in general, creating a sort of alternative, grotesque reality. There was Evil be like, where the fun came mostly from pairing a photo negative with a caption. Then came Yassify, whose young creator spread memes of overly embellished, gaudy faces.
In short: all of these are tools that start from a real image and playfully distort it with satirical intent.
DALL-E 2, a project with a not-too-friendly name, does something different and more extreme. It was born at OpenAI, a non-profit artificial intelligence organization founded by Elon Musk and Sam Altman. But what is DALL-E 2, and how does it work?
What is DALL-E 2
Why did we say that DALL-E 2 is more extreme than FaceApp and the like?
Because with DALL-E 2, image creation no longer starts from photographs but from… words. In what sense? In the sense that the images shown, in this case, do not exist in reality: they are generated by artificial intelligence. And they are incredibly realistic.
And that is only one of DALL-E 2's two peculiarities. The other is that the images are created from words that human beings suggest to the AI.
Free rein to the imagination
The official website briefly explains what DALL-E 2 is: "a new artificial intelligence system capable of creating realistic images and works of art from a description in natural language."
Do we want to imagine, and therefore suggest to the artificial intelligence, an astronaut landing on the lunar surface riding a horse? Very well: the project website shows several images on the subject.
Then there is the image, widely circulated, of a little dog's muzzle staring into the camera. All perfectly normal, if it weren't wearing a turtleneck and a painter's beret.
How DALL-E 2 works
DALL-E 2's artificial intelligence, then, processes a request made in writing.
There is also an artistic side, so to speak: starting, for example, from a work of art, the system can create endless variations, as the official website shows with Vermeer's Girl with a Pearl Earring.
The scientific research behind DALL-E 2 rests on two capabilities. The first is the one by which the artificial intelligence understands the relationship between suggested words and images. For this, OpenAI developed CLIP (Contrastive Language-Image Pre-training), which trains two neural networks in parallel, one on images and one on their captions taken from the web, so that matching pairs end up close together. Then comes the moment of actually creating the images, using the diffusion technique: images are produced that match the text as closely as possible while remaining coherent and intelligible to the human eye.
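To give an intuition of the contrastive-matching idea, here is a toy sketch in Python. The vectors below are made-up stand-ins for the outputs of CLIP's two encoders (the real system uses large learned neural networks); the point is only to show how cosine similarity lets each caption find its matching image.

```python
import numpy as np

def normalize(v):
    """Scale each row to unit length so dot products become cosine similarities."""
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

# Hypothetical 4-dimensional embeddings for three caption/image pairs,
# as if produced by a text encoder and an image encoder.
text_emb = normalize(np.array([
    [1.0, 0.1, 0.0, 0.0],   # "a dog in a beret"
    [0.0, 1.0, 0.1, 0.0],   # "an astronaut riding a horse"
    [0.0, 0.0, 1.0, 0.1],   # "girl with a pearl earring"
]))
image_emb = normalize(np.array([
    [0.9, 0.2, 0.0, 0.1],   # dog photo
    [0.1, 0.8, 0.0, 0.0],   # astronaut image
    [0.0, 0.1, 1.0, 0.0],   # painting
]))

# Entry (i, j) scores caption i against image j. Contrastive training
# pushes the diagonal (true pairs) up and everything else down; here the
# toy vectors are already arranged that way, so each caption picks its image.
similarity = text_emb @ image_emb.T
best_match = similarity.argmax(axis=1)
print(best_match)  # [0 1 2]
```

The design choice behind CLIP is exactly this geometry: because text and images share one embedding space, a written prompt can be scored against any candidate image, which is what lets the generation step be guided by words.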
A second updated version
DALL-E 2 is the most advanced version of DALL-E, which was released in January 2021. The resolution has improved (from 256 x 256 pixels to 1024 x 1024 pixels), as have the image-generation times.
DALL-E 2 is not yet publicly available, as it is still under development. However, as stated on the official website, the tool is meant to advance OpenAI's mission, which is "to create artificial intelligence for the benefit of humanity."
The idea, as we told you in another article, may already have a competitor. It seems that Mark Zuckerberg, too, is thinking of an artificial intelligence capable of creating virtual worlds in the metaverse from simple descriptions.
The risks of the operation
DALL-E 2 is not yet available, and some are already pointing to at least two kinds of risk.
The first is the extreme difficulty of distinguishing real images from those created by AI (with the additional, implicit risk of appropriating photos of real people).
The second is that even this artificial intelligence tool may perpetuate a discriminatory culture. In an article in Vox, journalist Sigal Samuel points out that, for example, the images produced for the prompts "lawyer" and "flight attendant" are, respectively, almost exclusively of white men and white women.