Facebook has announced Ego4D, a long-term machine learning project that aims to let artificial intelligence perceive the surrounding environment from a near-human, or rather egocentric, perspective, as the project's official name suggests.
In short, the artificial intelligence of the future will see the world from a human perspective, learning from our interaction with the environments in which we live and operate.
It would be a Copernican revolution: so far, most of the software that lets machines interpret their surroundings (i.e. computer vision, which builds a model of the real world from two-dimensional images) derives its information from images and videos shot from a third-person perspective.
Let’s see how Facebook’s project, Ego4D, will differ from this perspective.
Facebook and the Ego4D project
Facebook made the announcement on Thursday, 14 October: Ego4D is a project for a new way of building artificial intelligence.
The ambition is to train the assistants and robots of the future on a large dataset, this time adopting an egocentric perception.
How is this actually possible?
For the Ego4D project, Facebook collected more than 2,200 hours of first-person footage, precisely to train the next generation of artificial intelligence.
The videos include, for example, footage from smart glasses, such as the Ray-Ban Stories that Facebook launched last September, and from virtual-reality headsets (here too Zuckerberg's company, which has owned Oculus since 2014, is playing on home turf).
The near future: smart glasses as widespread as smartphones
The Ego4D project stems from Facebook's belief that in the near future smart glasses, and augmented- and virtual-reality devices in general, will be as widespread as smartphones are today.
The company therefore aims to create artificial intelligence software that knows how to monitor the surrounding context as humans do.
A post published on the Menlo Park company’s blog on the day of the project’s announcement, Thursday, October 14, clarifies some details.
The post states that the new generation of artificial intelligence “will have to learn from videos that show the world from the point of view of users, from the exact center of the action”. Above all, it sets out five “reference challenges” for the future of AI assistants. Let's look at them.
Facebook’s five challenges for the future of AI
The new artificial intelligence, based on egocentric perception, will have to be able to answer questions in five suggestive areas. Here they are:
- Episodic memory: what happened when? (for example, “Where did I leave the keys?”)
- Forecasting: what am I likely to do next? (for example, “Wait, you’ve already added salt to this recipe”)
- Hand and object manipulation: what am I doing? (for example, “Teach me to play the drums”)
- Audio-visual diarization: who said what, and when? (for example, “What was the main topic during the lesson?”)
- Social interaction: who is interacting with whom? (for example, “Help me hear better the person talking to me in this noisy restaurant”).
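To make the first challenge concrete, here is a deliberately simplified sketch of what an episodic-memory query might look like once an egocentric system has already extracted object sightings from video. This is not Facebook's API (the Ego4D benchmarks operate on raw video, not symbolic logs); the `Sighting` record and `last_seen` function are hypothetical, purely for illustration.

```python
from dataclasses import dataclass

# Hypothetical, simplified log of object sightings that an egocentric
# vision system might produce from a first-person video stream.
@dataclass
class Sighting:
    timestamp: float   # seconds into the recording
    obj: str           # detected object label
    location: str      # rough scene location at that moment

def last_seen(log, obj):
    """Episodic-memory style query: return the most recent sighting of
    `obj` - i.e. an answer to 'Where did I leave the keys?'."""
    matches = [s for s in log if s.obj == obj]
    return max(matches, key=lambda s: s.timestamp) if matches else None

log = [
    Sighting(12.0, "keys", "hallway table"),
    Sighting(95.5, "mug", "kitchen counter"),
    Sighting(140.2, "keys", "jacket pocket"),
]

result = last_seen(log, "keys")
print(result.location)  # → jacket pocket
```

The hard part of the challenge, of course, is not the lookup but producing reliable sightings from hours of shaky first-person footage in the first place.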
For the Ego4D project, Facebook assembled a consortium of 13 universities (including the University of Catania) and laboratories across nine countries. The more than 2,200 hours of first-person video involved over 700 participants.
The data was then supplemented with an additional 400 hours of first-person video recorded by Facebook Reality Labs Research, thanks to volunteers acting in staged environments that simulated real life.
In hours of footage, the dataset is twenty times larger than any other available.
Green light to researchers
Facebook stated that the Ego4D dataset will be available to researchers starting in November. Kristen Grauman, lead researcher at Facebook, said: “This initial version of the project contains a body of data to be processed. It will catalyze our progress in the academic community and allow other researchers to take on new challenges with us”.
Grauman then explained the project's future usefulness: “This dataset will enable programmers of artificial intelligence systems to develop digital assistants that are truly aware of what is going on around them and of what they need to do. The goal is to make learning autonomous, so that each assistant learns from its surroundings through the data it has been trained on”.
A bill of rights for AI
In the meantime, White House science advisers have called for a bill of rights for artificial intelligence, so that the datasets collected are non-discriminatory and do not contribute to widening social inequalities.