Facebook has today outlined a new machine learning model called the ‘Anticipative Video Transformer (AVT)’, which can predict the next actions in an activity based on visual interpretation.
The model analyzes an activity as it unfolds, then anticipates what action is likely to come next as a result.
That could have a range of applications, as Facebook explains:
“AVT could be especially useful for applications such as an AR “action coach” or an AI assistant, by prompting someone that they may be about to make a mistake in completing a task or by reacting ahead of time with a helpful prompt for the next step in a task. For example, AVT could warn someone that the pan they’re about to pick up is hot, based on the person’s previous interactions with the pan.”
That sounds like something straight out of a sci-fi movie, and it could facilitate all-new smart home applications. In the context of AR glasses, it could also provide a range of useful pointers to help guide people, at home or at work, through a wide variety of tasks.
As for how it works, Facebook describes the training process like this:
“We train the model to predict future actions and features using three losses. First, we classify the features in the last frame of a video clip in order to predict labeled future action; second, we regress the intermediate frame feature to the features of the succeeding frames, which trains the model to predict what comes next; third, we train the model to classify intermediate actions. We’ve shown that by jointly optimizing the three losses, our model predicts future actions 10 percent to 30 percent better than models trained only with bidirectional attention.”
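To make the three losses a little more concrete, here is a rough sketch of how they might be combined into one training objective. This is not Facebook's code: the function names, the use of mean squared error for the feature regression, and the equal default weights are all assumptions made for illustration, and the inputs are plain Python lists standing in for model outputs.

```python
import math


def cross_entropy(logits, label):
    """Softmax cross-entropy for a single example (numerically stable)."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum - logits[label]


def mse(pred, target):
    """Mean squared error between two equal-length feature vectors."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def avt_style_loss(next_action_logits, next_action_label,
                   predicted_feats, frame_feats,
                   frame_action_logits, frame_action_labels,
                   weights=(1.0, 1.0, 1.0)):
    """Hypothetical joint objective with three terms (illustrative only).

    predicted_feats[t] is the model's guess for the feature of frame t + 1;
    frame_feats[t] is the observed feature of frame t.
    """
    # Loss 1: classify the last frame's features to predict the future action.
    l_next = cross_entropy(next_action_logits, next_action_label)

    # Loss 2: regress each intermediate prediction onto the succeeding frame's
    # feature (MSE is an assumption; the quote does not name the distance).
    pairs = list(zip(predicted_feats[:-1], frame_feats[1:]))
    l_feat = sum(mse(p, t) for p, t in pairs) / len(pairs)

    # Loss 3: classify the action at each intermediate frame.
    l_cls = sum(
        cross_entropy(lg, lb)
        for lg, lb in zip(frame_action_logits, frame_action_labels)
    ) / len(frame_action_labels)

    w_next, w_feat, w_cls = weights  # assumed weighting, not from the source
    total = w_next * l_next + w_feat * l_feat + w_cls * l_cls
    return total, (l_next, l_feat, l_cls)
```

Calling `avt_style_loss` returns the combined value alongside the three individual terms, which makes it easy to see how much each one contributes. In a real system the three terms would be computed on tensors and minimized together by gradient descent, which is the "jointly optimizing" step Facebook refers to.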
It’s not something that Facebook’s looking to roll out right away, but the potential here is significant. It could eventually facilitate all-new ways of guiding user actions and minimizing mistakes by anticipating future steps.
Facebook uses the example of changing a car tire, with AR glasses helping to point you in the right direction. The same system might also serve as a reminder for your morning routines, based on visually assessing where you are and what you’re doing.
The range of potential applications is broad. Google Glass evolved to become a key tool in industrial workplaces by providing in-view pointers and instructions for technical tasks, and action anticipation of this kind could add similar value to Facebook’s wearable AR devices.
It’s some way off being a consumer-facing product, in any form, but the project underlines Facebook’s ongoing AI development, and points to the evolving functionality that’ll likely be built into a coming stage of its AR glasses projects.
You can read more about Facebook’s Anticipative Video Transformer (AVT) process here.