Decoherency: This machine can see your lips move

At least one key feature of the HAL 9000.

In a blog post, Google reveals that its new deep learning model works by using both the auditory and visual signals of an input video – in short, it lip-reads. “The visual signal not only improves the speech separation quality significantly in cases of mixed speech (compared to speech separation using audio alone, as we demonstrate in our paper)”, the post reads. “Importantly, it also associates the separated, clean speech tracks with the visible speakers in the video.”

Google demonstrates its new AI model using a series of videos, including one of two stand-up comedians talking loudly at the same time, and its effectiveness is startling. It can pick out either man’s voice without any problems, and the speech is so clear there’s no clue someone else was even speaking on the original recording.
http://www.alphr.com/google/1009054/google-ai-speech-separation

Friday, 13 April 2018