At least one key feature of the HAL 9000.
In a blog post, the company reveals its new deep learning model works by using both the auditory and visual signals of an input video – in short, it lip reads. “The visual signal not only improves the speech separation quality significantly, in cases of mixed speech (compared to speech separation using audio alone, as we demonstrate in our paper)”, the post reads. “Importantly, it also associates the separated, clean speech tracks with the visible speakers in the video.”
Google demonstrates its new AI model using a series of videos including one of two stand-up comedians talking loudly at the same time (which you can watch below), and its effectiveness is startling. It can pick out either man’s voice without any problems, and the speech is so clear there’s no clue someone else was even speaking on the original recording.
http://www.alphr.com/google/1009054/google-ai-speech-separation
No comments:
Post a Comment
Due to a small but consistent influx of spam, comments will now be checked before publishing. Only egregious spam/illegal/racist crap will be disapproved, everything else will be published.