At least one key feature of the HAL 9000.
In a blog post, the company reveals that its new deep-learning model works by using both the auditory and visual signals of an input video – in short, it lip-reads. “The visual signal not only improves the speech separation quality significantly, in cases of mixed speech (compared to speech separation using audio alone, as we demonstrate in our paper)”, the post reads. “Importantly, it also associates the separated, clean speech tracks with the visible speakers in the video.”
Google demonstrates its new AI model using a series of videos including one of two stand-up comedians talking loudly at the same time (which you can watch below), and its effectiveness is startling. It can pick out either man’s voice without any problems, and the speech is so clear there’s no clue someone else was even speaking on the original recording.
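The idea behind this kind of separation can be sketched in miniature as mask-based filtering: the system produces one mask per visible speaker and applies it to the mixed-audio spectrogram. Everything below is a toy stand-in, not Google's model – a simple oracle ratio mask plays the role of the network's audio-visual prediction, just to show the shape of the technique.

```python
# Toy sketch of mask-based speech separation (NOT Google's model).
# In the real system a neural network predicts each mask from the mixed
# audio PLUS a visual embedding of that speaker's face (the lip-reading
# part); here an "oracle" ratio mask stands in for that prediction.
import numpy as np

rng = np.random.default_rng(0)

# Pretend magnitude spectrograms for two speakers (freq bins x time frames).
speaker_a = rng.random((64, 100))
speaker_b = rng.random((64, 100))
mixture = speaker_a + speaker_b  # both people talking at once

# Ideal ratio masks: each speaker's share of energy per time-frequency bin.
# These are the targets a separation network is typically trained toward.
mask_a = speaker_a / (mixture + 1e-8)
mask_b = speaker_b / (mixture + 1e-8)

# Applying one speaker's mask to the mixture recovers that speaker's track,
# which is why the demo videos can mute one comedian entirely.
separated_a = mask_a * mixture
separated_b = mask_b * mixture

print(np.allclose(separated_a, speaker_a, atol=1e-5))  # True
print(np.allclose(separated_b, speaker_b, atol=1e-5))  # True
```

With oracle masks the reconstruction is essentially perfect; the hard part, and the paper's contribution, is predicting good masks from the mixture alone, which is where the visual signal helps.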
http://www.alphr.com/google/1009054/google-ai-speech-separation