At least one key feature of the HAL 9000.
In a blog post, the company explains that its new deep learning model uses both the auditory and visual signals of an input video – in short, it lip-reads. “The visual signal not only improves the speech separation quality significantly in cases of mixed speech (compared to speech separation using audio alone, as we demonstrate in our paper)”, the post reads. “Importantly, it also associates the separated, clean speech tracks with the visible speakers in the video.”
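The mechanics described there – a per-speaker visual stream fused with a shared audio stream to produce one separation mask per visible face – can be sketched in miniature. This is a toy illustration under my own assumptions (random fusion weights `W`, a naive STFT, tiny made-up visual embeddings), not Google's actual architecture; with untrained weights it only demonstrates the masking mechanics, not real separation quality:

```python
import numpy as np

def stft_mag(audio, frame=256, hop=128):
    """Magnitude spectrogram via a naive windowed FFT (shape: time x freq)."""
    frames = [audio[i:i + frame] * np.hanning(frame)
              for i in range(0, len(audio) - frame + 1, hop)]
    return np.abs(np.fft.rfft(np.stack(frames), axis=1))

def separate(mix_spec, visual_embeds, W):
    """Fuse the shared audio spectrogram with each speaker's visual (lip)
    embedding to score every time-frequency bin, then softmax across
    speakers so the resulting soft masks sum to 1 per bin."""
    logits = np.stack([mix_spec * (W @ v) for v in visual_embeds])  # (S, T, F)
    logits -= logits.max(axis=0, keepdims=True)   # numerical stability
    masks = np.exp(logits)
    masks /= masks.sum(axis=0, keepdims=True)     # softmax over speakers
    return masks

# Demo on random noise standing in for a two-speaker mixture.
rng = np.random.default_rng(0)
mix_spec = stft_mag(rng.standard_normal(4096))
visual_embeds = [rng.standard_normal(8), rng.standard_normal(8)]  # one per face
W = rng.standard_normal((mix_spec.shape[1], 8))   # hypothetical trained weights
masks = separate(mix_spec, visual_embeds, W)
tracks = masks * mix_spec        # one masked spectrogram per visible speaker
print(masks.shape)               # (2, 31, 129)
```

Because the masks are a softmax over speakers, the per-speaker tracks always add back up to the original mixture spectrogram – which is what lets the model assign every bit of energy to one visible face or the other.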
Google demonstrates its new AI model using a series of videos including one of two stand-up comedians talking loudly at the same time (which you can watch below), and its effectiveness is startling. It can pick out either man’s voice without any problems, and the speech is so clear there’s no clue someone else was even speaking on the original recording.
http://www.alphr.com/google/1009054/google-ai-speech-separation
Sister blog of Physicists of the Caribbean in which I babble about non-astronomy stuff, because everyone needs a hobby