At least one key feature of the HAL 9000.
In a blog post, the company reveals its new deep learning model works by using both the auditory and visual signals of an input video – in short, it lip-reads. “The visual signal not only improves the speech separation quality significantly in cases of mixed speech (compared to speech separation using audio alone, as we demonstrate in our paper),” the post reads. “Importantly, it also associates the separated, clean speech tracks with the visible speakers in the video.”
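For the curious, here's a minimal sketch of that idea in PyTorch: encode the mixture spectrogram, encode per-speaker face embeddings from the video, fuse the two streams over time, and predict one spectrogram mask per visible speaker. Everything here (the class name AVSeparator, the layer sizes, the simple magnitude masks) is my own illustrative assumption; the actual model described in Google's paper (“Looking to Listen at the Cocktail Party”) uses dilated convolution stacks, a face-recognition embedding network and complex-valued masks.

import torch
import torch.nn as nn

class AVSeparator(nn.Module):
    """Schematic audio-visual separator: fuses per-speaker face embeddings
    with a mixture spectrogram and predicts one mask per visible speaker."""
    def __init__(self, freq_bins=257, face_dim=512, hidden=256, n_speakers=2):
        super().__init__()
        self.n_speakers = n_speakers
        # Audio stream: convolutions over the mixture spectrogram's time axis.
        self.audio_net = nn.Sequential(
            nn.Conv1d(freq_bins, hidden, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(hidden, hidden, kernel_size=5, padding=4, dilation=2), nn.ReLU(),
        )
        # Visual stream: project each frame's face embedding (this is where
        # the "lip reading" signal enters) into the shared hidden space.
        self.face_net = nn.Linear(face_dim, hidden)
        # Fuse both streams with a bidirectional LSTM over time.
        self.fusion = nn.LSTM(hidden * (1 + n_speakers), hidden,
                              batch_first=True, bidirectional=True)
        self.mask_head = nn.Linear(2 * hidden, freq_bins * n_speakers)

    def forward(self, mix_spec, face_embs):
        # mix_spec: (batch, freq_bins, time); face_embs: (batch, n_speakers, time, face_dim)
        a = self.audio_net(mix_spec).transpose(1, 2)        # (batch, time, hidden)
        v = self.face_net(face_embs).permute(0, 2, 1, 3)    # (batch, time, n_speakers, hidden)
        v = v.reshape(v.size(0), v.size(1), -1)             # concatenate speakers
        fused, _ = self.fusion(torch.cat([a, v], dim=-1))
        masks = torch.sigmoid(self.mask_head(fused))        # (batch, time, freq*speakers)
        masks = masks.view(masks.size(0), masks.size(1), self.n_speakers, -1)
        # One mask per visible speaker, applied to the mixture to isolate each voice.
        return masks.permute(0, 2, 3, 1) * mix_spec.unsqueeze(1)

sep = AVSeparator()
mix = torch.randn(1, 257, 100)        # magnitude spectrogram of the mixed audio
faces = torch.randn(1, 2, 100, 512)   # face embeddings for two visible speakers
print(sep(mix, faces).shape)          # (1, 2, 257, 100): one clean track per speaker

The key point the sketch illustrates is the second claim in the quote: because each output mask is conditioned on a specific speaker's face embedding, the separated track comes pre-labelled with the person it belongs to, rather than needing to be matched up afterwards.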
Google demonstrates its new AI model using a series of videos, including one of two stand-up comedians talking loudly at the same time, and its effectiveness is startling. It can pick out either man’s voice without any problems, and the speech is so clear there’s no clue someone else was even speaking on the original recording.
http://www.alphr.com/google/1009054/google-ai-speech-separation