Google Audio Indexing

Update 8-29-2013 – Google GAudi appears to have been retired, and its Google Group has been removed along with it.

Google is using speech recognition to index political videos on YouTube. With a search, one can easily find what a politician has said about a particular subject. http://labs.google.com/gaudi/

Once it has been tested on politics, Google wants to unleash this powerful tool to index all of YouTube's shenanigans. I think it will have a major impact on how we access information.

What Google says about the technology:

Google Audio Indexing uses speech technology to transform spoken words into text and leverages the Google indexing technology to return the best results to the user.

The returned videos are ranked based -- among other things -- on the spoken content, the metadata, the freshness.

We periodically crawl the YouTube political channels for new content. As soon as a new video is uploaded to YouTube, it is processed by our system and made available in our index for people to search.
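
To make the "spoken words into text, then index" idea concrete, here is a minimal sketch in Python. It is not Google's implementation. It assumes the speech recognizer has already produced a list of (word, timestamp) pairs for each video, and the video IDs, words, and the function names build_index and search are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical recognizer output: video id -> list of (word, seconds into video).
transcripts = {
    "video_a": [("The", 11.9), ("economy", 12.5), ("matters", 13.1)],
    "video_b": [("Economy", 3.0), ("first", 3.6)],
}

def build_index(transcripts):
    """Map each lowercased word to the (video id, timestamp) pairs where it is spoken."""
    index = defaultdict(list)
    for video_id, words in transcripts.items():
        for word, seconds in words:
            index[word.lower()].append((video_id, seconds))
    return index

def search(index, term):
    """Return every (video id, timestamp) where the term was spoken."""
    return index.get(term.lower(), [])

index = build_index(transcripts)
print(search(index, "economy"))
```

Storing the timestamp next to each word is what would let a search jump to the moment in the video where the word is said. Ranking by metadata or freshness is left out of this sketch entirely.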

I did some research into what they are actually using to recognize the speech. The technique is called Hidden Markov Modelling. Here is a PDF explaining some of the mathematics behind it.
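
For a feel of what a Hidden Markov Model does, here is a toy sketch in Python of the Viterbi algorithm, the standard way to find the most likely sequence of hidden states behind a sequence of observations. Everything in it is an assumption made for illustration: the two hidden states ("silence" and "speech"), the two observations ("quiet" and "loud"), and all of the probabilities are invented. A real recognizer works with phoneme-level states and acoustic features, and nothing here reflects Google's actual models.

```python
import math

# Toy model (all numbers invented for illustration).
states = ("silence", "speech")
start_p = {"silence": 0.8, "speech": 0.2}
trans_p = {
    "silence": {"silence": 0.7, "speech": 0.3},
    "speech": {"silence": 0.2, "speech": 0.8},
}
emit_p = {
    "silence": {"quiet": 0.9, "loud": 0.1},
    "speech": {"quiet": 0.2, "loud": 0.8},
}

def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely hidden state sequence for the observations."""
    # Log probabilities avoid numeric underflow on long sequences.
    best = [{s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]])
             for s in states}]
    back = [{}]
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            # Pick the previous state that makes arriving in s most likely.
            prev, score = max(
                ((p, best[t - 1][p] + math.log(trans_p[p][s])) for p in states),
                key=lambda pair: pair[1],
            )
            best[t][s] = score + math.log(emit_p[s][observations[t]])
            back[t][s] = prev
    # Trace back from the best final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path

print(viterbi(["quiet", "loud", "loud", "quiet"], states, start_p, trans_p, emit_p))
```

With these made-up numbers, the decoder labels the two loud frames as speech and the quiet frames on either side as silence. Speech recognition applies the same idea on a much larger scale, with the hidden states standing for pieces of words instead of just "silence" and "speech".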

I would like to share some other documents I came across that I thought were interesting:

This is a 7-part report on how to get Saudi speech recognition to differentiate and understand the different accents of the language.

How To: Use HTK Hidden Markov modelling toolkit with SFS

The Google Group for Google Audio Indexing