Demystifying speech recognition with Project DeepSpeech

Our voices are no longer a mystery to speech recognition (SR) software. The technology powering these services has amazed us with its ability to understand human speech. This talk covers the inner workings of state-of-the-art SR algorithms, with live demos of Project DeepSpeech.

A widely quoted industry forecast says that “50% of all searches will be voice searches by 2020”. The world’s technology giants have bet heavily on voice search, personal digital assistants, IoT devices, and similar services. Solving speech recognition is a herculean task, given the complexity of data such as the human voice.

The talk will cover a brief history of speech recognition algorithms and the challenges of building these systems, then explain how to build an advanced speech recognition system using deep learning. As an illustration, we will take a deep dive into Project DeepSpeech, an open source speech-to-text engine developed by Mozilla Research, based on Baidu’s Deep Speech research paper and implemented with Google’s TensorFlow library.

Speech recognition is not only about the technology. There are wider concerns and challenges around how these AI models are becoming part of our day-to-day lives, including their biases. The bigger question is the centralization of these AI services. Projects like Common Voice address this problem by enabling everyone to be part of this revolution. Part of the talk will focus on how this type of research should be approached, with community and humanitarian benefits as the first priority.

Buzzwords: AI, speech recognition, speech to text, machine learning, Python, TensorFlow, deep learning, voice search
Level: Beginner (aimed at audiences with basic Python programming experience)
Audience requirements: None
Language: English

Speaker: Vigneshwer Dhinakaran (India)

Speaker Bio: Vigneshwer is a data scientist at Epsilon, where he crunches real-time data and builds state-of-the-art AI algorithms for complex business problems. He believes that technology needs a human-centric design to serve a diverse audience. He’s an official Mozilla TechSpeaker and the author of Rust Cookbook.