Google To Train Its AI To Recognize 9 Indic Languages
Aadhya Khatri - Oct 01, 2019
The system of Google could recognize nine Indic languages, including Hindi, Bengali, Malayalam, Telugu, Gujarat, Marathi, Kannada, Tamil, and Urdu
- Celebrity Voices Update For Google Assistant : How To Get It
- Soon You Can Tell Google Assistant To Clean Your Search History
- Google Assistant Now Transfers Your Music Between Smart Speakers
There are thousands of languages spoken all over the world. So far, we have had records of around 6,500, and systems from tech giants like Amazon, Apple, Facebook, and Google are getting better at recognizing them. The problem is, training these AIs requires large corpora, and we do not exactly have that for every language.
To solve this problem, Google is applying knowledge it learns from languages with lots of data to the data-scarce ones. This attempt has proven to be fruitful as the company has developed a multilingual speech parser that can transcribe several tongues. The invention was introduced at the Interspeech 2019 conference taking place in Graz, Austria.
The authors said that their system could recognize nine Indic languages, including Hindi, Bengali, Malayalam, Telugu, Gujarat, Marathi, Kannada, Tamil, and Urdu, with a high level of accuracy, all while improving dramatically the quality of ASR (short for automatic speech recognition).
“For this study, we focused on India, an inherently multilingual society where there are more than thirty languages with at least a million native speakers. Many of these languages overlap in acoustic and lexical content due to the geographic proximity of the native speakers and shared cultural history. Additionally, many Indians are bilingual or trilingual, making the use of multiple languages within a conversation a common phenomenon, and a natural case for training a single multilingual model,” said Arindrima Datta and Anjuli Kannan, Google Research software engineers and lead coauthors in a blog post.
The architecture of this system combines pronunciation, language components, and acoustic into one. Other ASRs before it can only do this without real-time speech recognition. On the other hand, the AI of Kannan, Dattan, and their colleagues makes use of a recurrent neural network transducer designed to output words one character at a time, for several languages.
To avoid bias arises from a small amount of input data set, the experts of this project tweaked the system architecture a little bit to add in language identifier input. For example, they will take the preferred language on a smartphone into consideration. This, coupled with the audio input, enable the system to learn different features of separate languages as well as to disambiguate a given one.
The model was then further augmented as the team allocated extra parameters for each language, which comes in the form of residual adapter modules. This will help to enhance the overall performance and fine-tune the global per-language system.
What they achieved is a system that can deal with several languages with a performance surpassing all other recognizers that work on one single language only. All of that comes with simplified serving and training all while fulfilling the requirement latency needed for tools like Google Assistant.
“Building on this result, we hope to continue our research on multilingual ASRs for other language groups, to better assist our growing body of diverse users. Google’s mission is not just to organize the world’s information but to make it universally accessible, which means ensuring that our products work in as many of the world’s languages as possible,” the coauthors wrote.
This new system and the like will highly likely come to Google Assistant, which now has support for multiple tongues for multiturn conversations in Hindi, Korean, Norwegian, Swedish, Dutch, and Danish.
This study puts a focus on India, a nation speaking multiple languages. And it is a common phenomenon in the country for people to use a few tongues in one conversation, making it a natural case for the company to train its one single multilingual system.