Amazon Kickstarts Natural Language Understanding By Open-Sourcing 'MASSIVE' Speech Dataset – MarkTechPost

To scale natural language understanding to every spoken language on Earth, Amazon has released MASSIVE, an open-source multilingual dataset. The main goal of curating it was to help researchers build virtual assistants that generalize to low-resource languages, for which little labeled data exists. Alongside the dataset, Amazon has also published open-source modeling code to help developers build more capable virtual assistants.
Several technological breakthroughs in speech recognition and natural language understanding (NLU) have paved the way for voice-activated digital assistants such as Siri, Bixby, and Google Assistant. Their main shortcoming is that they support only a handful of widely spoken languages. MASSIVE is a step toward closing that gap: it spans many languages, including low-resource ones, so that researchers can build multilingual NLU models that adapt smoothly to languages whose training data is scarce. The aim is to let people all over the world use conversational AI systems like Alexa in their native languages.
MASSIVE, short for Multilingual Amazon SLURP for Slot Filling, Intent Classification, and Virtual-assistant Evaluation, is a parallel dataset, meaning the same utterances appear in every language. It contains about one million labeled utterances in 51 languages, including many that previously lacked properly labeled data, and it comes with open-source code that demonstrates how to perform massively multilingual NLU modeling. Alexa is currently available in 7 languages, and the company aims to extend it to the more than 7,000 languages spoken around the world.
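To make the two labeling tasks in MASSIVE's name concrete: intent classification assigns a single label to the whole utterance, while slot filling labels spans inside it. The sketch below assumes slots are marked inline as `[slot_type : value]` and that the intent is stored in a separate field; that markup, the example utterance, and the `alarm_set` label are illustrative assumptions rather than a specification of the release. It turns an annotated utterance into plain text, a list of slots, and the token-level BIO tags that slot-filling models are commonly trained on.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple

# Assumed inline slot markup: "[slot_type : value]".
SLOT_PATTERN = re.compile(r"\[\s*([^\]:]+?)\s*:\s*([^\]]+?)\s*\]")


@dataclass
class ParsedUtterance:
    text: str                     # utterance with the slot markup stripped
    slots: List[Tuple[str, str]]  # (slot_type, value) pairs, in order
    bio_tags: List[str]           # one BIO tag per whitespace-separated token


def parse_annotated(annot_utt: str) -> ParsedUtterance:
    """Turn an inline-annotated utterance into plain text, slots, and BIO tags."""
    tokens: List[str] = []
    tags: List[str] = []
    slots: List[Tuple[str, str]] = []

    def add_outside(span: str) -> None:
        # Tokens outside any slot get the "O" tag.
        for tok in span.split():
            tokens.append(tok)
            tags.append("O")

    pos = 0
    for match in SLOT_PATTERN.finditer(annot_utt):
        add_outside(annot_utt[pos:match.start()])
        slot_type, value = match.group(1), match.group(2)
        slots.append((slot_type, value))
        # First token of a slot is "B-<type>", the remaining ones "I-<type>".
        for i, tok in enumerate(value.split()):
            tokens.append(tok)
            tags.append(("B-" if i == 0 else "I-") + slot_type)
        pos = match.end()
    add_outside(annot_utt[pos:])
    return ParsedUtterance(" ".join(tokens), slots, tags)


if __name__ == "__main__":
    # Hypothetical record: the intent labels the whole utterance,
    # the bracketed spans are its slots.
    record = {
        "intent": "alarm_set",
        "annot_utt": "wake me up at [time : five am] [date : this week]",
    }
    parsed = parse_annotated(record["annot_utt"])
    print(record["intent"], parsed.slots)
    for tok, tag in zip(parsed.text.split(), parsed.bio_tags):
        print(f"{tok}\t{tag}")
```

Splitting on whitespace keeps the sketch short, but it is a simplification: languages written without spaces between words, such as Japanese or Thai, would need a proper tokenizer before tags could be assigned.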
Professional translators built the dataset by translating the existing English-only SLURP dataset into 50 diverse languages that lacked labeled data. According to Amazon, MASSIVE will be especially useful for improving spoken-language understanding, in which audio is first transcribed into text and NLU is then performed on that text. NLU is the branch of natural language processing (NLP) concerned with converting human language into a machine-readable representation, such as an intent label plus a set of slot values.
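Because every language is translated from the same English source, records in different languages can be lined up if each one carries an utterance id shared across its translations (an assumption in this sketch). The following is a minimal sketch of that alignment, assuming JSON-style records with `id`, `locale`, `intent`, and `utt` fields; these field names are hypothetical here and should be checked against the released files, and the sample data is made up. It also flags ids whose translations carry different intent labels, a cheap sanity check given that translating an utterance should not change what the user is asking for.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical record layout: {"id", "locale", "intent", "utt"}.
Record = Dict[str, str]
Aligned = Dict[str, Dict[str, Record]]


def align_by_id(records: List[Record]) -> Aligned:
    """Group records that share an utterance id: {id: {locale: record}}."""
    aligned: Dict[str, Dict[str, Record]] = defaultdict(dict)
    for rec in records:
        aligned[rec["id"]][rec["locale"]] = rec
    return dict(aligned)


def translation_pairs(aligned: Aligned, source: str, target: str) -> List[Tuple[str, str]]:
    """(source utterance, target utterance) for ids present in both locales."""
    return [
        (by_locale[source]["utt"], by_locale[target]["utt"])
        for by_locale in aligned.values()
        if source in by_locale and target in by_locale
    ]


def intent_mismatches(aligned: Aligned) -> List[str]:
    """Ids whose translations do not all carry the same intent label."""
    return [
        utt_id
        for utt_id, by_locale in aligned.items()
        if len({rec["intent"] for rec in by_locale.values()}) > 1
    ]


if __name__ == "__main__":
    # Made-up sample records, for illustration only.
    records = [
        {"id": "1", "locale": "en-US", "intent": "alarm_set", "utt": "wake me up at five am"},
        {"id": "1", "locale": "de-DE", "intent": "alarm_set", "utt": "weck mich um fünf uhr morgens"},
        {"id": "2", "locale": "en-US", "intent": "weather_query", "utt": "will it rain today"},
    ]
    aligned = align_by_id(records)
    print(translation_pairs(aligned, "en-US", "de-DE"))
    print(intent_mismatches(aligned))
```

Keying by id rather than by line position means the alignment still works if some locales are missing a few utterances: those ids simply drop out of `translation_pairs`.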
Amazon is also launching a competition, Massively Multilingual NLU 2022 (MMNLU-22), that uses the MASSIVE dataset to encourage researchers to design models that readily adapt to new languages and to create more third-party apps for Alexa. The competition will be hosted on the eval.ai platform and will include two tasks. In December, the results will be presented at the Massively Multilingual NLU 2022 workshop at EMNLP 2022 in Abu Dhabi, which will also run as an online session. The workshop will feature guest speakers as well as oral and poster sessions for submitted papers on multilingual natural language processing.
Amazon envisions products such as Alexa and Echo being available to all customers on all devices. With these three announcements (the dataset, the modeling code, and the competition), it aims to become a key player in the global multilingual NLU community.
Source: https://www.amazon.science/blog/amazon-releases-51-language-dataset-for-language-understanding