Voice assistant devices have become an integral part of our current technological environment: smartwatches, smartphones, smart earphones, smart homes, and so much more lies ahead in the future. These devices allow users to make phone calls and send texts, do a quick web search, look up the weather, even control other smart devices (the coolest control I encountered was dimming the lights) with simple voice commands and no physical interaction with the objects around them (one can even ask Siri to sing a song or tell a joke). There is freedom from wires, buttons, and keyboards!
I recently learnt that about 70 million people worldwide suffer from stuttering; the cause could be physiological, neurological, or trauma-based. That's roughly 1% of the global population. And this is just one speech disorder. There are many! The functionality (or the limitation?) of voice-controlled smart devices relies on the clarity of the command. See where I am going with this?
While voice control of technology is becoming more accurate (with extensive machine learning to even recognize accents!), accessing such technology remains a huge hurdle for stutterers. Current voice-assisted systems fail to identify and intelligibly understand disjointed or broken speech. This limitation is also faced by individuals with other speech disorders. According to research by Frank Rudzicz (Assistant Professor, University of Toronto), word-recognition rates for individuals with dysarthria can be between 26.2% and 81.8% lower than for the general population. Can this be improved upon?
While manufacturers of these devices boast of reduced word-recognition error for regular speech, achieved by extensive training on more data from different speakers, what makes accommodating speech disorders difficult is the randomness of when speech gets affected and the unpredictability of which part of speech gets affected. Stuttering can occur at any time. Any word can "trip" the speaker into a stutter. While some stuttering patterns can be identified (are there specific sounds the stutterer finds difficult? does the stuttering occur at the beginning of a word or in the middle of it?), these patterns are almost unique to each stutterer (however, don't be discouraged: stutterers can be grouped by how they stutter). Additionally, there is a wide range in the severity of these disorders, so creating one computational model that fits all is not possible. As a rough illustration of what "identifying a pattern" even means, see the sketch below.
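To make the pattern-identification idea concrete, here is a minimal sketch in plain Python of just one narrow rule: flagging sound repetitions at the start of a word, like "l-l-lights". Everything here (the function name, the regex, the sample command) is my own illustration, not code from any real voice assistant.

```python
import re

def find_onset_repetitions(transcript: str) -> list[tuple[str, str]]:
    """Flag word-onset sound repetitions like 'b-b-ball' and return
    (stuttered_form, intended_word) pairs."""
    hits = []
    for token in transcript.lower().split():
        # A short fragment repeated with hyphens, then the full word,
        # e.g. "l-l-living" -> fragment "l", intended word "living".
        match = re.fullmatch(r"((\w{1,3})-)+(\2\w*)", token)
        if match:
            hits.append((token, match.group(3)))
    return hits

# A hypothetical command a voice assistant might hear:
print(find_onset_repetitions("turn on the l-l-living room l-lights"))
# -> [('l-l-living', 'living'), ('l-lights', 'lights')]
```

Even this toy rule catches only one pattern. Prolongations ("ssssee"), silent blocks, and mid-word repetitions would each need entirely different rules, with thresholds that vary from speaker to speaker, which is exactly why a single one-size-fits-all model falls short.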
With voice-enabled technology getting more embedded in our lives (I just saw smart, voice-controlled coffee makers), there is a need for more inclusivity in technology. If manufacturers are looking for more data to train their AIs, I am available!
