Analyzing sound waves with Artificial Neural Networks
Sound surrounds us everywhere in our environment, piggybacking on molecules of air and traveling in the form of a wave. Living beings use those sound waves not only to communicate, but also to orient themselves and to perform various tasks. Sound can tell you a lot about a person, a place, or even a specific moment in time. There is valuable information in every bit of sound, waiting for someone to decipher its nuanced meaning. Unfortunately, we humans often fall short at that task. But maybe Artificial Intelligence can succeed where we can’t?
Just recently, a hospital in Israel started a clinical trial of an app that analyzes speech to diagnose and monitor COVID-19 patients, potentially detecting changes in lung fluids even before patients notice them. If that’s possible, then where exactly is the limit of sound recognition, and can A.I. take us there?
Let’s first take a look at all the things humans can find out just from listening to a voice. Think about all the moments you got a feeling about someone just from the way they talked. How often did you hear someone and immediately know they were stressed? Or sick? Maybe you thought they were pretending to be sick by changing their voice. Or, even more subtly, that they were lying to you.
These are simple interpersonal examples, but recognizing voice patterns is certainly useful in a work environment too. Consider a nurse who has to diagnose a patient, a salesman trying to figure out the mood of his client, or a police officer during an interrogation. Their listening skills will definitely influence their success on the job.
You (and they) might not be able to explain exactly what change you heard in the voice, and might refer to this knowledge as intuition or a gut feeling. Obviously, such changes are easier to hear in someone you know very well: you have an “internal model” of their normal voice, and something in their voice “…just sounded different this time”. But how did you learn to tell apart the changes in a voice caused by someone feeling sick, being drunk, or lying? You must have heard examples in the past where you knew for a fact what had happened to them. In other words, the neural network in your head has been trained on examples to recognize the different types of changes. And over time, you keep updating that knowledge as you encounter more examples and learn from your mistakes.
This exact process of human learning can be reproduced by artificial neural networks. As long as something is reflected in a change of the voice, a computer can learn to discern the pattern. A computer learns an internal model of a normal voice (just like humans do) and learns what a deviation from normal means by being trained on examples (again, just like humans do). The advantage of a computer is that this can be applied at a much wider scale, and the acquired skill can be transferred easily between machines. This transfer of knowledge is something humans can’t do as effortlessly as a machine can. On top of that, a computer can pick up even very subtle changes in the voice, subtleties a human could easily miss.
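The idea in this paragraph, learning a statistical model of a “normal” voice and flagging deviations from it, can be sketched in a few lines. The following toy example is my own illustration, not code from any system mentioned in this article: it uses a single hand-picked feature (the spectral centroid, a rough “brightness” of the sound) and synthetic tones standing in for real voice recordings.

```python
import numpy as np

def spectral_centroids(signal, sr=16000, frame_len=512):
    """Cut a mono signal into frames and compute each frame's spectral
    centroid: the 'center of mass' of the magnitude spectrum, a rough
    measure of how bright or high the sound is."""
    n = len(signal) // frame_len * frame_len
    frames = signal[:n].reshape(-1, frame_len) * np.hanning(frame_len)
    spectra = np.abs(np.fft.rfft(frames, axis=1))
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sr)
    return (spectra * freqs).sum(axis=1) / (spectra.sum(axis=1) + 1e-12)

def fit_baseline(recordings, sr=16000):
    """'Training': learn the internal model of a normal voice as the
    mean and spread of the feature over known-normal recordings."""
    feats = np.concatenate([spectral_centroids(r, sr) for r in recordings])
    return feats.mean(), feats.std()

def deviation_score(signal, baseline, sr=16000):
    """How many standard deviations does a new recording sit from the
    learned baseline? A high score flags 'this voice sounds different'."""
    mean, std = baseline
    return abs(spectral_centroids(signal, sr).mean() - mean) / (std + 1e-12)

# Toy demo with synthetic 'voices': normal recordings are tones that
# jitter around 200 Hz, so a 400 Hz recording should stand out.
sr = 16000
t = np.arange(sr) / sr  # one second of audio
normal = [np.sin(2 * np.pi * f * t) for f in (192, 196, 200, 204, 208)]
baseline = fit_baseline(normal, sr)
score = deviation_score(np.sin(2 * np.pi * 400 * t), baseline, sr)
# A score far above ~3 standard deviations flags the recording as abnormal.
```

A real system would of course replace the single centroid feature with richer representations and learn the baseline with an actual neural network, but the train-then-compare structure is the same.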
When these principles are put into practice, it turns out that there are many possibilities. Algorithms can already recognize things such as stress, Alzheimer’s disease, Parkinson’s disease, deception, and even the specific sounds of street violence. Of course, this is by no means an easy task. But the first step in applying a solution is recognizing that a solution could be out there.
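To illustrate the supervised side, training on labeled examples rather than just modeling “normal”, here is a deliberately simplified nearest-centroid classifier. The real detectors for stress or disease mentioned above use far richer features (pitch contours, MFCCs, pauses) and large labeled datasets; the labels, frequencies, and single feature below are all invented for this sketch.

```python
import numpy as np

def dominant_pitch(signal, sr=16000, frame_len=512):
    """A single crude feature: the strongest frequency in the average
    magnitude spectrum, a rough stand-in for voice pitch."""
    n = len(signal) // frame_len * frame_len
    frames = signal[:n].reshape(-1, frame_len) * np.hanning(frame_len)
    spectrum = np.abs(np.fft.rfft(frames, axis=1)).mean(axis=0)
    return np.fft.rfftfreq(frame_len, d=1.0 / sr)[spectrum.argmax()]

def fit_centroids(labeled_recordings, sr=16000):
    """'Training': average the feature per label, building one internal
    model per condition from examples whose label is known."""
    return {label: np.mean([dominant_pitch(r, sr) for r in recs])
            for label, recs in labeled_recordings.items()}

def predict(signal, centroids, sr=16000):
    """Classify a new recording by the closest learned centroid."""
    f = dominant_pitch(signal, sr)
    return min(centroids, key=lambda label: abs(centroids[label] - f))

# Synthetic training data: 'calm' voices as lower tones, 'stressed'
# voices as higher tones (both labels and frequencies are made up).
sr = 16000
t = np.arange(sr) / sr
data = {
    "calm": [np.sin(2 * np.pi * f * t) for f in (180, 200, 220)],
    "stressed": [np.sin(2 * np.pi * f * t) for f in (300, 320, 340)],
}
centroids = fit_centroids(data, sr)
print(predict(np.sin(2 * np.pi * 310 * t), centroids, sr))  # prints "stressed"
```

Swapping the hand-built centroid comparison for a trained neural network, and the synthetic tones for thousands of labeled voice recordings, is essentially what the systems above do.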
Currently, the ways these techniques are applied are growing rapidly. If this is done from an ethical perspective, it can be used to do a lot of good in the public sector. While the COVID-19 pandemic has had a negative impact on many areas of our society, we can also take it as a challenge and use our creativity to adapt to the situation. With the right technology we can solve real-life problems, just like the hospital mentioned at the beginning of this article is doing!