Grandma in Alexa

(To Marco Rottigni)
07/07/22

Personally, I'm fascinated by artificial intelligence, by technology, and by how it gradually but inexorably pervades our lives.

It improves them, of course, sometimes in very disruptive ways, such as reviving emotions linked to people who are no longer with us.

In these cases critical thinking is important: it leads us to evaluate these impacts and decide whether we want to be part of them, while also weighing the potential downsides. This explains the fascination technology holds for me, as a source of frequent epiphanies and cathartic moments.

One of these moments happened to me a few days ago, while following the re:MARS 2022 event: a showcase on artificial intelligence at which the giant Amazon presents to the world its studies, experiments and innovations in machine learning, automation, robotics and space, applied to present and future business.

The event itself is fascinating, full of ideas and provocations, with many illustrious guests, including some from outside Amazon. The speeches of these guests, known in the jargon as keynotes, are also available on YouTube for the public to watch and rewatch.

In the day-two keynote, available at https://www.youtube.com/watch?v=22cb24-sGhg, exactly one hour and two minutes in, I was struck by a passage from Rohit Prasad, head scientist for Alexa AI.

I was following his introduction, which was really well prepared: speaking of empathy between humans and machines as the feeling on which a relationship of trust is built, he pointed out that for many of us the recent pandemic has meant the loss of a loved one.

Alexa, often a symbol of this technological presence even if only for simple conversations, has over time developed skills that left me literally thunderstruck: obviously not enough to eliminate the pain of these losses, but sufficient to offer one more way to make the memory of loved ones last longer.

Within seconds, the video cuts to a child asking Alexa to have his deceased grandmother read him a passage from The Wizard of Oz, just as she did while she was alive.

Alexa answers the request with an "Okay" and immediately switches to a perfect simulation of the beloved grandmother's voice, stirring visible emotion in the tech-savvy grandson.

The video then returns to Rohit, who immediately explains two things that struck me greatly for their innovative power. The first is that the capability just shown derives from a change of perspective in how the problem is framed: more specifically, from treating it as a problem of speech generation, that is, the production of an audio phrase, to treating it as a question of voice conversion.

The second thing, closely related to the first, is that this change of perspective made it possible to reproduce the voice from only one minute of existing recording, compared with the hours of studio recording that the previous approach would have required!

(Rohit Prasad on stage at re:MARS 2022)

But earlier I mentioned the need for critical thinking: once the wow moment had passed, in fact, I began to reflect on a few aspects.

Several online services offer an authentication method that consists precisely of speaking a sentence to prove that you are the person you claim to be. With such technology, which is not available to Amazon alone, it clearly becomes important to entrust the strength of the authentication process to other mechanisms that are certainly harder to imitate.

Voice biometric authentication is certainly a young technology, but it became popular almost immediately thanks to its ease of use and the fact that it requires no additional devices. As a result, banking and insurance services have embraced it widely, although lately in combination with other mechanisms to increase the overall strength of the process. A 2020 article proposed, as a strong authentication process, combining the multi-frequency tones of a telephone keypad with a phrase recorded in the customer's voice that had to be reproduced. It goes without saying that the innovation presented by Amazon would nullify that strength, exposing the authentication process to impersonation attacks, that is, attacks in which the attacker has all the tools needed to act as if they were the victim, deceiving the technological system that is supposed to certify identity.

Another process at risk is access control to the workplace, whether physical or virtual, when it is used for billing purposes or to verify working hours. This is another area in which attackers would be greatly helped by these speech synthesis technologies.

The second reflection is a little more emotional: as I said at the beginning, I find this possibility absolutely fascinating and innovative. I realize, however, that for some people it may not be a source of positive emotions at all, and may instead prolong the pain of losing a loved one in a significant and barely bearable way.

Well, if there is one thing that distinguishes human beings from technology, it is consciousness, from which free will originates. History teaches us the futility of denying progress, science and technology in favor of personal feelings and beliefs, right or wrong. That's why I simply recommend that these people make a choice: to ignore this possibility offered by technology, while leaving the same right to choose to everyone else.