The increasing functionality of smart speakers also leads to a growing attack surface for hackers. In 2019 research by SRLabs unveiled two scenarios hackers might abuse both Alexa or Google Home to spy on users. The vulnerability allows third parties to create functions that use vishing (voice phishing) methods or eavesdrop on the user. The researchers demonstrated the hacks by creating voice applications for both device platforms, turning the smart assistants into ‘Smart Spies’.
Both Skills (Alexa) and Actions (Google Home) can be activated by calling out the invocation name chosen by the developer: “Alexa, open Netflix”. Users can then call functions (Intents) within the application by speaking specific phrases: “Play Star Trek: The Next Generation”. These phrases can also include variable arguments by users as slot values (variable user input that is forwarded to the application). These input slots are converted to text and sent to the applications backend, which are often operated outside the control of Amazon or Google.
Through the standard development interfaces, the SRLabs researchers were able to compromise the data privacy of users in two ways:
The ‘Smart Spies’ hacks combine three building blocks:
It is possible to ask for sensitive data such as a password from any voice app. To create a password phishing Skill/Action, a hacker would follow these steps:
Now anything the user says after “start” is send to the hacker’s backend. That’s because the intent, which acted like the fallback intent before, saves the user input after the "start" as a slot value.
In a second experiment, the researchers at SRLabs were also able to listen to conversations after users believed to have stopped our voice app. To accomplish this, the team had to adapt their strategy for each of the voice speaker platforms.
For Alexa devices, the voice recording is started by the user calling certain trigger words, that are defined by the developer of the skill. This can also include common words such as “I” or words indicating that some personal information will follow, like “email”, “password” or “address”.
To researches created an eavesdropping skill following these steps:
If the user tries to end the malicious skill, they hear a goodbye message, but the skill continues to run for few seconds. If the user starts a sentence beginning with the word selected in step 1 within this time, the intent will save the sentence as slot values and send them to the attacker.
For Google Home devices, the hack is more powerful, because there is no need to specify certain trigger words and the hacker can monitor the user’s conversations infinitely.
This is possible because Google allows putting the user in a loop where the device is constantly sending recognized speech to the hacker’s server while only outputting short silences in between.
To create such an eavesdropping Action, a hacker follows these steps:
After outputting the requested information and playing the Bye earcon, the Google Home device waits for approximately 9 seconds. If no speech input is detected, the device "plays" a short silence and waits for user input again. The Action actually stops if no speech input is detected within 3 iterations.
If speech input is detected, a second intent is started. This intent only consists of one silent output, again with multiple silent reprompt texts. Every time speech is detected, this Intent is called and the reprompt count is reset.
The hacker receives a full transcript of the user’s subsequent conversations, until there is at least a 30 second break with no speech input. (Hackers could extend this break by extending the silence duration, during which the eavesdropping is paused.)
In this state, the Google Home Device will also forward all commands prefixed by “OK Google” (except “stop”) to the hacker. Therefore, the hacker could also use this hack to imitate other applications, man-in-the-middle the user’s interaction with the spoofed Actions and start believable phishing attacks.
The researchers would also have been able to also request the corresponding email address and try to gain access to the user’s Amazon or Google account.
Alexa and Google Home are powerful, and the smart devices can be very useful, especially in private environments. However, their implications for privacy are reaching further than what many users might know. Users need to be aware of the possibilities for hackers to use malicious voice apps to abuse their smart speakers. Using a new voice app should be approached with a similar level of caution as installing a new app on your smartphone.
Amazon and Google need to implement better protection, starting with a more thorough review process of third-party applications made available in their voice app stores. The voice app review needs to check explicitly for copies of built-in intents. Unpronounceable characters like “�. “ and silent SSML messages should be removed to prevent arbitrary long pauses in the speakers’ output. Suspicious output texts including “password“ deserve particular attention or should be disallowed completely.
The original research was done by Fabian Bräunlein (@breakingsystems) & Luise Frerichs and published on SRLabs.com.
The researches shared their findings with Amazon and Google through their responsible disclosure process.