Voice UI: Comprehensive Tech Review

A holistic review of Siri and other Voice UIs covering heuristics, accessibility, and inclusivity.

Mahnoor Afteb
11 min read · Feb 26, 2021
Source: dribbble.com

Voice UI (VUI) has changed how users communicate with AI through basic interactions that mimic human conversation. Big tech companies have taken advantage of this emerging technology and helped popularize it among consumers, whether through Apple’s Siri, Amazon’s Alexa, Microsoft’s Cortana, Samsung’s Bixby, or Google Home/Assistant, and the list goes on. The increased availability of VUIs, especially on phones, laptops, TVs, and cars, has allowed them to become a convenient option for everyday tasks and assistance.

VUIs like Siri or Alexa have also become a hot topic in popular culture through user interactions that are often described to be quite amusing. From witty responses to unforgettable misunderstandings, VUIs certainly have their pros and cons just like any other technology or human-computer interaction in general. As one who has been using Siri since its launch in 2011, I’ve had my fair share of funny and/or embarrassing moments. I also have been able to maximize its efficiency for daily tasks and questions I may have.

It is important to analyze Siri and other Voice UIs from a holistic perspective, focusing not only on usability heuristics but also on accessibility and inclusivity.

Personal Engagement:

Siri is the only VUI I have used to date. I guess I’ve become somewhat comfortable interacting with the interface throughout its multiple updates and versions.

I will first document my personal engagement (on iPhone) with the latest (iOS 14) version of Siri and in the next section I’ll cover heuristics pertaining to VUIs in general.

Invoke Siri:

As of now, there are 2 ways to start using Siri:

  1. Pressing the side button (iPhone X or later) or home button on other iPhones.
  2. Saying “Hey Siri”, which can be enabled/disabled in the settings.

After using both of these methods several times, I found a few usability issues. Pressing the side button (a.k.a. power button) could lead to accidentally turning the phone off, which can get really frustrating. I actually miss invoking Siri with the home button, since that seems like a more logical and error-free method, and it took me some time to adjust to the change. Nevertheless, holding the power button for a few seconds longer is still a functional method, even if it may take new users a little time to get used to.

I also have a love-hate relationship with the “Hey Siri” feature, since there are often times when I’m not heard at all even though I’m right next to my phone. Another recurring issue is that the phrase can be picked up by another iPhone in the vicinity. For example, if my sister’s phone is nearby and I say “Hey Siri”, her phone turns on instead of mine, and vice versa. The Apple support website provides solutions to this issue. When you set up the feature, the AI should learn to recognize your voice whenever you call... well, most of the time. Either way, both of these methods do successfully call upon Siri.

Initial Interaction:

So right after I “call” Siri, a small circle animation shows up, and if I simply start the conversation by saying “Hi”, Siri responds with something like “Hello. How can I help?”. A microinteraction accompanies each response: a very subtle vibration near the circle helps indicate that Siri has responded.

[Image: screenshot of Siri saying “Hello, Mahnoor. How can I help?”]
Siri at your service.

Feedback/Response:

After asking simple questions regarding the weather or setting an alarm, the interface displays the desired action, an option to undo, and the associated feedback.

Siri vanishing after a few seconds.

If I don’t say anything afterwards, or in the first place, Siri turns off by itself. It’s kind of like Siri hung up on me, which isn’t always an efficient manner of communication. I think a more user-friendly approach would be for Siri to welcome the user first and, if they don’t respond, say something like “Are you still there?” and then turn off if there is still no response.

As Siri turns off after a few seconds of inactivity, there could also perhaps be some sort of indication like a simple animation/movement in the circle that could alert the user instead of the circle instantly disappearing. Better feedback can especially keep the user more engaged and aware of the system status.
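The timeout behavior suggested above can be sketched as a tiny state machine. This is only an illustration of the idea, not how Siri actually works: the thresholds, the `Session` class, and the action names are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical thresholds in seconds; not taken from Siri or any real assistant.
PROMPT_AFTER = 5.0
CLOSE_AFTER = 10.0


@dataclass
class Session:
    silent_for: float = 0.0   # seconds since the user last spoke
    prompted: bool = False    # whether "Are you still there?" was already asked
    is_open: bool = True      # whether the assistant is still listening


def tick(session: Session, elapsed: float, user_spoke: bool) -> str:
    """Advance the session by `elapsed` seconds and return the next action."""
    if not session.is_open:
        return "closed"
    if user_spoke:
        # Any speech resets the silence timer and the prompt flag.
        session.silent_for = 0.0
        session.prompted = False
        return "listen"
    session.silent_for += elapsed
    if session.silent_for >= CLOSE_AFTER:
        # Close with a visible cue instead of vanishing instantly.
        session.is_open = False
        return "fade_out_and_close"
    if session.silent_for >= PROMPT_AFTER and not session.prompted:
        session.prompted = True
        return "ask_are_you_still_there"
    return "wait"
```

The point of the sketch is that the assistant checks in once before closing, and that closing is its own explicit action that a designer can attach an animation to.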

VUI Heuristics

Here’s a general usability review of VUIs using most of Nielsen’s Usability Heuristics for User Interface Design.

Visibility of system status:

Voice UIs offer various ways of displaying feedback. Siri, Cortana, and Bixby mainly use the digital interface of a phone or other GUIs to communicate, while Google Home and Alexa/Echo are available on personal devices along with smart speakers, smart home appliances, and wearables. Needless to say, there are countless ways for these VUIs to communicate with users.

For VUIs that take the form of a smart speaker like Apple’s HomePod, Alexa or Google Home, it is important for users to hear responses clearly and quickly. Since the user cannot see any form of written communication, subtle visual cues such as a light flicker or color change should indicate when the system is on or processing information.

Alexa’s spinning blue ring light indicates the device is starting up. Source

Match between system and the real world

Just like a regular human conversation, there should be proper feedback given at reasonable times and with minimal to no interruptions. Both verbal and written communication should use plain language to help prevent any ambiguity or miscommunication. There should be no use of technical jargon or acronyms that can especially throw the user off.

User control and freedom

When asking a VUI to perform a task or listen to a command, it is important that there is consistent feedback and the ability to undo/redo or cancel an action immediately. Verbal commands like “Siri/Alexa, stop” should automatically be given priority over the current action taking place. For digital interfaces, there should be, as Nielsen says, a “clearly marked emergency exit”. Siri does a great job of allowing the user to make last-second changes to a command. Also, the interface no longer takes up the entire screen like it did in the past, but there still isn’t a way to multitask while using Siri.

Siri allowing option to edit/change reminder info.
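To illustrate the idea that “stop” should take priority over whatever else is pending, here is a minimal sketch using a priority queue. The `CommandQueue` class and the `INTERRUPTS` word list are hypothetical and are not drawn from any real assistant.

```python
import heapq
import itertools

# Hypothetical interrupt phrases; real assistants define their own.
INTERRUPTS = {"stop", "cancel", "never mind"}


class CommandQueue:
    """Queue of spoken commands where interrupt phrases jump to the front."""

    def __init__(self) -> None:
        self._heap = []
        self._order = itertools.count()  # tie-breaker keeps first-in, first-out order

    def push(self, utterance: str) -> None:
        text = utterance.strip().lower()
        priority = 0 if text in INTERRUPTS else 1  # lower number is served first
        heapq.heappush(self._heap, (priority, next(self._order), text))

    def pop(self) -> str:
        return heapq.heappop(self._heap)[2]


queue = CommandQueue()
queue.push("play music")
queue.push("set a timer")
queue.push("Stop")
first = queue.pop()  # the interrupt is served before the earlier commands
```

Regular commands keep their original order, while an interrupt is always handled next, which is the “emergency exit” behavior in voice form.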

Consistency and standards

All VUIs should roughly follow the same conventions. For example, it should be relatively easy for a Google Home user to switch to Alexa and have no trouble using basic features like turning the speaker on/off or communicating with the voice assistant. Although every product is unique in its own way, the basic functionality should be similar in order to avoid increasing the users’ cognitive load. It shouldn’t be hard to learn to interact with a new VUI.

Error prevention

Errors are bound to happen, especially when the VUI does not clearly communicate or indicate options to undo or change an action. It is important for there to be proper warning messages that not only ask to confirm a task, but also provide a clearly marked exit or alternate action. Simply saying “Are you sure you want to do this?” or “Do you want me to repeat?” helps provide proper feedback and confirmation.

Confirmation to send text message on Google Assistant.
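A confirmation step like this can be sketched as a small function that maps the user’s reply to one of three outcomes. The word lists and the `resolve_confirmation` name are hypothetical, and a real assistant would rely on far richer language understanding.

```python
# Hypothetical reply vocabularies, for illustration only.
AFFIRMATIVE = {"yes", "yeah", "send it", "confirm"}
NEGATIVE = {"no", "cancel", "stop"}


def resolve_confirmation(reply: str) -> str:
    """Map a spoken reply to 'execute', 'cancel', or 'reprompt'."""
    answer = reply.strip().lower()
    if answer in NEGATIVE:
        return "cancel"      # the clearly marked exit always wins
    if answer in AFFIRMATIVE:
        return "execute"
    return "reprompt"        # unclear reply: ask again rather than guess
```

Checking for a cancel first and re-asking on anything unclear means the assistant never acts on a reply it did not understand.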

Recognition rather than recall

When interacting via verbal or written communication, it is crucial for the user to stay engaged with the VUI, and there should be no distractions in the form of confusion that cause a disruption. The user’s memory load and attention span are two very important considerations in UX. Voice interfaces should provide easy ways to issue instant commands without leaving the user perplexed.

Flexibility and efficiency of use

Shortcuts for voice interfaces can help speed up interactions but should be designed to cater to both novice and experienced users. One example is letting the user set shortcuts or modes like “Brief mode” on Alexa. This mode allows users to configure their Echo devices to use sounds/chimes instead of a verbal response. This is helpful for those who want a simple task done, like turning the lights off: instead of Alexa speaking, a chime indicates the action is done.

Configuring “Brief Mode” in the Alexa app.

Help and documentation

Systems like voice interfaces usually don’t need an entire walkthrough, as they should guide the user along by telling them what the voice assistant can do for them. For example, after enabling Siri on my iPhone and saying “Hello”, Siri prompts me to ask about her features and even provides a website link explaining them.

Siri helps guide the user to discover features.

Accessibility Review

Whether it’s setting up a quick reminder on your phone or playing music in your car, the functionality of VUIs should be designed with a wide range of users in mind. Accessibility is crucial in general, and as modern society advances, it should be a top priority for the web and technology. Individuals with impairments in hearing, vision, speech, motor skills, and cognition should always be accommodated.

VUIs should be compatible with assistive technologies such as hearing aids (wireless capability is preferred). To prevent errors, the voice assistant should clearly ask before confirming anything with the user, and there should be a clear way to undo or edit a task/list. The voice UI should not interrupt or talk over the user, as that can confuse them further, and it should always allow appropriate pauses for the user to respond. The voice assistant’s speech rate should be adjustable to how fast or slow the user wants, as this can help with a range of disabilities. Those with speech impairments can have difficulty interacting with a VUI since it may not understand every word that they are saying.

Google launched Project Euphonia in 2019 to help make Voice UI more accessible to people with speech impairments:

“Project Euphonia is a Google AI research effort to help speech-impaired users communicate faster and gain independence.”

VUI Accessibility Cheatsheet:

I designed this cheatsheet outlining important accessibility guidelines to consider when designing for VUI.

Inclusivity Review

Inclusive design acknowledges human diversity and the various ranges of perspectives and backgrounds. All users should have an equally user-friendly experience with VUIs that they can identify and connect with. Keeping in mind that there are countless aspects of identity that are unique to each individual, there are a few relating to VUIs that I believe should be improved:

Gender diversity:

There is a lack of diversity in popular voice UIs. There is a strict gender binary: users choose between male or female voices in Siri, Alexa, Google Assistant, Cortana, and Bixby. Most VUIs use female voices as their default option, which brings attention to issues of gender bias and stereotypes. There should be more options for gender-neutral voices as opposed to the usual male and/or female voice assistants. In fact, there is a genderless voice assistant named “Q”. All VUIs should adopt this additional option to help promote a safe and inclusive environment and provide representation for all, including individuals who identify as non-binary and the LGBTQ+ community.

Q Voice UI (Left). Default options on Siri (Right)

Language/accent diversity:

The world consists of so many different races, cultures and nationalities, and this should be at the forefront when designing VUIs. Alexa is equipped with the fewest options, with around 8 languages. Siri caters to around 20 languages, while Google Assistant has the most at 44 languages. In actuality, there are around 6,500 languages in the world. It is understandable that there are development and voice synthesis issues that prevent catering to every language in the world, but there should be more variety that can cater to minority languages.

Often, people who don’t speak English as their first language may be misunderstood by voice assistants. Also, people who speak a different dialect of a single language should be able to both understand the VUI and be understood themselves in their respective dialect.

Google Assistant caters to the most languages. Source

Conclusion

Voice UIs have overall revolutionized the way users use technology and AI to help make everyday tasks easier. Society will continue advancing towards different ways to use these virtual assistants to everyone’s advantage. No matter how far or in what direction this emerging technology may go, designers and developers should always keep accessibility and inclusivity in mind. These two essential facets help provide a user experience that is designed to cater to all identities and abilities. As stressed earlier, VUIs should focus on broadening their design choices towards inclusivity, with more languages, accents and gender variation options that can resonate with a multitude of identities.

I believe an important takeaway from this article is that designers, developers, and essentially innovators may have some idea of who their target audience is, but there truly is no way to know every single one of them. With that in mind, it is essential that we don’t ever design solely with one group of people or perspective in mind. Open-minded and intersectional thinking is the key to a future where accessibility and inclusivity are at the forefront of revolutionizing design!
