A critical piece of any voice interface is its questions. They’re essential to moving the conversation forward in a relevant, engaging way. I know we’ve all witnessed the bad ones: you have no idea which option to choose, you have no idea when to speak, and overall it feels like you’re trapped in a never-ending labyrinth.
Interfaces with well-constructed questions, on the other hand, flow by so seamlessly you almost don’t notice what’s happening — the interaction just feels smooth and efficient, leaving you with a good feeling even if you aren’t able to articulate why.
Although the art of question construction can take time and finesse to perfect, you can get far by following just a few simple ground rules. I’ll go through some of the most common mistakes I see, and how to avoid them.
This one might seem obvious, but I’ve seen it pop up over and over again. People will construct a question like this:
Ready to get started? If so, say “start.”
The “ready to get started” part is great: a classic, clear, yes/no question. I know exactly what to do: if I’m ready, I say yes; if I’m not, I say no. But then comes the tag-along “If so, say ‘start,’” shifting the inferred response paradigm. Now how can I respond? Start and no? Start and not start? There’s a little hiccup in my brain as I adjust.
Most people would probably follow along in this simple example, but when questions get more complicated, it’s not a guarantee. Furthermore, the second sentence interrupts the conversational tone you’ve set up in the first sentence.
People can intuitively reply naturally to the first sentence alone, but with the second sentence, you force them to use a keyword they wouldn’t ordinarily use to answer that question, pulling them out of the organic conversational vibe.
Whenever possible, let people say what they would naturally say. Last but not least, the second sentence makes the question longer with no added benefit, which nobody needs in their voice user interfaces (VUIs).
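To make "let people say what they would naturally say" a bit more concrete, here is a minimal sketch of accepting a range of natural replies to a yes/no question instead of forcing one keyword. The phrase lists and the function name `interpret_yes_no` are my own illustrative assumptions, not part of any real platform; in practice your NLU layer would likely handle this for you.

```python
import re

# Hypothetical phrase lists, for illustration only. A real voice platform
# would normally rely on its own intent model rather than hand-written lists.
AFFIRMATIVE = ["yes", "yeah", "yep", "sure", "okay", "ready", "start", "let's go"]
NEGATIVE = ["no", "nope", "not yet", "not now", "later"]


def _contains(text, phrases):
    """Return True if any phrase appears in text as a whole word or phrase."""
    return any(re.search(rf"\b{re.escape(p)}\b", text) for p in phrases)


def interpret_yes_no(utterance):
    """Map a free-form reply to 'yes', 'no', or 'unknown'."""
    text = utterance.lower().replace("’", "'")
    # Check negatives first so that "not ready" style replies are not read as yes.
    if _contains(text, NEGATIVE):
        return "no"
    if _contains(text, AFFIRMATIVE):
        return "yes"
    return "unknown"
```

With something like this in place, "Ready to get started?" can stand on its own: "yes," "sure," and even "start" would all land in the same place, so there is no need to dictate a keyword.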
This is the slightly-less-frustrating cousin of the mistake above. I often see questions like:
This isn’t terrible, as the responses at least match the question in this case, but again, the first sentence is perfectly understandable on its own. When you tack on hints for obvious questions, you sound more robotic, add unnecessary length to your dialogue, and also risk the system interrupting your users who may already be responding after they hear the question.
A lot of new designers feel the need to add hints all over the place, fearing their users won’t know how to navigate. While I respect the impulse to be helpful, in 2019, your users are probably better at navigating a voice system than you think. Certainly, for clear yes/no questions, a hint is almost never needed.
Putting the how before the what
Let me show you the type of question I’m talking about:
Say “summary” to hear the balance of all your accounts. Say “rewards” if you want to hear the rewards you’ve earned so far. Say “representative” if you need to talk to an agent.
In a chatbot, this is more or less fine. A user can see what their options are and how to get to them. But in a voice interface, this structure is problematic. When you put the ‘how’ (say “summary”) before the ‘what’ (to hear the balance of all your accounts), this is cognitively taxing. A user has to hold each ‘how’ in mind while listening to the ‘what’ to see if it pertains to them. You can offer your users a much lower cognitive load by simply inverting, like so:
If you’d like to hear the balance of all your accounts, say “summary”…
A user has time to recognize that what they’re hearing is something they want (“Oh, my account balances! That’s me!”), and then they hear what to do about it (“Say ‘summary’”). They hear the keyword they need exactly when they need it, with no need to store anything in their brain.
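If you generate menu prompts from data, you can bake the "what before how" order into the template so nobody has to remember it. This is only a sketch: the function name `build_menu_prompt` and the (goal, keyword) pair format are assumptions I made up for the example.

```python
from typing import List, Tuple


def build_menu_prompt(options: List[Tuple[str, str]]) -> str:
    """Build a spoken menu where each option states the 'what' before the 'how'.

    Each option is a (goal, keyword) pair, e.g.
    ("hear the balance of all your accounts", "summary").
    """
    sentences = []
    for goal, keyword in options:
        # The goal comes first; the keyword arrives right when the user needs it.
        sentences.append(f"If you'd like to {goal}, say \"{keyword}.\"")
    return " ".join(sentences)


if __name__ == "__main__":
    print(build_menu_prompt([
        ("hear the balance of all your accounts", "summary"),
        ("hear the rewards you've earned so far", "rewards"),
        ("talk to an agent", "representative"),
    ]))
```

Because the ordering lives in one template, every new menu option you add later follows the same listener-friendly pattern.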
Information after the question
This sounds something like:
What you really want is for the user to start speaking after they understand the format. But since the first sentence is a question (and an easy question at that), your user may start speaking immediately. If your system allows barge-in, you may be inadvertently allowing them to answer in the wrong format. If your system doesn’t allow barge-in, your user will not be heard. In both cases, the user and the system run the risk of awkwardly interrupting one another. These kinds of poor turn-taking cues (meaning a lack of clarity over whose turn it is to speak) aren’t just confusing; they can leave people with unpleasant feelings about your interaction, just as if they were having a bad conversation with another human. You can try this out on Botsociety. Instead, try this:
I’d like to ask for your birthdate using two digits for the month, two digits for the day, and two digits for the year, like “zero six, twenty four, seventy nine.” What is your birth date?
Scooting the actual question to the end helps users know exactly when to speak. Plus, you can ensure they’ll have heard the format instructions.
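One way to keep the question at the end is to assemble prompts from two separate pieces, so the format instructions can never trail the question. A small sketch, with `build_prompt` being a hypothetical helper of my own naming:

```python
def build_prompt(instructions: str, question: str) -> str:
    """Join format instructions and a question, always putting the question last."""
    question = question.strip()
    # Refuse a final sentence that is not phrased as a question, since that
    # is the user's cue that it is their turn to speak.
    if not question.endswith("?"):
        raise ValueError("the final sentence should be a question")
    return f"{instructions.strip()} {question}"


if __name__ == "__main__":
    print(build_prompt(
        "I'd like to ask for your birthdate using two digits for the month, "
        "two digits for the day, and two digits for the year.",
        "What is your birth date?",
    ))
```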
Another thing I see is people wanting to collect information, but asking for it in a way that’s ambiguous. To use the prior example, that would look like this:
I’d like to ask for your birthdate using two digits for the month, two digits for the day, and two digits for the year, like “zero six, twenty four, seventy nine.”
Okay great — I know what information the system wants…but do I start speaking now? A user may stay silent, waiting to confirm it’s their turn to speak, and inadvertently trigger an error message. For this reason, whenever possible, I like to make sure my questions are actually structured in an interrogative sentence, leaving no doubt as to whose turn it is to speak.
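If you have a lot of prompts to review, a rough automated check can flag the two turn-taking problems described above: a question buried before the end of the prompt, and a prompt that never asks a question at all. This is a heuristic sketch under my own assumptions (the function name `turn_taking_issues` is made up, and the sentence splitting is naive: it only splits where ".", "?", or "!" is directly followed by whitespace, so punctuation tucked inside closing quotes is not treated as a sentence boundary).

```python
import re
from typing import List


def turn_taking_issues(prompt: str) -> List[str]:
    """Return a list of likely turn-taking problems in a spoken prompt."""
    # Naive sentence split: break after ., ?, or ! when followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.?!])\s+", prompt.strip()) if s]
    if not sentences:
        return ["empty prompt"]

    issues = []
    if not sentences[-1].endswith("?"):
        issues.append("prompt does not end with a question")
    if any(s.endswith("?") for s in sentences[:-1]):
        issues.append("question appears before the end of the prompt")
    return issues


if __name__ == "__main__":
    print(turn_taking_issues(
        "What is your birth date? Please use two digits for the month."
    ))
```

A check like this won't catch everything, and some prompts legitimately break the pattern, so I'd treat its output as a nudge to reread the prompt out loud rather than a hard rule.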
What other common question design errors have you seen in the wild? Any tricky questions you’re wondering how to restructure? Let me know in the comments, and stay tuned for part two! Check out Botsociety to get started designing questions of your own today!
Also published on Medium.