Thursday, March 16, 2017

SXSW Day 3 session 4: The future of conversational UI

Hector Ouilhet, lead designer for Google Search and Assist products

Coming on the heels of the previous session, which discussed the shift from search to assist and its potential impact on companies that provide services, this lecture on the benefits of the personal assistant was given by someone at Google working on it.

He started off discussing the work he did planning his trip to SXSW, including finding travel, a hotel, etc.  He said he spent about 10 hours planning it out.  Then he played a brief voice interaction showing how the same planning would work with an assistant, which lasted about a minute.  He emphasized the benefit of the assistant as reducing the time between identifying what you want and getting it from technology (a gap he called "friction").
He reviewed the history of getting things done via the internet:
1. Portals indexed content into categories, like the yellow pages (the model that was familiar then).
2. Search shifted the paradigm by removing the need to manually categorize the internet; you stated what you wanted and were given a number of best matches to choose from.
3. The feed identifies relevant information and pushes it to you based on identifications or subscriptions you make (e.g. Facebook, Twitter).  This was another paradigm shift because it tried to anticipate what you want and deliver it to you up front.
4. Chat apps are the next evolutionary step, giving you a conversational interface for finding information.
5. Personal assistants are a form of that using regular voice speech.
Where are we heading?
1. Smart everything - every physical object will be, in some way, smart.
2. Multi-user devices - objects will change from being personal (like phones) to shared (like smart appliances).  Interaction will be with any user of the device, not just its owner.
The simplest way to communicate with all of these devices is voice; and you want a single interface as the gateway to the devices, so you don't have to build the communication intelligence into each one separately, or talk to each one with its own protocol.
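The single-gateway idea can be sketched in a few lines of code.  This is a minimal, hypothetical illustration (all class and method names here are my own invention, not Google's): one conversational entry point parses an utterance and routes it to the right device, so each device only needs to implement a simple action handler rather than its own language understanding.

```python
# Hypothetical sketch: one voice gateway routing parsed commands to
# many smart devices, so no device needs its own speech interface.

class Lamp:
    """A trivially 'smart' lamp that only knows on/off actions."""
    def __init__(self):
        self.on = False

    def handle(self, action):
        if action == "turn on":
            self.on = True
        elif action == "turn off":
            self.on = False
        return self.on

class Gateway:
    """Single conversational entry point for every registered device."""
    def __init__(self):
        self.devices = {}

    def register(self, name, device):
        self.devices[name] = device

    def say(self, utterance):
        # Deliberately naive parsing: "<action> the <device name>".
        # Real assistants need far richer language understanding.
        action, _, name = utterance.rpartition(" the ")
        device = self.devices.get(name)
        if device is None:
            return f"I don't know a device called '{name}'"
        return device.handle(action)

gateway = Gateway()
gateway.register("kitchen light", Lamp())
print(gateway.say("turn on the kitchen light"))  # True: the lamp is now on
```

The design point is that the "communication intelligence" lives once, in the gateway; each device exposes only a dumb, structured API behind it.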

Moving beyond the literal message spoken, further development will use voice cadence, tone, and expression to extract even more of the speaker's intent (just as humans do).

What are some of the challenges we will see with this interface?

  • Intuitiveness of interface - smart devices add layers of capabilities to devices that used to have a very clear purpose.  This can cause cognitive dissonance and make the device difficult to understand.
  • Conversational interfaces lack the immediate visual feedback that regular devices provide, which helps you locate a problem.  For example, when you flip a light switch and the light doesn't turn on, you know the problem is the bulb or a power outage.  But if you say "turn on" to the light and it doesn't turn on, is the problem the bulb, the interface, understanding the command, hearing it, or some other software issue?  The problem is hard to track down.
  • Technical problems of using voice - learning accents, cultural language differences, speech impediments, etc.
  • Discoverability - how do you know what the device can do?  When you have physical switches, you can see what can be done.  With a voice interface there's no menu or visible cue to tell you what the device can do.
  • Human speech frequently assumes the listener understands context or visual cues, which the device might be missing.  For example, "turn on that light" - which light?  The assistant can't see what you're pointing at.
  • Audio is linear and non-persistent, whereas visual interfaces can be non-linear and persistent.  For example, given a list of options, you have to wait to hear them all before choosing; in a menu you see them all at once and can skip the first three to get to the fourth.
What are some of the opportunities with voice interfaces?
  • Accessibility to all - no need to be tech-savvy to use technology; everyone can do it regardless of education (although I would say that even though this is true for voice, it needs to be, and can be, true for any interface).
  • Device ubiquity - no need to carry a device with you at all times to interface with the world; all smart devices can be a portal for interfacing.
What needs to be done to get to this world:
  • Technology needs to adapt to us, not the other way around
  • Need to move beyond simple input/output interfaces
  • Need to design interfaces for speech
  • Need to move from evolving products by adding features to evolving them by creating stories of how they are used (again, this needs to happen regardless of voice interfaces)
  • Need to create a persona for gluing together the different interfaces into one coherent interaction point and giving an experience across multiple devices
  • Teach the technology to understand the context of our speech - we understand it naturally, and technology needs to as well.  The tools for this are only just now being developed.
  • Need to understand that localization is not just language, it's the whole cultural frame of reference.
  • Need to strive towards conversations that are multi-modal, not just audio.


