In conversation with Wally Brill: AI conversation design expert

Interview with Wally Brill

Transcript

Douglas Nicol:                  Hello, and welcome to Smart Dust, the podcast that likes to tune into the trending topics and news in the world of technology, data and innovation. As always, I’m joined by my esteemed colleague, Mr. Nick Abrahams.

Nick Abrahams:               Hello, Douglas. Hello everyone.

Douglas Nicol:                  In this episode, we’re talking about how the primary interface between humans and computers is changing. We’re moving from an era where the keyboard is the main interface with computing to an era where we will use our voice as a much more natural human interface. Now, fueling this is the fact that in the U.S. today, there are over 120 million smart speakers in households, and over 19 million smartphone users are using a digital assistant like Google Assistant, Siri or Cortana. The voice revolution has well and truly begun.

Now, today we’re at Google headquarters in Sydney to talk to Google’s expert on designing voice interactions, Mr. Wally Brill. Wally, you’re very welcome to Smart Dust.

Wally Brill:                         Thank you very much, Douglas.

Douglas Nicol:                  Now, we hear a lot about the concept of a voice first world. What exactly is a voice first world and how does it improve our day to day life?

Wally Brill:                         So, great question. There’s a friend of mine who works here in Australia who has this concept of the raw chicken moment. He has a slide showing a picture of somebody’s hands, and in those hands, there’s a pile of raw chicken. Basically, his point is, this is not a moment where you’re going to reach into your pocket and pull out your cell phone. You’re not going to use a touchscreen. You’re going to use your voice. It’s a voice moment.

When we talk about voice first, we talk about using voice as the primary interaction mode. It may be supported and given extra substance by things that may appear multimodally on a screen, chips for instruction, information, whatever. But we’re primarily using our voice to get the information or request the information.

Douglas Nicol:                  Does that mean that we’ll be able to lose the keyboard from our lives in the future? Maybe in 10 years’ time?

Wally Brill:                         That’s a great question, again. I don’t know about lose, because there may still be environments and situations where you want to have a silent interaction. But certainly, it’s going to be played down a lot and it’s going to be used a lot less.

Douglas Nicol:                  It’s interesting because you come from a background of originally, the world of IVR and those kind of systems. Many people found voice interactions with maybe their bank with an IVR system, less than exciting and-

Wally Brill:                         Don’t blame me, Douglas.

Nick Abrahams:               Wally, we understand-

Wally Brill:                         It wasn’t all my fault.

Douglas Nicol:                  But how is it going to be different this time? Because we didn’t particularly enjoy IVR, why is voice different this time?

Wally Brill:                         The technology has improved. It’s gotten to a much, much, much better level because voice is in the cloud now, there are billions and billions and billions of utterances that are being analyzed and understood. So, natural language understanding is so much better than it was before. The technology works now.

Douglas Nicol:                  Wally, does that mean that particularly with natural language processing, and that being a bit of a game changer, does that mean we’ll see exponential growth in the competency of voice, as we’ve seen the experience now with, for example, Siri is significantly better than when Siri first started? We’re just going to see that continue to ramp up?

Wally Brill:                         I can only use the example of the Google Assistant, which is extraordinary. I believe that the understanding of this technology now is absolutely ready for prime time. It’s seamless. Yeah, it has improved radically, I would say in the last three years.

Nick Abrahams:               You raise an interesting question there with Google Assistant, we’ve obviously got a number of other voices. Samsung has said that Bixby will be in all of its appliances by 2020. Will it be one voice to rule them all, or will everyone have their own separate voices and that will be the identity of a brand?

Wally Brill:                         It’s, again, a very important question. There’s been a lot of discussion about what’s called the meta bots. This is a term that was coined by the guys at Opus Research in the States. By meta bots, they mean Google Assistant, Alexa, Siri, Bixby, I guess to some degree. Where is the place for brand in relation to the meta bots, and where are the meta bots in relation to each other?

I’m not a great believer that we will only see one meta bot. I think people will be designing for multiple platforms, they do different things better than one another. They’re differentiated. I think for brands, brands have to be able to differentiate in their own ways, and that’s where personality and persona come into it. I suspect we’ll be talking more about that later.

But I think there will be … I don’t think I’ll necessarily have to talk to my toaster. I don’t think I want to make it darker, make it lighter. I may say to the Google Assistant, make me some dark roast. The Google Assistant will drive the toaster to do its bidding. But I don’t think we’re going to have a million interfaces in every device in everything.

There was a … Oh, God, I wish I could remember the name of the writer. There was a writer who writes hardboiled detective fiction about the future. It was a book called Clones. In this book, this detective is saved by his refrigerator, who likes him a good deal and throws himself in front of somebody who’s going to hurt him and crushes this bad guy. It’s wonderful.

Douglas Nicol:                  When we see, just to finish off, I guess on a part of that stream, which is Amazon are betting a lot on the idea of the appliance interface with Amazon Basics and their appliances and so forth and Alexa in the Amazon Basics microwave. They seem to be focused on that world where there will be a lot of appliances that will have their own personality. But I, like you, believe that it seems like a curious world where everything talks to everything else.

Wally Brill:                         Yeah, there’s value. It’s not so much that I don’t think there’ll be AI in appliances. I do. But do I need to pivot to the left to talk to the refrigerator or the microwave rather than just talking to my Google Home hub that’s in the kitchen? I just think it’s about overkill. Does everything need to be able to speak to me, or is there one centralized home device that I’ll be able to speak to?

Douglas Nicol:                  Here I am, I’m talking to AI, I’m talking to a computer. When I talk to a human, there are nuances to any given conversation: the intent of that conversation, the subtleties, the shortcuts of a conversation with a human. How is that being replicated in Google Assistant and those nuances of conversations, what kind of things are happening?

Wally Brill:                         Well, it’s really about great design. All of this comes down to phenomenal interface design, and this technology and this discipline have matured over the years. It’s been around since, I guess, 1996 or so in its nascent form. I would be [inaudible 00:08:05] not saying that we obviously have the greatest designers in the world at Google.

We have a team that’s extraordinary. I’d like to say there’s 450 to 500 years of experience within our design team. That’s just me. But there’s a tremendous amount that goes into it. It’s a discipline, it’s learned, it’s studied, it’s understood. In terms of putting conversational dialogue within these interfaces, this is something that was originally pioneered by a man called James Giangola, who wrote a book called Voice User Interface Design a long time ago. His concept was taken in part from research by Clifford Nass, who wrote Wired for Speech, the concept being that people communicate with robots in very much the same way they communicate with other people. They don’t differentiate that much.

So, the natural conventions of conversation apply. The more natural the conversation is, the more successful the user is going to be in conversing with that robot. We do put in those nuances, we do make sure that the conversation flows naturally. We design from conversation. We create what we call sample dialogues. A sample dialogue is a play between a user and a system. We write that conversation out and then we read it and listen to it. We listen to it in audio, to determine whether it’s natural. If it is, we have something we can design from.

Douglas Nicol:                  One of those nuances is this concept of tapering, and I’ve heard you talk on this topic. Explain to us what tapering is and how it really helps with that conversation flow.

Wally Brill:                         Well, the first time I use an interface, I need to learn what it does, what it can do. It’s going to tell me in some way or other, this is the kind of thing I can do, and this is how I work. After I use it subsequent times, it doesn’t need to give me that information. If I say, order me five fish, it’ll go, “Okay, you want to order five fish, I can do that. Sure. Boom.” The next time I say order me five fish, it’ll go, “Okay, done.” We have the context between us, and it knows that I understand that it can do this thing. So, it becomes much, much, much more constrained as a conversation and much more tapered.
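
[Illustration: the tapering idea Wally describes can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions, not how Google Assistant works. The class name `TaperedResponder`, the per-user usage counter, and the exact wording of the replies are all hypothetical, chosen to mirror the "five fish" example.]

```python
from collections import defaultdict


class TaperedResponder:
    """Hypothetical sketch: shorten confirmations once a user knows a task."""

    def __init__(self):
        # Counts how many times each (user, task) pair has been handled.
        self._uses = defaultdict(int)

    def confirm(self, user_id: str, task: str, request: str) -> str:
        """Return a full confirmation on first use, a tapered one afterwards."""
        key = (user_id, task)
        self._uses[key] += 1
        if self._uses[key] == 1:
            # First encounter: restate the request so the user learns
            # what the system understood and what it can do.
            return f"Okay, you want to {request}. I can do that. Sure."
        # Shared context now exists, so the reply can be much shorter.
        return "Okay, done."


responder = TaperedResponder()
first = responder.confirm("wally", "order", "order five fish")
second = responder.confirm("wally", "order", "order five fish")
print(first)
print(second)
```

[A real system would presumably also decide when to stop tapering, for example after a long gap between uses, but that is beyond what the conversation covers.]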

Douglas Nicol:                  Today, you’re head of conversation design advocacy and education, based at Mountain View. What does this job entail? You spend a lot of time traveling around the world, working with brands and organizations to actually get to understand best practice around particularly conversation flow and persona design. What does that entail?

Wally Brill:                         Air miles. Getting a lot of air miles. Two wonderful designers at Google, Peter Hodgson and Mark [Polina 00:11:18], created a workshop format that’s based on Google Sprint methodology. It basically shows and teaches the methods of defining and designing rich personas that allow the robots to live, for lack of a better term, and to create that nuanced conversation that you refer to. That relies on those natural conventions of speech, to make the thing live.

Douglas Nicol:                  Why is that important? Why not just have a functional voice that gives you the answer or the information that you need? Why is it important to have a persona?

Wally Brill:                         Sit five writers down in a room and ask them to write language for an AI. Nobody knows how AI speaks. There’s just no such thing. There’s no reference for it. So, we have to have a reference. That reference is most easily leveraged by defining something that’s like a human. We’re not trying to create a human, but we’re trying to use what we know about humans.

In the old days, when we made those terrible IVRs, we used to go into places and people would say, “Oh, you don’t need a persona. We don’t need a persona. We just want business neutral.” We’d ask them, “What does business neutral sound like?” “Well, we want them to be cheerful. We want them to be confident. We want them to be …” “What does that sound like?”

All of this generic language doesn’t really get us to what the thing says. We have to model on what we know about actual people. So, we create a concept of a being, and that being is like a person and it uses a shorthand of conventions of people. That’s basically how we do it. Because if we don’t do that, then the thing sounds schizophrenic and strange.

Then, if we’re using live voice actors to be the voice of the system, they need a suit of clothes to put on, they need a character to be, because Monday, they’re recording a sausage commercial, Tuesday, they’re selling cars, Wednesday they’re in the studio recording our action on Google. They need to know what character they’re being.

Douglas Nicol:                  Wally, when you talk about the interaction with the AI and say Google Assistant, how should we behave towards Google Assistant? At its core, really what I’m asking, which I think a lot of people wonder about, should we say please and thank you? If we do, will the assistant be nicer to us over time? Do you remember that stuff? Will you think poorly of us if we say something nasty?

Wally Brill:                         It’s very funny, I was just talking the other day to a group of people who were asking me questions like this. I went, “Great question.” I realized that Americans always say that. Anytime somebody asks a question, it’s like hyper politeness. “That’s a great question. Yeah. You know, I was thinking about this, Nick.”

I think politeness is important. We’ve done some experiments around it. Do I think it’s necessary? Well, when I walk into the house, I just say, “Okay Google, living room lights on.” I don’t go, “Please, could you turn on the lights in the living room, Google Assistant.” It’s unnecessary. I’m talking to an appliance. I believe that I maintain … Yes, I use the conventions of speech, but it’s an appliance, it’s a tool. “Okay, hammer, I’m going to hit the nail now. I hope you don’t hurt too much.”

Douglas Nicol:                  To take that perhaps just one step further and we were lucky at South by Southwest to interview Professor Ishiguro who is a Japanese robotics professor best known for creating a robot in his own image. But he was confident that he’s very close with his bot [inaudible 00:15:33] passing a Turing test. The idea for those listening, a Turing test is where it is not possible for a human to differentiate as to whether they’re listening to a bot or a human. Does life get different then, and I guess how close are we to passing a Turing test, and what happens if we go beyond that threshold?

Wally Brill:                         That’s a great question, man. Very polite. I love interviewing people from America, very polite.

Nick Abrahams:               Was my question better than Douglas’? I feel like Douglas’ is better.

Wally Brill:                         I think it might have been, actually. Douglas, what are you going to pay me? As far as the Turing test is concerned, I wonder about when the AI knows more about me and my life than I do, and the AI understands what my next best action is. Already, auto fill in email, things like that.

I started to wonder, will I feel less than? Personally, will my ego suffer because my robot makes better choices than I do and isn’t swayed by mood, and it’s more pragmatic than I am? In terms of the Turing test, I’m not sure where it leads us. People talk about the singularity. I used to worry about this. I used to stay up nights worrying about the singularity. I don’t worry about it anymore.

The more I understand about this technology, the more I understand that these are brilliantly designed tools. They will be able to self-improve, which is wonderful, but they’re not going to take over the world. It’s not Skynet, it’s going to be fine. Don’t worry people, it’s going to be fine.

Nick Abrahams:               You heard it here.

Douglas Nicol:                  We have a lot of clients and organizations who listen to this podcast, and they’re probably going, look, I know voice interactions and voice based experience are really important, but they don’t know quite where to start. If they have a brand, what are the ideas and concepts that they should focus on, and what are the things they shouldn’t focus on? Do you have any guidance for people who are just starting in the early stages of experimenting with maybe interacting on a smart speaker for example?

Wally Brill:                         Start small, start really small. Keep it narrow. Something that’s interesting that a lot of brands are doing is they’re putting their toe in the water with things that are tangential. They’re not just trying to sell product, for example. There’s a company that sells antihistamines, they created an action that tells you what the pollen count is where you live. You give it a zip code, it’ll tell you how bad the pollen is. It’s useful, it’s not selling any Zyrtec particularly, but it’s useful.

Halifax Building Society in Great Britain has a thing called the Jargon Buster, so you can say to it, “Okay, Halifax, what’s [inaudible 00:18:47]?” It’ll tell you: you’ve read the contract, you’re about to buy the house and somebody comes in and offers $100,000 more than you did and steals it from you. It’s useful. It’s interesting. It relates tangentially to their core business.

Douglas Nicol:                  I’m hearing this, don’t think of this as the new website, think of this as almost like a service and support channel for people’s lives and the brand will get a halo effect from that.

Wally Brill:                         Initially. I think as we go forward, and people get more experience and there’s more traffic and it’s more used, I think it’s going to grow exponentially. I think you will perform all kinds of functions through these actions. It’s just, if you start too big, then you’re biting off an awful lot to begin with. So, it’s good to start with something narrow, a good task that will serve people’s purposes. United Airlines has flight information. When is my flight due? Is it on time? Is it late, what’s the story? Great application, very specific, very simple.

Douglas Nicol:                  I can’t help feeling one of the discussions for a lot of clients that have got a set of brand guidelines is that suddenly those brand guidelines exist as a voice, as a persona. That probably means that to an extent, you have to rewrite your brand guidelines because you literally are dealing with a personification of the client’s brand, and things like, what is the gender? Do they make jokes or are they very subservient? How do you navigate those kinds of decisions around gender and humor, and what should be in and out of the persona?

Wally Brill:                         Well, first off, I don’t think they change or rewrite their brand guidelines. I think the brand guidelines inform what the persona is. I think we maintain the brand as the brand. It’s in some ways sacrosanct. When we start talking about gender, or affect or sense of humor, these are all things that have to be user tested for brand fit. I don’t think there’s any hard and fast rule anywhere. I would dearly like to see brands go more extreme in terms of the characters they’re building and designing. I think there’s room for that.

But again, voice designers always, in their off moments, let’s go out and have a drink and talk about voice design. There’s a thrill. It’s better than go out and have a drink and talk about IVR design. Everybody says at some point or another, “Gee, I wish we could make a cynical interface. Something that was just a little sassy or a little bit more, okay, I’ll do it if I have to.” Things like that. We’re not going to get there soon, but it would be fun to do.

Nick Abrahams:               It’s a danger, we have a sea of very blunt, vanilla, boring people to deal with on smart speakers.

Wally Brill:                         It’s true. But, again, what’s the balance? It’s an appliance. It’s functional. In some situations, that’s all you really need. You just need functional, do the thing, set the timer. Do this, do that. When you get into more esoteric stuff or more involved stuff, pedagogic interfaces or something, you might want something a little different. It just depends.

Douglas Nicol:                  Just in the emotional level of how people are responding and so forth, there’s been talk in the media now about the ability to do emotional scanning off the back of voice. Can you talk a little bit about what’s potentially available either now or in the future around the market?

Wally Brill:                         Unfortunately, no.

Douglas Nicol:                  Okay. We will move on from that.

Wally Brill:                         Great question.

Nick Abrahams:               So, Wally, as we move into a world where we have voice interaction, what are the jobs and careers of the future? There must be a whole bunch of new job titles and skills that are in demand in order to embrace this new world. People who are listening who want to skill up and extend their skills, what would you recommend they focus on to really get into this world and understand it?

Wally Brill:                         Linguistics, AI obviously, human computer interaction, small bookstore ownership … No. I think anything around design, anything around user interface, I think usability is vitally important and user research is vitally important, human factors, all of those sort of areas are just booming. You can’t find a voice user interface designer who isn’t fully employed at the moment. If you can, they don’t have a telephone or something. It’s really difficult to find one. Certainly, there’s levels.

Douglas Nicol:                  But in terms of skills, Nick actually moonlights as a standup comedian. Do you think comedy writing is a skill? Do you think Nick could morph effortlessly into the world of voice design?

Wally Brill:                         I got to tell you, at Google, we have people from the world of comedy. Our Australian personality writer is a standup comic, Scott Dooley, he’s brilliant. He’s the one who puts the jokes in the Google Assistant for Australia. We have a team of people who come from places like Pixar or various TV communities, film communities, they know how people speak. Yeah, I think Nick, you got a great career ahead of you.

Nick Abrahams:               I am going to give you a call, Wally, about that. It’s interesting just to pick up on the joke theme, because obviously that’s one of the questions that people ask an assistant a lot. What’s the background behind that, what goes on at Google when you think about that? Is it just jokes when people are asking for jokes, or is there an intention to create some levity in otherwise pretty normal interactions?

Wally Brill:                         A little levity goes a long way. Yeah. Again, when we’re thinking about interacting with an appliance to get something done, you don’t really want too much diversion. We talk about delight where it’s appropriate. It’s something you have to feel and you have to have a sense of. I would be really annoyed if my Google Assistant was super perky in the morning and telling me jokes and stuff. I’d throw the thing across the room.

But in certain places, it’s totally appropriate. But most of the time, you’re querying for something that’s going to provide humor.

Nick Abrahams:               One of the questions … There’s an organization that I’ve been working with, which was replacing 100 call center operators with effectively three conversation engineers and so forth. People often say, “Well, what are the jobs of the future?” From the sorts of jobs that you’ve talked about, one of the core skills is around creativity and being comfortable with creativity. Can you talk about how, I guess, Google works with creativity, because it’s an enormously innovative organization? How do you cope with all that creativity, making it positive?

Wally Brill:                         When you say cope with it, I think it’s essential when you’re talking about language, when you’re talking about dialogue of any sort. Linguists are innately creative. One of the other things we find is that an awful lot of musicians actually work in this industry as designers. I don’t know whether it’s because they hear the prosody or musicality of speech, and it makes it simpler for them. But there’s an awful lot of people from the arts who actually moved into this sort of world. Yeah, I did. I came from the music industry. I think creativity is essential in designing these things.

Douglas Nicol:                  Wally, thank you so much for joining us today on Smart Dust. I found very useful your Google lightning talk, which I found on YouTube, which you’ve recently posted. It really gets into the detail of persona design, and gives some really good guidance. That might be a good resource for our listeners. Any other resources you’d recommend in terms of further reading on this topic?

Wally Brill:                         Yeah, there’s a wonderful book called Designing Voice User Interfaces by Dr. Cathy Pearl, who is my opposite number at Google. She’s the head of conversation design outreach. She’s written this marvelous book, and it’s become really quite important in the industry. We have a website, which is actions.google.com/design. Once again, that’s actions.google.com/design, which is our base repository for all the information people would need to know to be able to design effectively. I highly recommend it to people, actions.google.com/design.

Nick Abrahams:               You did use to design those IVRs, didn’t you? That voice sounds familiar.

Wally Brill:                         I used to have a job in radio on a pirate ship in the middle of the North Atlantic Ocean, but that’s another whole podcast.

Douglas Nicol:                  You were involved in the Pirate Radio. Oh, well, that is another podcast. The podcast after that, hopefully will be when we can do an interview with Google Assistant.

Wally Brill:                         You can do it now. She’ll talk to you.

Douglas Nicol:                  Okay.

Wally Brill:                         You’ll talk. How much worse-

Douglas Nicol:                  That could be the very next one. Thank you very much, Wally.

Wally Brill:                         Thank you, guys. Thanks for the great questions.

Nick Abrahams:               Very nice. Thank you.

 
