Our AI Interaction Designer Diane Kim explores the crucial differences between chatbots and conversational agents
Our scheduling assistants Amy + Andrew over here at x.ai are no strangers to being referred to as chatbots. It’s easy to understand why: any conversational interface with a machine often gets labeled as a “chatbot,” but in Amy + Andrew’s case, it’s a bit of a miscategorization.
To be clear, this isn’t a takedown of chatbots, which serve a purpose and are tremendously helpful in the right contexts. But while interacting with Amy + Andrew may feel similar to interacting with a chatbot, the design and engineering of our AI scheduling assistants are much more complex.
Some of the backend differences between chatbots and AI agents actually affect how users interact with them, so I wanted to share more about the characteristics that distinguish the two. In the simplest terms, a chatbot is a chatbot because it handles only one task at a time, in a single conversation, with a single user.
Conversely, x.ai’s AI scheduling assistants Amy + Andrew are goal-oriented agents that juggle multiple streams of input and conversations with more than one person, all simultaneously, in order to complete a complex task. Here’s a simplified side-by-side comparison:
Single tasks vs. multiple tasks
Most chatbots are able to handle single, independent tasks one at a time:
“Alexa, what’s the weather like today?”
“The weather in New York today is in the low 80s, sunny all day.”
In with a question, out with an answer. One to one. Chatbots handle tasks that can be accomplished with a single transaction (a single Google search, setting a single timer). They’re incredibly useful in the right context, when all you need is a quick and easy task done right.
However, that sort of one-off interaction is very different from what you’d expect of a goal-oriented agent that can, say, book travel plans or schedule a coffee for you.
For example, at any given moment in the scheduling process, Amy + Andrew are handling multiple tasks at once:
Confirming a time: Sending emails back and forth with guests to find a time that works for everyone while continuously checking the customer’s calendar to make sure those times are still available and that the customer hasn’t added new events.
Confirming a location: Sending emails back and forth to request the meeting location and confirm the details (an address, a phone number, a conference line), and
Confirming participants: Adding or removing participants from the meeting negotiation.
As if handling these tasks simultaneously weren’t complicated enough, the tasks also influence one another, affecting both how each task is handled and the eventual outcome.
For example, let’s say that a majority of a meeting’s guests have OK’d a time, 4 PM. But then a guest emails Amy asking to change the meeting’s location (“Can we meet for coffee instead of having a phone call?”). Now suppose the host, Amy’s “boss,” has different scheduling preferences for coffees than for phone calls and only lets Amy schedule coffees from 9 to 11 AM. Since 4 PM falls outside that window, Amy will need to propose new times to all the guests and start the time negotiation process again.
In other words, Amy can handle a multi-step task in pursuit of a complex, shifting goal.
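To make the coffee example concrete, here is a minimal Python sketch of how one dependent task (the meeting’s location, modeled as a meeting type) can invalidate another (the agreed time). It is a toy model under stated assumptions, not x.ai’s implementation: the `MeetingState` class, the `HOST_WINDOWS` table, and the 9 AM to 6 PM phone-call window are all hypothetical; only the 9 to 11 AM coffee window comes from the example above.

```python
from dataclasses import dataclass, field
from typing import Optional, Set

# Hypothetical host preferences: meeting type -> start hours (24h clock) Amy may offer.
# The coffee window mirrors the 9-11 AM example; the phone window is an assumption.
HOST_WINDOWS = {
    "phone": range(9, 18),
    "coffee": range(9, 11),
}


@dataclass
class MeetingState:
    """Toy model of one meeting whose tasks depend on each other (hypothetical)."""
    meeting_type: str
    guests: Set[str]
    proposed_hour: Optional[int] = None
    accepted_by: Set[str] = field(default_factory=set)

    def propose_time(self, hour: int) -> None:
        """Time task: offer a start hour that respects the host's window."""
        if hour not in HOST_WINDOWS[self.meeting_type]:
            raise ValueError(f"{hour}:00 is outside the host's {self.meeting_type} window")
        self.proposed_hour = hour
        self.accepted_by.clear()  # a new proposal needs fresh acceptances

    def accept(self, guest: str) -> None:
        """Record a guest OK'ing the current proposal."""
        if guest in self.guests and self.proposed_hour is not None:
            self.accepted_by.add(guest)

    def change_type(self, new_type: str) -> bool:
        """Location task: switch meeting type; return True if time negotiation restarts."""
        self.meeting_type = new_type
        if self.proposed_hour is not None and self.proposed_hour not in HOST_WINDOWS[new_type]:
            # The new type's constraints rule out the agreed time, so the time task resets.
            self.proposed_hour = None
            self.accepted_by.clear()
            return True
        return False


meeting = MeetingState("phone", guests={"ana", "ben", "cy"})
meeting.propose_time(16)  # 4 PM
meeting.accept("ana")
meeting.accept("ben")  # a majority has OK'd 4 PM
restart = meeting.change_type("coffee")
print(restart, meeting.proposed_hour)  # True None
```

The point of the sketch is that changing the location does not just update a field: it checks the task that depends on it and clears that task’s progress when the new constraints rule it out.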
Single users vs. multiple users
Interestingly, when I’ve looked for prototyping tools for dialogue and voice design, all the software I’ve found is limited to single-person interactions.
So why are multi-participant conversations so tricky?
Getting a machine to “talk” with more than one person at once requires a massive amount of dialogue design work, and that is the biggest factor differentiating Amy + Andrew’s design from chatbots: they exclusively handle conversations with multiple participants.
The more people Amy + Andrew engage in a conversation, the more opportunity those participants have to give input that affects the “state” of the meeting (what we call the dependent tasks).
Even in the very simplest scenario, a 1:1 meeting, Amy + Andrew handle two conversation threads at once (Amy:you and Amy:your guest). This multi-participant interaction is more complex than any Facebook Messenger bot or voice assistant I’ve interacted with to date. Now imagine that you’re meeting with 3 external guests: Amy is now in conversation with 4 people at the same time.
Because every new conversation thread exponentially increases the complexity of scheduling the meeting, even x.ai’s sophisticated system currently has to limit Amy + Andrew’s meeting negotiations to a maximum of five participants.
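One way to see why each added thread makes things harder is a simplified count of states. The sketch below assumes, hypothetically, that every participant’s thread is always in one of four states and that the five-participant cap counts everyone Amy is talking to, including the customer. Under those assumptions, the number of combined configurations grows as 4^n for n threads. The names `conversation_threads`, `joint_states`, and `THREAD_STATES` are illustrative, not x.ai’s.

```python
from typing import List, Tuple

# Simplified assumption: each participant's thread is always in one of these states.
THREAD_STATES = ("awaiting_reply", "accepted", "declined", "counter_proposed")

# Hypothetical cap mirroring the five-participant limit; here it counts everyone,
# including the customer.
MAX_PARTICIPANTS = 5


def conversation_threads(customer: str, guests: List[str]) -> List[Tuple[str, str]]:
    """Return one ("Amy", person) thread per participant, customer included."""
    participants = [customer, *guests]
    if len(participants) > MAX_PARTICIPANTS:
        raise ValueError(
            f"{len(participants)} participants exceeds the cap of {MAX_PARTICIPANTS}"
        )
    return [("Amy", person) for person in participants]


def joint_states(n_threads: int) -> int:
    """Count combined configurations: each thread independently takes any state."""
    return len(THREAD_STATES) ** n_threads


# 1:1 meeting, then a meeting with 3 external guests.
for guests in (["guest"], ["g1", "g2", "g3"]):
    threads = conversation_threads("you", guests)
    print(len(threads), joint_states(len(threads)))
# 2 16
# 4 256
```

Real negotiations carry far more state than four labels per thread, so this understates the problem; it only shows the shape of the growth.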
“Instant” responses vs. thorough multi-participant negotiation
With a chatbot, the conversation is nearly instantaneous, with at most a few seconds of delay. That’s partly because the tasks are often quick and simple, but also because of the interaction medium.
Chatbots tend to live on platforms where you expect quick responses: Facebook Messenger, web-based chat (think Slack), and voice (on-the-spot, audible conversations). You would be annoyed with Alexa if she took 2 minutes to respond to your question, right?
But that’s not necessarily the case with Amy + Andrew, because:
- We’ve chosen email as our primary medium (a 2-minute email response is considered speedy!)
- Their tasks are a bit more complex than just setting a timer.
Of course, if you give Amy an exact time and instructions for a meeting and simply need her to add it to the calendar (“Amy, add this to the calendar for tomorrow at 10:30 AM in the office.”), this happens very quickly: Amy can do it within 2 minutes.
But imagine if you ask Amy to “Schedule a coffee with Jesse next week. He can choose the location.”
Not only does Amy need to process time and location information simultaneously, but she also must relay your availability to your guest Jesse, wait for Jesse to respond, engage in ongoing time and location negotiation, and only then get back to you once all the meeting details are confirmed.
So if you’ve wondered why you don’t always hear back from Amy or Andrew as “instantaneously” as you might from a chatbot, they may still have been confirming details with other participants.
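The coffee-with-Jesse request can be pictured as a small state machine in which one state, waiting on the guest, may last hours or days. The following Python sketch is purely illustrative: the `Step` names, the event strings, and the transition table are hypothetical, and the shortcut from parsing straight to confirmation stands in for the quick “add this to the calendar” case.

```python
from enum import Enum, auto


class Step(Enum):
    PARSE_REQUEST = auto()
    SEND_AVAILABILITY = auto()
    AWAIT_GUEST_REPLY = auto()  # may last minutes or days: the guest controls timing
    NEGOTIATE = auto()
    CONFIRM_WITH_HOST = auto()
    DONE = auto()


# Hypothetical transition table: (current step, event) -> next step.
TRANSITIONS = {
    (Step.PARSE_REQUEST, "exact_time_given"): Step.CONFIRM_WITH_HOST,  # quick path
    (Step.PARSE_REQUEST, "parsed"): Step.SEND_AVAILABILITY,
    (Step.SEND_AVAILABILITY, "email_sent"): Step.AWAIT_GUEST_REPLY,
    (Step.AWAIT_GUEST_REPLY, "guest_replied"): Step.NEGOTIATE,
    (Step.NEGOTIATE, "counter_proposal_sent"): Step.AWAIT_GUEST_REPLY,
    (Step.NEGOTIATE, "time_and_location_agreed"): Step.CONFIRM_WITH_HOST,
    (Step.CONFIRM_WITH_HOST, "host_notified"): Step.DONE,
}


def advance(step: Step, event: str) -> Step:
    """Return the next step, rejecting events that make no sense in this step."""
    try:
        return TRANSITIONS[(step, event)]
    except KeyError:
        raise ValueError(f"event {event!r} is not valid in step {step.name}") from None


# "Schedule a coffee with Jesse next week. He can choose the location."
events = [
    "parsed", "email_sent", "guest_replied", "counter_proposal_sent",
    "guest_replied", "time_and_location_agreed", "host_notified",
]
step = Step.PARSE_REQUEST
for event in events:
    step = advance(step, event)
print(step.name)  # DONE
```

Each trip through `AWAIT_GUEST_REPLY` is time the customer spends waiting on someone else, which is why the loop between waiting and negotiating, not Amy’s own processing, dominates the response time.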
Singular source vs. multiple sources
Lastly, because most chatbot interactions happen in a relatively short period of time, they usually occur within a single medium or platform. For example, you might text your banking bot or talk to the same bank’s customer service bot on the bank’s website, but each conversation begins and ends in the same medium.
Amy + Andrew handle multiple input channels: you can initiate meetings from Slack or email, and make changes to the meeting via email, my.x.ai (our web UI), or directly through the calendar invite. We’ve opened up so many channels of interaction because we want to make it easy to change your meetings from wherever you’re working. If it’s easier for you to just delete the meeting off your calendar, that’s totally fine—Amy + Andrew understand. If you’re already in your inbox and you just need to shoot them a quick email to cancel the meeting—that works too!
Supporting multiple platforms and interaction channels raises its own design considerations (should Amy + Andrew sound the same over Slack as over email?), but the main challenge is technical: seamlessly integrating these various inputs so that scheduling the meeting continues as usual.
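A common way to handle many input channels is to normalize each one into a single, channel-independent update before any scheduling logic runs. The Python sketch below assumes that approach; `Channel`, `MeetingUpdate`, the adapter functions, the Slack slash-command syntax, and the `Scheduler` class are all hypothetical, and real email handling would need a natural language understanding step that is only assumed here.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict


class Channel(Enum):
    EMAIL = "email"
    SLACK = "slack"
    WEB_UI = "my.x.ai"
    CALENDAR = "calendar"


@dataclass(frozen=True)
class MeetingUpdate:
    """Channel-independent description of a requested change (hypothetical)."""
    meeting_id: str
    action: str  # e.g. "cancel" or "reschedule"
    source: Channel


def from_calendar_deletion(meeting_id: str) -> MeetingUpdate:
    # Deleting the invite from your calendar is read as a cancellation.
    return MeetingUpdate(meeting_id, "cancel", Channel.CALENDAR)


def from_email(meeting_id: str, intent: str) -> MeetingUpdate:
    # Assumes an upstream language-understanding step already classified the email.
    return MeetingUpdate(meeting_id, intent, Channel.EMAIL)


def from_slack_command(meeting_id: str, text: str) -> MeetingUpdate:
    # Hypothetical slash-command syntax such as "/cancel" or "/reschedule".
    return MeetingUpdate(meeting_id, text.strip().lstrip("/"), Channel.SLACK)


class Scheduler:
    """Applies updates the same way no matter which channel they came from."""

    def __init__(self) -> None:
        self.status: Dict[str, str] = {}

    def apply(self, update: MeetingUpdate) -> str:
        if update.action == "cancel":
            self.status[update.meeting_id] = "cancelled"
        elif update.action == "reschedule":
            self.status[update.meeting_id] = "renegotiating_time"
        return self.status.get(update.meeting_id, "unknown")


scheduler = Scheduler()
scheduler.status.update({"m1": "scheduled", "m2": "scheduled"})
print(scheduler.apply(from_calendar_deletion("m1")))  # cancelled
print(scheduler.apply(from_email("m2", "cancel")))  # cancelled
```

Because `Scheduler.apply` never consults `update.source` to decide what to do, deleting the invite from your calendar and emailing Amy to cancel end up in the same state; the `source` field is kept only for things like choosing how to reply.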
The design challenges at hand
When it comes to building any type of conversational product, designers are faced with a number of challenges:
- Design constraints of natural language processing: in 2018, computers are still quite limited in how accurately they can understand natural language (not just the textual input, but also the layers of social context woven into human communication)
- Designing a cohesive personality and voice: we’ve written before about how we’ve humanized our agents by giving them their own clear personas.
In addition to these baseline design considerations, conversational design for Amy + Andrew involves an additional layer: multi-user dialogue interaction design. Unlike in a conversation with a chatbot, each input Amy receives from User A, User B, or User C affects her decision about the next action to take toward the end goal.
Our unique challenge of getting a meeting scheduled via email is twofold:
- Amy is always in conversation with at least 2 users
- The outcomes of each task will influence the remaining tasks
Multiply that by the fact that our users communicate in an open dialogue medium and can send Amy any type of message (which she’ll do her best to understand!), and you can start to get a sense of the magnitude of complexity involved in designing and engineering these types of dialogue systems.