Conversational AI with Agentive Technology
I read “Designing Agentive Technology”. It was interesting, so I’ll try to deepen my understanding of it with a project that I’m working on as an example.
This book describes what Agentive Technology is and how to use it to design AI that works for people. The book was published in May 2017, so the examples are a bit old, but the essence of what it discusses has not yet been realized, so it will be very useful for future service design.
In LINE’s AI business, where I am currently working, the vision is to create a “more natural experience”. I believe that the ultimate natural experience is one where users are given the information they need, or actions are taken on their behalf, without them being aware that the technology is there.
In this book, the technology to realize this is called Agentive Technology, and services with such an orientation are defined as Agentive, and the Agent is “AI that works for people”.
The service design of this “AI that works for people” is clarified with many examples. In this post, I will refer to the contents of this book and introduce LINE AiCall, a Conversational AI service for call centers developed by the LINE AI business, with examples.
What is Agentive?
Agentive services and functions are based on an understanding of the user and perform some tasks for the user.
The entity that executes these tasks is the Agent, and the main roles of the Agent are as follows:
- Setting Up
Users delegate specific tasks to Agents, and the Agents are given the necessary permissions and settings to perform those tasks (e.g. access to user data for user understanding).
- Doing (Monitoring & Notifications)
The task is executed by some trigger such as a fixed time, a change in the user’s state, or an action from the user. While the task is running, the Agent will monitor its own execution status, and the user will be able to know the status at any time if necessary. The user will be notified when the task is completed successfully, or when any problem occurs.
- Handling Exceptions
Handle exceptions and use feedback to optimize Agent runtime behavior.
- Handoff & Takeback
When an unexpected situation occurs and a task cannot be executed, the Agent will hand the task off to someone who can execute it (a user or another operator); when the Agent is restored to a state where it can execute the task, responsibility for execution returns to the Agent.
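To make the four roles above a little more concrete, here is a minimal Python sketch of an Agent’s lifecycle as a small state machine. The state and event names are my own assumptions for illustration, not terms from the book or from LINE AiCall, and anticipated exceptions are assumed to be handled inside the RUNNING state without a state change.

```python
from enum import Enum, auto


class AgentState(Enum):
    NOT_SET_UP = auto()
    READY = auto()        # set up and waiting for a trigger
    RUNNING = auto()      # executing the delegated task
    HANDED_OFF = auto()   # a person (user or operator) has taken over


# Allowed transitions, keyed by (current state, event name).
# Event names are hypothetical labels for the roles described above.
TRANSITIONS = {
    (AgentState.NOT_SET_UP, "set_up"): AgentState.READY,
    (AgentState.READY, "trigger"): AgentState.RUNNING,
    (AgentState.RUNNING, "task_done"): AgentState.READY,
    (AgentState.RUNNING, "cannot_continue"): AgentState.HANDED_OFF,
    (AgentState.HANDED_OFF, "take_back"): AgentState.RUNNING,
}


def step(state: AgentState, event: str) -> AgentState:
    """Return the next state, or raise if the event is not allowed here."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"event {event!r} is not valid in state {state.name}") from None


def run(events: list[str]) -> AgentState:
    """Replay a sequence of events starting from an Agent that is not set up."""
    state = AgentState.NOT_SET_UP
    for event in events:
        state = step(state, event)
    return state
```

For example, run(["set_up", "trigger", "cannot_continue", "take_back", "task_done"]) walks through a handoff and a takeback and ends with the Agent ready for the next trigger.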
A good Agent stays out of the user’s sight while it is working well. Only when user action is needed for some reason does the Agent get the user’s attention. (Of course, there should be an interface for the user to control the Agent, such as stopping it temporarily, forcing it to change its behavior, etc.)
Agents do not merely automate the process from a defined input to an output (= Not Automation). Nor do they merely make it more convenient for people to perform the task of recognizing the output, thinking about it, and generating the next input (= Not Assistant).
Rather, an Agent performs that entire process for the user for a given task (= Agent).
This is the overview.
From the next section, I will dive into LINE AiCall, a Conversational AI service for call centers developed by LINE’s AI business, as a case study, one role at a time.
Definition of Agent in Call Center Conversational AI
In the case of LINE AiCall, a Conversational AI service for call centers, the source of the task is the operator, and LINE AiCall is the Agent for the call center operator. In this arrangement, the Agent answers inquiries from customers on behalf of the operator.
Therefore, it is the operator or the operator’s company that manages the Agent, while the person the Agent deals with is the customer. In line with this structure, the following diagram can be drawn from the four roles in the previous section.
It is a little complicated because there are two types of parties that an Agent is involved with, operators (companies) and customers, but I will look at them one by one.
1. Setting Up
In Conversational AI, the setup includes the design of a dialogue scenario to interact with the customer, the implementation of a series of dialogues including the data access necessary for the interaction, and the initial configuration to run the dialogue scenario. Dialogue design is a huge topic in itself and cannot be covered here, so I will skip it. (I would like to write about it another time.)
However, there is one element that is mentioned in this book and is also very important in Conversational AI that I would like to introduce: designing how capabilities and limitations are conveyed.
In LINE AiCall, capabilities and limitations are actually communicated in the first response message when a customer calls with an inquiry. In this very first message, you need to communicate the capabilities and limitations in a straightforward manner.
<Examples of capabilities and limitations in LINE AiCall>
a) Capabilities: Restaurant reservations, package pickup requests, etc.
b) Limitations: Not being able to respond flexibly to all questions because it is not human
Naturally, if the explanation is long-winded, the customer will get frustrated and end the conversation with the Agent. On the other hand, if capabilities and limitations are not communicated correctly, the customer will be confused and disappointed by how differently the Agent responds compared with a human operator.
I think that the first interaction is so important that you should decide what you want to convey in terms of capabilities and limitations in this first message, and then implement the necessary and sufficient dialogue scenarios accordingly.
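As a rough illustration of this first message, the following Python sketch composes a single opening line from a list of capabilities and one limitation, and rejects it if it gets too long. The wording, the function name, and the 200-character limit are all assumptions of mine, not the actual LINE AiCall greeting or configuration.

```python
def build_greeting(capabilities: list[str], limitation: str, max_chars: int = 200) -> str:
    """Compose one short opening message that states what the Agent can and cannot do."""
    if not capabilities:
        raise ValueError("at least one capability is required")
    message = (
        "This is an automated assistant. "
        f"I can help with {', '.join(capabilities)}. "
        f"{limitation}"
    )
    # A long opening frustrates callers, so fail early instead of shipping it.
    # The limit is an arbitrary, hypothetical budget.
    if len(message) > max_chars:
        raise ValueError(f"greeting is {len(message)} chars, limit is {max_chars}")
    return message


greeting = build_greeting(
    ["restaurant reservations", "package pickup requests"],
    "I may not be able to answer every question.",
)
```

Keeping the list of capabilities in one place like this also makes it easier to check that every capability named in the greeting has a dialogue scenario behind it.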
2. Doing (Monitoring & Notifications)
In the case of LINE AiCall, the trigger for execution is an inquiry (call) from the customer.
As long as the Agent is handling customer inquiries properly, the operator or the company that delegated the task is essentially not paying attention to the Agent. Even so, the Agent should always keep its current behavior visible so that the operator can find out what is going on if necessary.
The operator should be able to monitor, whenever necessary, the cumulative number of calls, the number of phone connections being answered at that moment, the real-time interaction between the customer and the Agent, and the status of any errors that have occurred.
The Agent makes its status available to the user at any time, but also proactively informs the user about certain events. This can be the completion of a task, a suggestion for better response, an error of some kind, or an alert when approaching an error.
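The combination of an always-available status and proactive notifications can be sketched as follows. This is a minimal Python example with hypothetical names; the error-rate threshold stands in for “an alert when approaching an error” and is my own assumption, not how LINE AiCall actually works.

```python
from dataclasses import dataclass, field


@dataclass
class CallMonitor:
    """Tracks Agent status for the operator; names and thresholds are hypothetical."""
    error_rate_alert: float = 0.05   # assumed threshold for a proactive alert
    total_calls: int = 0             # cumulative number of calls
    active_calls: int = 0            # connections being answered right now
    errors: int = 0
    notifications: list[str] = field(default_factory=list)

    def call_started(self) -> None:
        self.total_calls += 1
        self.active_calls += 1

    def call_ended(self, error: bool = False) -> None:
        self.active_calls -= 1
        if not error:
            return
        self.errors += 1
        # Push-style notification: the operator is told without having to look.
        self.notifications.append("error: a call ended with an error")
        if self.error_rate() >= self.error_rate_alert:
            self.notifications.append(f"alert: error rate is {self.error_rate():.0%}")

    def error_rate(self) -> float:
        return self.errors / self.total_calls if self.total_calls else 0.0

    def status(self) -> dict[str, float]:
        """Pull-style view the operator can check at any time."""
        return {
            "total_calls": self.total_calls,
            "active_calls": self.active_calls,
            "errors": self.errors,
            "error_rate": self.error_rate(),
        }
```

The point of the split is that status() never interrupts anyone, while the notifications list holds only the events that deserve the operator’s attention.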
If the user trusts the Agent completely, then minimal notifications may be sufficient, without constant visibility. However, in order to build that trust, the state of the Agent must be transparent and its administrator must be able to check on the Agent’s workings; trust is built up through that process.
Even with LINE AiCall, the call status is often not monitored very closely once the Agent is trusted. However, in order to gain that trust, or to regain it once it has been lost due to problems, it is necessary to visualize the Agent’s behavior as much as possible and communicate that it is working properly.
This is not limited to Conversational AI, is it? In other words, by visualizing and notifying the state of the Agent, it will be possible to properly understand the Agent’s behavior, which will ultimately lead to trust in the Agent.
3. Handling Exceptions
As a general rule, various exceptions occur in the course of performing the task entrusted to the Agent.
For example, one such exception is when a Roomba gets stuck under a sofa; you can pull it out and put it somewhere it can operate, and it will automatically resume cleaning.
In Conversational AI, dealing with exceptions comes down to two main perspectives.
3–1. Improvement for “Undesirable Behavior From the User’s Point of View”
To start with a simple example, let’s consider Google Photos, which can automatically generate albums for you, for example by collecting photos taken on the same day in different years. If the user is looking at the albums, it is a good idea to keep generating them. On the other hand, if the user does not look at the albums at all, it is necessary to reduce the frequency of generation or change the axis of the album, for example by collecting photos by the people in them, in order to find the behavior that the user prefers.
In this example, it is a simple task to create an album for users to view their saved photos, so I think it is rather easy to implement a feedback loop that changes the behavior of album creation based on user reactions.
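A feedback loop of this kind could look like the following Python sketch. It only covers the “reduce the frequency” half of the idea, and the 50% view-rate threshold, the doubling step, and the bounds are assumptions of mine; I do not know how Google Photos actually decides this.

```python
def next_interval_days(current_days: float, albums_shown: int, albums_viewed: int,
                       min_days: float = 1.0, max_days: float = 30.0) -> float:
    """Adjust how often albums are generated from how often the user opens them."""
    if albums_shown == 0:
        return current_days          # no evidence yet, keep the current pace
    view_rate = albums_viewed / albums_shown
    if view_rate >= 0.5:             # hypothetical threshold
        proposed = current_days      # the user is looking: keep generating
    else:
        proposed = current_days * 2  # mostly ignored: generate less often
    # Clamp so the interval never leaves the allowed range.
    return max(min_days, min(max_days, proposed))
```

With a weekly interval, a user who opened 8 of the last 10 albums stays on a weekly schedule, while a user who opened 1 of 10 moves to every 14 days.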
On the other hand, with Conversational AI, the situation is much more complex: in LINE AiCall, the dialogue between the Agent and the customer involves a lot of interaction, and what it hears and processes over the course of that conversation is also complex.
Exceptions here include failure of speech recognition to recognize a particular item, failure to understand the intent of a particular utterance from a customer, and so on. Ideally, for all of these exceptions, there should be an automatic feedback loop that understands the cause, identifies the component that needs improving (speech recognition, intent understanding, etc.), resolves the error, validates the improved model, and deploys it safely.
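As a small illustration of just the “identify the component” step of such a loop, the Python sketch below tallies logged failures per component so that the most frequent problem area surfaces first. The failure labels and the mapping are hypothetical, and the later steps (fixing, validating, and deploying a model) are not covered.

```python
from collections import Counter

# Hypothetical mapping from logged failure types to the component to improve.
COMPONENT_FOR_FAILURE = {
    "asr_misrecognition": "speech recognition",
    "intent_not_understood": "intent understanding",
}


def components_to_improve(failure_log: list[str]) -> list[tuple[str, int]]:
    """Count failures per component, most frequent first."""
    counts = Counter(
        COMPONENT_FOR_FAILURE.get(kind, "unclassified") for kind in failure_log
    )
    return counts.most_common()
```

Anything that lands in “unclassified” is a sign that the exception was not anticipated in advance and needs a person to look at it.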
However, in order to do this, MLOps must be implemented, including proper collection of user behavior data, model improvement, and model deployment. (MLOps is a very important factor in the overall design of the Agent, but I will not go into it here because it is a big topic in itself and I am not confident enough to go into the details at this point.)
3–2. Responding to “Hidden Requests”
In the Google Photos example above, a hidden request would be the case where the user actually wants more albums to be generated automatically.
This is the most difficult part. The basic approach is to make suggestions. If you increase the number of auto-generated albums a little, as in the Google Photos example above, it may not cause users much discomfort. (If you increase the number of suggestions and, as a result, users look at them less often, you can reduce the frequency, which makes it relatively easy to start the improvement cycle.)
In the case of dialogue, it is even more difficult. For example, when an Agent is talking to a customer on LINE AiCall, it would be annoying if the Agent suddenly told the customer, “I can also answer questions like these.” I think the right approach is to make suggestions based on an understanding of the user.
As I mentioned in the post above, it is important to understand the user’s preferences. For example, when making a restaurant reservation for a customer who does not smoke, the Agent can suggest a non-smoking table from the start. If the only preference to confirm is smoking or non-smoking seating, the Agent only has to ask one question, but if there are dozens of such preferences, it cannot ask about them one by one.
Therefore, the solution is for the Agent to make proposals based on its understanding of the user, even when the customer has not said anything explicitly. However, this kind of proposal is only possible after a long relationship between the customer and the Agent, so it is very difficult and cannot be achieved immediately. On the other hand, when an appropriate proposal can be made, trust increases all at once.
This is how to deal with exceptions. However, the contents described in this chapter are exceptions that have been anticipated in advance. On the other hand, in reality, there are many cases (very many in fact) that are not anticipated in advance. How to prepare for such unexpected cases is described in the next chapter.
4. Handoff & Takeback
When an unexpected error occurs, it becomes difficult for the Agent to continue with the task it has been assigned. In that case, the control will be transferred to the user or delegated again to the operator or another Agent.
In the case of LINE AiCall, a handoff normally means connecting to an operator: any request that LINE AiCall cannot answer or act on is always connected to an operator. For Conversational AI applied in a business setting, this reliable handoff, that is, the transfer to an operator, is the most important factor.
For this reason, it is important to inform the customer that a handoff to an operator is available and exactly how to request it. For example, if a customer makes a request that the Agent cannot answer, the Agent should respond with something like “Would you like me to connect you to an operator?”
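The handoff offer described above can be sketched as a simple decision rule. In this Python example the intent names, the confidence score, and the 0.6 threshold are hypothetical; it only shows the shape of the rule, not how LINE AiCall decides.

```python
from typing import Optional

SUPPORTED_INTENTS = {"restaurant_reservation", "package_pickup"}  # hypothetical names
HANDOFF_OFFER = "Would you like me to connect you to an operator?"


def respond(intent: Optional[str], confidence: float, min_confidence: float = 0.6) -> str:
    """Return the Agent's reply: continue the scenario or offer a handoff."""
    understood = intent is not None and confidence >= min_confidence
    if understood and intent in SUPPORTED_INTENTS:
        return f"continue:{intent}"
    # Anything the Agent cannot answer leads to an explicit handoff offer.
    return HANDOFF_OFFER
```

The important property is that there is no branch where the customer is left without a next step: every request either continues in the scenario or ends in the offer to connect to an operator.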
Providing a means of handoff is essential to keep the task from stalling, at least as long as the Agent cannot complete every requirement of the task on its own.
For example, even if LINE AiCall can handle 90% of the call center’s calls, customer satisfaction will drop dramatically if the remaining 10% are left without a handoff. By contrast, even if LINE AiCall can handle only 80% of the calls, overall satisfaction will be much higher if the remaining 20% are transferred to an operator without fail.
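To see why this can hold, here is a small worked calculation in Python. The satisfaction scores (0.9 for a resolved call, 0.1 for a customer left stranded) are purely illustrative numbers I made up, not measurements from LINE AiCall.

```python
def average_satisfaction(agent_share: float, handoff_available: bool,
                         resolved_score: float = 0.9,
                         stranded_score: float = 0.1) -> float:
    """Weighted average satisfaction over all calls (illustrative scores, 0 to 1)."""
    rest_share = 1.0 - agent_share
    # Calls the Agent cannot handle are either transferred and resolved,
    # or left stranded with no way forward.
    rest_score = resolved_score if handoff_available else stranded_score
    return agent_share * resolved_score + rest_share * rest_score


without_handoff = average_satisfaction(0.9, handoff_available=False)
with_handoff = average_satisfaction(0.8, handoff_available=True)
```

Under these assumed scores, the 90% case without a handoff averages about 0.82, while the 80% case with a reliable handoff averages about 0.90. The gap grows further if stranded customers are weighted more heavily, for example because they call back or complain.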
Also, even after a handoff, once the irregular case has been resolved, it would be ideal for the Agent to take the task back without degrading the user experience.
Unfortunately, take back has not yet been realized even with LINE AiCall.
If it were just a matter of switching from the operator back to the Agent, defining which state of the dialogue scenario to resume from seems technically feasible. However, from the customer’s point of view, a plain switch from the operator to the Agent (automatic response) would greatly impair the UX.
If, from the customer’s point of view, the Agent and the operator can be seamlessly connected without making the customer aware of the handover to the operator, then take back can be realized without compromising the UX.
In that case, it is not really a handover from the Agent to the operator. Rather, the operator manually operates the Agent (talking to the user through the Agent) in place of the software that normally controls it. The picture is one where the Agent acts as a hybrid of software and human operator, as if it were a digital worker.
I believe that if we can achieve take-back without causing discomfort to customers, we can minimize operator work and maximize cost reduction effects.
In the End
I have broken down Designing Agentive Technology into four categories, using LINE AiCall, a Conversational AI service for call centers, as a case study.
At the time the book was published, the Agent described in the book could have been realized only in a narrow range of use cases.
However, with the practical application of Big Model, which LINE is also working on, and the development of other peripheral technologies, we believe that the design concept of Agents, which can perform tasks on behalf of the user through interaction with the user, will expand its scope of application.
The information in this post is related to LINE AiCall, a Conversational AI for call centers, and covers only a small part of this book. There are many more examples and topics in the book, so if you are interested in Agentive Technology, please read it.