Generating Chatbot Flow Logic from Real Conversations

Gregory Whiteside · February 29, 2024 · 6 min read

How to build flexible, intuitive Conversational AI from unstructured customer data.

Generative AI is fundamentally changing conversation design, but deploying an AI-enabled chatbot still requires a human-led process of hardcoding flows for different conversational scenarios. We’re not yet at the point of leaving the logic of complex interactions to the LLM; it’s unclear if or when we’ll arrive.

A roadblock to fast, performant chatbot design is the number of side roads and strange turns inherent to human conversations. Humans can drive those interactions without thinking, but charting every path with the detail a chatbot requires is time-consuming, arduous, and challenging.

Luckily, companies already have extensive libraries of successful customer conversations from call transcripts, bot logs, email support tickets, and other sources. With the right human-in-the-loop workflow, those examples should give the LLM what it needs to deduce the logic and design the chatbot flow. This article outlines that process, beginning with specific subsets of data and ending with a flow that could feed a generative playbook, a text-to-flow converter, or a chatbot platform that can build flows from natural language.

Validating the Concept: Single Use-Case Testing

To begin, we need to validate whether our goal is feasible for a single kind of customer request. In this example, we’ll test the workflow on conversations that flag a missing package.

We can search through the sea of transcripts for missing packages in a couple of ways. To start, we’ll use semantic search, a data-engineering tactic underlying RAG solutions, which surfaces transcripts containing similar phrases. If we add a custom example to the stash (“I didn’t receive my package yet”), we can automatically surface semantically similar requests. We’ll select a handful of similar conversations and move them to the stash.
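
HumanFirst surfaces these matches directly in the workspace, but the underlying idea is easy to sketch outside the platform. The snippet below is a minimal illustration, assuming the open-source sentence-transformers library; the sample transcripts are placeholders standing in for real call logs.

    from sentence_transformers import SentenceTransformer, util

    # Stand-in transcripts; in practice these come from call logs, bot logs, emails, etc.
    transcripts = [
        "Customer: my order says delivered but nothing ever arrived...",
        "Customer: I want to change the shipping address on my order...",
        "Customer: the package never showed up and tracking stopped updating...",
    ]

    # The custom example we added to the stash.
    query = "I didn't receive my package yet"

    model = SentenceTransformer("all-MiniLM-L6-v2")
    query_vec = model.encode(query, convert_to_tensor=True)
    transcript_vecs = model.encode(transcripts, convert_to_tensor=True)

    # Rank every transcript by cosine similarity to the example utterance.
    scores = util.cos_sim(query_vec, transcript_vecs)[0]
    ranked = sorted(zip(transcripts, scores.tolist()), key=lambda pair: pair[1], reverse=True)

    # The top matches are the candidates we review and move into the stash.
    for text, score in ranked[:10]:
        print(f"{score:.2f}  {text[:60]}")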

This test data determines the quality of our experiment, so we want to make sure it’s accurate and specific. Searching by semantic similarity alone, we risk including conversations that mention a missing package but concern a different issue entirely. Rather than read through every transcript individually, we can build a custom prompt to surface the true key issue of each conversation we’ve selected.

We can run this prompt on the ten conversations in the stash to see the summarized key issue for each one. In this example, only some of the ten conversations have ‘package missing’ as the key issue. We can move those conversations to a new stash and proceed to the next step: engineering a chatbot flow prompt.
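
HumanFirst runs this kind of prompt in-platform, but the same filter can be sketched with a generic chat-completion client. Everything below is illustrative: the prompt wording, the model name, and the placeholder conversations are assumptions, not the exact setup from this example.

    from openai import OpenAI

    client = OpenAI()  # assumes an OPENAI_API_KEY in the environment

    KEY_ISSUE_PROMPT = (
        "Summarize the single key issue of the customer conversation below "
        "in five words or fewer.\n\nConversation:\n{conversation}"
    )

    def key_issue(conversation: str) -> str:
        response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": KEY_ISSUE_PROMPT.format(conversation=conversation)}],
        )
        return response.choices[0].message.content.strip().lower()

    # The ten conversations surfaced by semantic search (placeholder strings here).
    stashed_conversations = [
        "Customer: my order says delivered but nothing ever arrived...",
        "Customer: I was charged twice for the same order...",
    ]

    # Keep only the conversations whose key issue really is a missing package.
    missing_package_stash = [
        c for c in stashed_conversations if "package missing" in key_issue(c)
    ]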

Generating Chatbot Flows from a Small Data Segment

Having filtered our data through semantic search and a custom summarization prompt, we can start working on the prompt that will generate the chatbot logic. This is where iterative, human-in-the-loop prompt engineering takes over. After many revisions, we’ve engineered a prompt with sufficient detail to create the flow we need: full of conditional logic, subnodes, specific omissions, and instructions a chatbot could follow.

Running this prompt on a merged stash takes all of the selected conversations into account, compiling their cues to create a single flow. This is the beginning of many revisions: we’ll want to ensure the output is properly formatted and that nothing from these six examples has been overlooked.
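
As a rough sketch of this step, we can merge the curated conversations into one context and ask the model to compile a single flow. The snippet reuses the client and the missing_package_stash list from the previous sketch; the prompt wording here is illustrative, not the exact production prompt described above.

    # Continues from the previous sketch: `client` and `missing_package_stash` are reused.
    FLOW_PROMPT = (
        "You are a conversation designer. From the example conversations below, "
        "write a single chatbot flow for handling a missing-package request. "
        "Use numbered steps with IF/ELSE branches and sub-steps, and leave out "
        "anything that requires a human agent.\n\nConversations:\n{examples}"
    )

    merged_stash = "\n\n---\n\n".join(missing_package_stash)

    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": FLOW_PROMPT.format(examples=merged_stash)}],
    )
    generated_flow = response.choices[0].message.content
    print(generated_flow)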

Expanding the Logic

We’re now confident that we’ve generated chatbot flow logic that can sufficiently manage these example conversations. But no doubt there are other conversations about missing packages that take different turns.

To find more conversations and test this flow against edge cases, we can return to our ‘key issue’ prompt. Using the pipeline feature, we can summarize the key issue across all raw transcripts simultaneously. We can cluster the output by similarity, search for the keyword ‘not delivered,’ curate the results, and move them to a new stash. Working with simplified summaries, we can now effectively search by semantic similarity to find more conversations on the same topic. In this example, we found 15 additional conversations about a missing package.
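
Outside the platform, a rough equivalent of that pipeline is to summarize every raw transcript with the key-issue prompt, embed and cluster the short summaries, and keep the ones that mention non-delivery. The sketch below reuses names from the earlier snippets (transcripts, model, key_issue) and adds scikit-learn for clustering; the cluster count is arbitrary.

    from collections import Counter
    from sklearn.cluster import KMeans

    # Summarize every raw transcript down to its key issue (key_issue from the earlier sketch).
    summaries = [key_issue(t) for t in transcripts]

    # Embed and cluster the short summaries so related issues group together.
    summary_vecs = model.encode(summaries)
    labels = KMeans(n_clusters=min(8, len(summaries)), n_init=10).fit_predict(summary_vecs)
    print(Counter(labels))  # rough view of how the issues distribute

    # Keyword-filter the summaries, then pull back the matching full conversations.
    new_candidates = [
        transcripts[i]
        for i, summary in enumerate(summaries)
        if "not delivered" in summary or "missing" in summary
    ]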

Let’s create a custom prompt to test our generated flow against these new scenarios. We can copy the flow from the previous step into the prompt and ask the model whether the given instructions would sufficiently handle each new conversation. If the answer is no, we’ll ask it to list the missing steps. We can run this on each individual conversation in the stash to see a yes-or-no classification for all 15 examples.

Slightly simplified, that prompt reads roughly as follows:
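
    Here is a chatbot flow for handling a missing-package request:

    {flow}

    Here is a real customer conversation about a missing package:

    {conversation}

    Would an agent following this flow, step by step, have fully handled this
    conversation? Answer "yes" or "no". If "no", list the steps that are missing
    from the flow.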

Our output highlights important gaps in the flow: a step for validating the user’s account information, and a provision for the case where order details are unknown. These are just a few of the overlooked steps we’ll want to account for.

We can continue to fine-tune this flow until we get nothing but ‘yes’ from the above validation prompt. Then we can transfer the output to a generative playbook, a text-to-flow converter, or a chatbot platform that lets you express flows in natural language.
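
The revision loop itself is small. Here is a rough sketch, again reusing the client, generated_flow, and new_candidates names from the earlier snippets, with a compressed, illustrative version of the validation prompt:

    # Compressed version of the validation prompt above, with {flow} and {conversation} slots.
    VALIDATION_PROMPT = (
        "Here is a chatbot flow:\n{flow}\n\n"
        "Here is a customer conversation:\n{conversation}\n\n"
        "Would following this flow have fully handled the conversation? "
        "Answer yes or no; if no, list the missing steps."
    )

    def validate(flow: str, conversation: str) -> str:
        response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": VALIDATION_PROMPT.format(flow=flow, conversation=conversation)}],
        )
        return response.choices[0].message.content

    # Run the check on every held-out conversation and collect the ones that fail.
    results = [validate(generated_flow, c) for c in new_candidates]
    gaps = [r for r in results if not r.lower().startswith("yes")]
    print(f"{len(results) - len(gaps)} of {len(results)} handled; {len(gaps)} list missing steps")

    # Fold the listed missing steps back into the flow prompt, regenerate, and repeat
    # until every answer comes back "yes".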

Finding and accounting for edge cases is easier, faster, and more effective when we work from real customer conversations. Synthetic data and imagined scenarios are limited by our conscious understanding of customer interactions, and often lack the nuances of natural language in action. With this workflow, conversation designers can speed up the backend build of flexible, intuitive chatbots, improving agility and dramatically shortening time to deployment.



HumanFirst is a data-centric productivity platform designed to help companies find and solve problems with AI-powered workflows that combine prompt and data engineering. Experiment with raw data, surface insights, and build reliable solutions with speed, accuracy, and trust.

