With the emergence of platforms that democratize conversational AI, the hurdle to getting a basic NLU model up and running is smaller than ever. The real work starts when your conversational AI project is in production and users throw conversations at you that you never expected.

Deploying your chatbot is like being thrown into the pool without floaties for the first time. It happens suddenly, you're no longer in a controlled environment, and you'll find out the hard way whether your training was adequate to keep you afloat.

The only way to deliver good conversational experiences in an unforeseen environment is to adapt to it.

Why It’s Important To Have Continuous Improvement:

  • It's impossible to predict every eventuality, so NLU models need to learn continuously.
  • There’s a need to respond to real-world changes, such as a new product or new company policies.
  • Humans are dynamic, and there will always be some spontaneity in customer queries.
  • Conversational AI projects learn by doing, so incoming data should be reflected in your model.

You'll be amazed at the unanticipated utterances you'll receive in production. However, state-of-the-art tools like HumanFirst can ease your anxiety about the unpredictable nature of conversations and make continuous improvement effective, data-driven, and streamlined.

How to Approach Continuous Improvement

Real-Life Conversation Data

Using your unlabeled data to discover new intents and optimize existing ones is how you work toward 100% accuracy and coverage. As mentioned above, chatbots learn by doing, and there's no better way to learn how to react in real-world scenarios than by using real-life conversation data generated in production.

That's why HumanFirst is so powerful: your post-deployment data can be piped in automatically from your Conversational AI platform at the cadence of your choice. This lets you leverage your unlabeled data to improve your labeled data. You can sort and explore unlabeled utterances by the uncertainty, margin score, and entropy of your trained model's predictions, identifying the utterances most likely to represent new or related intents.
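Uncertainty, margin, and entropy are standard active-learning scores computed from a model's probability distribution over intents. The post does not say exactly how HumanFirst defines them, so the minimal sketch below uses the common textbook definitions and assumes you already have per-utterance intent probabilities from your own model. `rank_for_review` and the sample utterances are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def least_confidence(probs: List[float]) -> float:
    """1 minus the top probability: higher means the model is less sure of its best intent."""
    return 1.0 - max(probs)


def margin(probs: List[float]) -> float:
    """Gap between the two most likely intents: smaller means the top two are hard to tell apart."""
    top = sorted(probs, reverse=True)
    return top[0] - top[1] if len(top) > 1 else top[0]


def entropy(probs: List[float]) -> float:
    """Shannon entropy (natural log): higher means probability is spread across many intents."""
    return -sum(p * math.log(p) for p in probs if p > 0)


SCORERS = {"uncertainty": least_confidence, "margin": margin, "entropy": entropy}


def rank_for_review(predictions: Dict[str, List[float]], key: str = "entropy") -> List[Tuple[str, float]]:
    """Order unlabeled utterances so the most ambiguous ones come first."""
    scored = [(utt, SCORERS[key](probs)) for utt, probs in predictions.items()]
    # A small margin signals ambiguity, so margin sorts ascending; the other scores sort descending.
    return sorted(scored, key=lambda item: item[1], reverse=(key != "margin"))


# Hypothetical model outputs: probabilities over three existing intents per utterance.
preds = {
    "where is my parcel": [0.92, 0.05, 0.03],
    "can I change my plan and get a refund": [0.48, 0.45, 0.07],
    "do you sell gift cards": [0.34, 0.33, 0.33],
}

for utterance, score in rank_for_review(preds, key="entropy"):
    print(f"{score:.3f}  {utterance}")
```

Utterances at the top of such a ranking (here, the one whose probability is spread almost evenly across all three intents) are good candidates for a new intent or for clarifying the scope of existing ones.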

Data-Driven Improvement

Your model is bound to underperform when improvements aren't driven by data. For example, when annotators come across unlabeled utterances, their first instinct might be to add them to whichever intents they think they belong to. But if every decision were based on a hunch (whether adding training phrases, disambiguating, splitting or merging intents, or anything else), it would take copious trial and error to achieve a well-performing conversational experience.

HumanFirst mitigates this specific problem with real-time recommendations: when you add new utterances from your unlabeled data to existing intents, you are prompted to add each one to the intent with the highest matching confidence, computed in real time. More generally, all of HumanFirst's continuous-improvement workflows are similarly model-driven, with real-time feedback.
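To make the idea of a confidence-ranked suggestion concrete, here is a toy sketch. It is not HumanFirst's model: it swaps in plain bag-of-words cosine similarity, scores each intent by its best-matching training phrase, and returns the top candidates. `suggest_intents` and the example intents are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def _vector(text: str) -> Counter:
    # Bag-of-words token counts: a deliberately simple stand-in for a real NLU encoder.
    return Counter(text.lower().split())


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def suggest_intents(utterance: str, intents: Dict[str, List[str]], top_k: int = 3) -> List[Tuple[str, float]]:
    """Rank intents by the similarity of their best-matching training phrase, highest first."""
    query = _vector(utterance)
    scores = {
        name: max((_cosine(query, _vector(phrase)) for phrase in phrases), default=0.0)
        for name, phrases in intents.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]


# Hypothetical intents with a couple of training phrases each.
intents = {
    "track_order": ["where is my order", "track my package"],
    "cancel_order": ["cancel my order", "I want to cancel"],
    "store_hours": ["when are you open", "what are your opening hours"],
}

for name, score in suggest_intents("where is my package", intents):
    print(f"{score:.2f}  {name}")
```

A production system would replace the bag-of-words vectors with the trained model's own confidence scores, but the final step (sort intents by score and offer the best match first) stays the same.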

Streamlined Continuous Improvement

With so much incoming data, so many problematic intents, and so little organization, how does one know where to begin? Having streamlined, consistent improvement workflows is important if you want to see significant enhancements to your models.

HumanFirst generates on-demand 5-fold cross-validation analysis against its own NLU (or yours) to provide intent-level metrics (F1, precision, recall, accuracy) that you can use to understand and tune your model. This will help you:

  • Identify which intents need additional training examples
  • Identify which intents have high confusion (typically intents with a mix of very different training examples)
  • Identify which training examples belong to another intent in your corpus
  • Re-label problematic training examples
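The sketch below shows what k-fold cross-validation with intent-level metrics involves, assuming a labeled list of (utterance, intent) pairs. Every example is predicted by a model trained only on the other folds, and per-intent precision, recall, and F1 (plus overall accuracy) are computed from those held-out predictions. The `nearest_neighbor` classifier is a hypothetical word-overlap stand-in for a real NLU engine, and all function names and data are illustrative.

```python
import random
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Example = Tuple[str, str]  # (utterance, intent)


def nearest_neighbor(train: List[Example], text: str) -> str:
    """Toy classifier: predict the intent of the training phrase with the highest word overlap (Jaccard)."""
    words = set(text.lower().split())

    def overlap(example: Example) -> float:
        other = set(example[0].lower().split())
        return len(words & other) / len(words | other)

    return max(train, key=overlap)[1]


def k_fold_predictions(
    data: List[Example],
    classify: Callable[[List[Example], str], str],
    k: int = 5,
    seed: int = 0,
) -> List[Tuple[str, str]]:
    """Return (true, predicted) pairs; each example is predicted by a model that never saw it."""
    shuffled = data[:]
    random.Random(seed).shuffle(shuffled)
    pairs = []
    for fold in range(k):
        held_out = shuffled[fold::k]  # every k-th example, offset by the fold index
        train = [ex for i, ex in enumerate(shuffled) if i % k != fold]
        for text, intent in held_out:
            pairs.append((intent, classify(train, text)))
    return pairs


def intent_metrics(pairs: List[Tuple[str, str]]) -> Tuple[Dict[str, Dict[str, float]], float]:
    """Per-intent precision, recall, and F1, plus overall accuracy."""
    tp, fp, fn = defaultdict(int), defaultdict(int), defaultdict(int)
    for true, pred in pairs:
        if true == pred:
            tp[true] += 1
        else:
            fp[pred] += 1  # the predicted intent claimed an utterance that isn't its own
            fn[true] += 1  # the true intent missed one of its own utterances
    report = {}
    for intent in sorted({t for t, _ in pairs} | {p for _, p in pairs}):
        p = tp[intent] / (tp[intent] + fp[intent]) if tp[intent] + fp[intent] else 0.0
        r = tp[intent] / (tp[intent] + fn[intent]) if tp[intent] + fn[intent] else 0.0
        f1 = 2 * p * r / (p + r) if p + r else 0.0
        report[intent] = {"precision": p, "recall": r, "f1": f1}
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    return report, accuracy


# Hypothetical training data: five examples per intent.
data = [
    ("where is my order", "track_order"), ("track my package", "track_order"),
    ("has my order shipped", "track_order"), ("where is my parcel", "track_order"),
    ("when will my package arrive", "track_order"),
    ("cancel my order", "cancel_order"), ("I want to cancel", "cancel_order"),
    ("please cancel the order", "cancel_order"), ("stop my order", "cancel_order"),
    ("cancel my package", "cancel_order"),
]

report, accuracy = intent_metrics(k_fold_predictions(data, nearest_neighbor, k=5))
for intent, m in report.items():
    print(f"{intent:13} P={m['precision']:.2f} R={m['recall']:.2f} F1={m['f1']:.2f}")
print(f"accuracy={accuracy:.2f}")
```

Read per intent, low recall usually means the intent is missing its own utterances (it may need more or more varied training examples), while low precision means utterances from other intents are being pulled into it, which often points to overlapping scopes.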

HumanFirst provides machine-learning-assisted workflows that help fix your model's data. You can merge conflicting intents and their training phrases with a single click, or quickly move problematic utterances from one intent to another.

HumanFirst's built-in disambiguation feature (available both in real time and based on your trained model) lets you quickly view conflicting intents and provides actionable workflows to keep each intent's scope as clean and specific as possible.

Disambiguation Workflow
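One simple way to surface conflicting intents, assuming you have (true intent, predicted intent) pairs such as held-out cross-validation predictions, is to count how often each pair of intents is confused in either direction. This is an illustrative heuristic, not a description of HumanFirst's disambiguation algorithm; `confused_pairs` and the sample data are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def confused_pairs(
    pairs: Iterable[Tuple[str, str]], min_count: int = 1
) -> List[Tuple[Tuple[str, str], int]]:
    """Rank unordered intent pairs by how often the model mixes them up, in either direction."""
    counts: Counter = Counter()
    for true, pred in pairs:
        if true != pred:
            # Sort the two names so A-mistaken-for-B and B-mistaken-for-A count as one conflict.
            counts[tuple(sorted((true, pred)))] += 1
    return [(pair, n) for pair, n in counts.most_common() if n >= min_count]


# Hypothetical held-out predictions as (true_intent, predicted_intent).
pairs = [
    ("refund", "cancel_order"),
    ("cancel_order", "refund"),
    ("refund", "refund"),
    ("track_order", "track_order"),
    ("track_order", "store_hours"),
]

for (a, b), n in confused_pairs(pairs):
    print(f"{n}x  {a} <-> {b}")
```

Pairs at the top of this list are natural candidates for the disambiguation steps described above: tightening each intent's training phrases, moving misplaced examples, or merging the two intents.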

At the end of the day, we're more likely to make consistent changes when there is less friction between us and the problem, so finding the easiest way to improve the model incrementally and continuously is key. As with the stock market, you're in it for the long game (steady gains), as opposed to the volatile cryptocurrency market (massive swings, with sharp upticks and equally sharp crashes).

As Alex Halper says, deploying Conversational AI is a journey, not a milestone. Without streamlined processes, that can seem overwhelming and daunting. Luckily, state-of-the-art tools like HumanFirst are there to help you along the way.

HumanFirst is like Excel for natural language data: a complete productivity suite to transform natural language into business insights and AI training data.
