Continuous AI Improvement

Articles · February 3, 2022 · 4 min read


Let your data drive.

With the emergence of platforms democratizing conversational AI, the hurdle to getting a basic NLU model up and running is lower than ever. The real work starts once your conversational AI project is in production and conversations you never expected are thrown at you.

Deploying your chatbot is like being thrown into the pool without floaties for the first time: it happens suddenly, you’re no longer in a controlled environment, and you’ll find out the hard way whether your training was adequate to keep you afloat.

The only way to deliver good conversational experiences in an unforeseen environment is to adapt to it.

Why Continuous Improvement Is Important

  • It’s impossible to predict every eventuality. NLU models need to be continuously learning.
  • There’s a need to respond to real-world changes, such as a new product or new company policies.
  • Humans are dynamic, and there will always be some spontaneity in customer queries.
  • Conversational AI projects learn by doing. Incoming data should be reflected in your model.

You’ll be amazed at the unanticipated utterances you’ll receive in production. However, state-of-the-art tools like HumanFirst can ease that anxiety and make the process of continuous improvement consequential, data-driven, and streamlined.

How to Approach Continuous Improvement


Real-Life Conversation Data

Using your unlabeled data to discover new intents and optimize existing ones is the path to maximizing accuracy and coverage. As mentioned above, chatbots learn by doing. There’s no better way to learn how to react in real-world scenarios than by using real-life conversation data generated in production.

That’s why HumanFirst is so powerful: your post-deployment data can be piped in automatically from your conversational AI platform at the cadence of your choice, letting you leverage your unlabeled data to improve your labeled data. You can sort and explore your unlabeled data by your trained model’s uncertainty, margin score, and entropy, identifying the utterances most likely to represent new or related intents.
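For intuition, here’s a minimal sketch of how these three signals are typically computed from a classifier’s predicted intent probabilities (illustrative Python, not HumanFirst’s actual API):

    import numpy as np

    def triage_scores(probs):
        """Standard active-learning signals for one utterance, given the
        model's probability for each intent."""
        probs = np.asarray(probs, dtype=float)
        top2 = np.sort(probs)[-2:]                        # two highest probabilities
        uncertainty = 1.0 - top2[1]                       # low top confidence -> uncertain
        margin = top2[1] - top2[0]                        # small gap -> ambiguous intents
        entropy = -np.sum(probs * np.log(probs + 1e-12))  # spread across all intents
        return uncertainty, margin, entropy

    # Utterances with high uncertainty/entropy and low margin are the best
    # review candidates: they most likely represent new or related intents.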

Data-Driven Improvement

Your model is bound to underperform when improvements aren’t driven by data. For example, when annotators come across unlabeled utterances, their first hunch might be to add them to the intents they think they belong in. But if every decision were based on a hunch (whether adding training phrases, disambiguating, splitting, or merging intents), it would take copious amounts of trial and error to achieve a well-performing conversational experience.

This problem is mitigated by HumanFirst’s real-time recommendations: when adding new utterances from your unlabeled data to existing intents, users are prompted to add each one to the intent with the highest match confidence, computed in real time. More generally, all of HumanFirst’s continuous-improvement workflows are similarly model-driven, with real-time feedback.
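Conceptually, the recommendation step looks something like this (a sketch assuming a scikit-learn-style classifier; suggest_intents is a hypothetical helper, not HumanFirst’s API):

    def suggest_intents(utterance, classifier, top_k=3):
        """Rank candidate intents for an unlabeled utterance by model confidence."""
        probs = classifier.predict_proba([utterance])[0]  # scikit-learn-style call
        ranked = sorted(zip(classifier.classes_, probs),
                        key=lambda pair: pair[1], reverse=True)
        return ranked[:top_k]                             # surfaced as suggestions

    # The annotator sees the top matches and confirms or overrides them, so
    # every labeling decision is grounded in the model's own confidence.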

Streamlined Continuous Improvement

With so much incoming data, so many problematic intents, and so little organization, how does one know where to begin? Having streamlined, consistent improvement workflows is important if you want to see significant enhancements to your models.

HumanFirst generates on-demand 5-fold cross-validation analysis against its NLU (or your own) to provide intent-level metrics (F1, precision, recall, accuracy) that can be used to understand and tune your model; a sketch of the same style of analysis follows the list below. This will help you:

  • Identify which intents need additional training examples
  • Identify which intents have high confusion (typically intents with a mix of very different training examples)
  • Identify which training examples belong to another intent in your corpus
  • Re-label problematic training examples
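
This kind of analysis can be reproduced with any scikit-learn-compatible classifier; a minimal sketch, where model, utterances, and intents stand in for your own pipeline and training data:

    from sklearn.model_selection import cross_val_predict
    from sklearn.metrics import classification_report

    # utterances: list of training phrases; intents: their labels
    # model: e.g. a Pipeline of TfidfVectorizer + LogisticRegression
    predicted = cross_val_predict(model, utterances, intents, cv=5)

    # Per-intent precision, recall, and F1: low-scoring intents need more
    # (or cleaner) training examples.
    print(classification_report(intents, predicted))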

HumanFirst provides machine-learning-assisted workflows that help fix your model’s data. It’s easy to merge conflicting intents and their training phrases with a single click, or to quickly move problematic utterances from one intent to another.

The built-in disambiguation feature (both in real time and based on your trained model) lets you quickly view conflicting intents, providing actionable workflows to ensure each intent’s scope is as clean and specific as possible.
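Building on the cross-validation predictions above, one simple way to surface conflicting intents is to look at the off-diagonal mass of the confusion matrix (again a sketch, not the product feature itself):

    import numpy as np
    from sklearn.metrics import confusion_matrix

    labels = sorted(set(intents))
    cm = confusion_matrix(intents, predicted, labels=labels)
    np.fill_diagonal(cm, 0)  # ignore correct predictions

    # The intent pair that most often absorbs each other's examples is the
    # prime candidate for disambiguation (tightening scope, moving phrases).
    i, j = np.unravel_index(np.argmax(cm), cm.shape)
    print(f"Most confused: '{labels[i]}' predicted as '{labels[j]}' ({cm[i, j]} times)")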

[Image: the disambiguation workflow in HumanFirst]

At the end of the day, we’re more likely to make consistent changes when there’s less friction between us and the problem. Finding the easiest way to incrementally improve the model on a continuous basis is key. Like the stock market, you’re in it for the long game of steady gains, not the volatility of the cryptocurrency market, with its massive upticks and crashes.

As Alex Halper says, deploying conversational AI is a journey, not a milestone. That can seem overwhelming and daunting without streamlined processes. Luckily, state-of-the-art tools like HumanFirst are there to help you along the way.

HumanFirst is like Excel, for Natural Language Data. A complete productivity suite to transform natural language into business insights and AI training data.
