HumanFirst Studio was built in order to manage and continuously improve the training data of large conversational assistants, identifying valuable training data from existing sources that are often available but hard to tap into without proper tooling.
In this article we’ll see how to use some freely available datasets in order to create a Rasa bot from scratch without having to come up with every single training phrase. We’ll also see how to use our command line tool, hf, in order to seamlessly integrate Rasa with HumanFirst Studio in a git-oriented workflow.
Note: This tutorial is written with Rasa v2, but our tooling automatically detects the right format and works just as well with Rasa v1.
You will need a HumanFirst Studio account to go through this tutorial. You can create a free account here to get started.
Skip this step if you already have Rasa installed. This guide works with both Rasa v1 and v2.
More complete instructions are available at rasa.com if necessary.
Install the HumanFirst CLI tool
Download one of our precompiled binaries at: https://github.com/zia-ai/humanfirst/releases/tag/cli-0.0.4
Choose the binary for your operating system.
For Linux, do:
You can then log in to your Studio account from the command line:
You should then see something like this, indicating you have logged in properly.
Starting a new Rasa project
If you’ve ever worked with Rasa, you probably have done these steps countless times. Once configured, HumanFirst Studio will accelerate these steps.
Create a new Rasa project from the command line.
Create a git repository (if you don’t have git you can skip this step):
Importing your new Rasa project into Studio
Create a workspace into which you’ll import your data. A workspace is essentially a labeled container that holds your intents and lets you manage and improve them.
The playbook-* string is the unique identifier for your new project. It won't change, so it's safe to add it to any local scripts you might use during the project's lifecycle.
Push your project data to your HumanFirst workspace
Note: We automatically detect whether your project uses Rasa2 with yaml files, or Rasa1 with markdown files.
Note: We use --clear to erase the workspace's contents so that it reflects exactly what you have in your repository. It's not necessary the first time, but it's a good way to bring in changes that someone else committed to the repository.
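To make the format detection mentioned above more concrete, here is a minimal Python sketch of one way to tell the two layouts apart by file extension. It is not HumanFirst's actual detection logic; the function name `detect_rasa_format` and its return values are hypothetical. It relies only on the fact that a default Rasa 2 project stores training data in YAML files such as data/nlu.yml, while a Rasa 1 project uses Markdown files such as data/nlu.md.

```python
from pathlib import Path


def detect_rasa_format(project_dir):
    """Guess a Rasa project's training-data format from file extensions.

    Illustrative only: the hf tool may use entirely different rules.
    """
    data_dir = Path(project_dir) / "data"
    # Rasa 2 projects keep training data in YAML files (e.g. data/nlu.yml).
    if any(data_dir.glob("*.yml")) or any(data_dir.glob("*.yaml")):
        return "rasa2-yaml"
    # Rasa 1 projects keep training data in Markdown files (e.g. data/nlu.md).
    if any(data_dir.glob("*.md")):
        return "rasa1-markdown"
    # No recognizable training data was found.
    return "unknown"


if __name__ == "__main__":
    # Run from the root of a Rasa project to see which layout it uses.
    print(detect_rasa_format("."))
```

Checking YAML first means a project that was migrated to Rasa 2 but still has stray Markdown files is treated as Rasa 2, which is a design choice of this sketch rather than documented behavior.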
http://studio.humanfirst.ai/ will now show your newly created workspace along with the intents imported from the Rasa project.
Adding more data
We’ll add some phrases to the existing intents. We can use publicly available datasets to search for training phrases that fit. Since the intents added by rasa init are pretty generic, there's a good chance we'll find relevant matches.
In Studio, click on the Data sources menu item on the left, then click the Use one of our data sets button to add existing conversations to your project. There are many choices available, but for this tutorial pick the STAR dataset, which contains goal-oriented conversations for different tasks. If you have existing data, either from human-human conversations or from a list of unclassified utterances, this is where you would import it into your workspace.
Augmenting existing intents
Now that we have some unlabeled data to work with, we can expand the currently defined intents.
In the Labeled data section, you'll find the list of imported intents. Activating one will bring up the list of its associated training examples. Click the Get Suggestions button and some suggestions will be provided from the dataset you added in the previous step. You can then accept training examples that make sense. The None of these look good button rejects the remaining elements.
Note: Recommendations work by looking at all of the workspace’s training data and returning examples from your data sources. When you reject suggestions, we maintain a list of phrases that are internally tagged as “not part of that intent”. This list is used to improve suggestions. You can think of it as an ephemeral binary classifier that helps narrow down your search until you get enough relevant examples.
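The idea of using rejections to sharpen suggestions can be sketched in a few lines of Python. This is a toy illustration, not Studio's implementation: the function names are hypothetical, the sample phrases are made up, and plain word overlap (Jaccard similarity) stands in for whatever model Studio uses internally. Each candidate is scored by how close it is to an accepted example minus how close it is to a rejected one.

```python
import re


def tokens(text):
    """Lowercased word set; a crude stand-in for a learned representation."""
    return set(re.findall(r"[a-z']+", text.lower()))


def jaccard(a, b):
    """Word overlap between two phrases, from 0.0 (disjoint) to 1.0 (same words)."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def rank_suggestions(candidates, accepted, rejected):
    """Put candidates near accepted phrases and far from rejected ones first."""
    def score(phrase):
        # Closest accepted example pulls the score up...
        pos = max((jaccard(phrase, a) for a in accepted), default=0.0)
        # ...and the closest rejected example pushes it down.
        neg = max((jaccard(phrase, r) for r in rejected), default=0.0)
        return pos - neg
    return sorted(candidates, key=score, reverse=True)


if __name__ == "__main__":
    # Hypothetical data for a "greet" intent.
    accepted = ["hello there", "hi, good morning"]
    rejected = ["hi, i want to book a hotel"]
    candidates = ["i want to book a room", "good morning", "hi, i need a hotel"]
    for phrase in rank_suggestions(candidates, accepted, rejected):
        print(phrase)
```

Each rejection adds one more negative example, so phrases that merely share a greeting word with the intent sink lower in the list as you keep reviewing.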
Discovering new intents
Next, let’s take a look at the Unlabeled data section. This is where all utterances that haven't been assigned to an intent are located.
You’ll see a list of unlabeled utterances drawn from your data sources. Since you’ve already added some demo data, there should be a lot of it. The search bar on top is a full-text search feature that lets you find things the old-fashioned way. Try it first by searching for hotel: there are a few intents that can be created relating to hotels.
One of the initial matches is “Hi, I am looking for the rating of a hotel.” Go ahead and select it. You'll notice that a new option is available right under the selection: Show similar suggestions. This button uses semantic search to look for similar phrases in the corpus. It's a good idea to mix these two techniques, because full-text search gives you keyword-based results, while semantic search expands on the meaning of the utterance and returns more relevant matches.
Select a few examples where the user clearly asks for a hotel with a specific rating. Notice that the button is clickable again. Clicking it will look for results similar to all selected items.
Tip: You can shift+click to select a range without clicking on each of them separately.
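The difference between the two search modes can be illustrated with a small Python sketch. The assumptions are significant: Studio's semantic search presumably relies on learned sentence embeddings, whereas this toy uses bag-of-words vectors and cosine similarity as a stand-in, and every phrase in the corpus except the first is made up. The function names are hypothetical.

```python
import math
import re
from collections import Counter


def vectorize(text):
    """Bag-of-words vector; a toy substitute for a real sentence embedding."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(u, v):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(u[t] * v[t] for t in u)
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(
        sum(c * c for c in v.values())
    )
    return dot / norm if norm else 0.0


def keyword_search(corpus, keyword):
    """Full-text style search: keep phrases that contain the keyword."""
    return [p for p in corpus if keyword.lower() in p.lower()]


def similar_to(corpus, selected, top_k=3):
    """Rank unselected phrases by mean similarity to all selected phrases."""
    selected_vectors = [vectorize(s) for s in selected]

    def score(phrase):
        v = vectorize(phrase)
        return sum(cosine(v, s) for s in selected_vectors) / len(selected_vectors)

    remaining = [p for p in corpus if p not in selected]
    return sorted(remaining, key=score, reverse=True)[:top_k]


if __name__ == "__main__":
    corpus = [
        "Hi, I am looking for the rating of a hotel.",
        "What is the rating of that place?",
        "Any hotel downtown?",
        "Book me a doctor appointment",
    ]
    print(keyword_search(corpus, "hotel"))
    print(similar_to(corpus, [corpus[0]], top_k=1))
```

The keyword search only returns phrases containing the word hotel, while the similarity search can surface a phrase about ratings that never mentions hotels, which is why mixing the two approaches widens the set of useful candidates.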
Once you have enough elements, click the Label selected data button on the left, and click + Create here to create a new intent. Let's name it hotel_request_rating and click the Create and edit button.
Here are a few intents you may want to create:
- Book an appointment
- Reserve a hotel
- Reserve a hotel with a specific rating (see if you can make this one a child intent of the reserve a hotel one)
While working on your project, you may decide that some intents should be merged together or broken down into more specific intents. In the Labeled data section, where you can view the list of training phrases for an intent, you'll notice a checkbox next to each phrase. Clicking it will automatically sort the rest of the list by similarity to the selected phrases. You can then select the similar phrases and move them using the left column, as we did with unlabeled utterances in the previous step.
Back to Rasa
We can export our changes using the command line:
Check the differences:
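If you skipped the git step earlier, a comparable check can be sketched with Python's standard difflib module. The two snippets below are hypothetical before and after versions of one intent in data/nlu.yml, written in the Rasa 2 YAML layout; the added phrase is invented for illustration and is not actual export output.

```python
import difflib

# Hypothetical contents of one intent before the export.
before = """\
- intent: greet
  examples: |
    - hey
    - hello
"""

# Hypothetical contents after accepting one suggestion in Studio.
after = """\
- intent: greet
  examples: |
    - hey
    - hello
    - good morning to you
"""

# Build a unified diff, the same style of output git diff produces.
diff = list(
    difflib.unified_diff(
        before.splitlines(),
        after.splitlines(),
        fromfile="data/nlu.yml (before)",
        tofile="data/nlu.yml (after)",
        lineterm="",
    )
)

# Keep only added lines, skipping the "+++" file header.
added = [
    line[1:].strip()
    for line in diff
    if line.startswith("+") and not line.startswith("+++")
]

if __name__ == "__main__":
    print("\n".join(diff))
    print("Added phrases:", added)
```

Reviewing only the added lines is a quick way to confirm that the export contains the phrases you accepted and nothing unexpected.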
You can now create stories incorporating the newly added intents.
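As a rough guide to what such a story looks like, the Python sketch below renders one in the Rasa 2 YAML story layout. The helper `render_story`, the story name, and the response `utter_hotel_rating` are all hypothetical; a real response with that name would also have to be defined in your domain file before training.

```python
def render_story(name, steps):
    """Render a single story as Rasa 2 style YAML text.

    `steps` is a list of (kind, value) pairs where kind is "intent" or "action".
    """
    lines = ['version: "2.0"', "stories:", f"- story: {name}", "  steps:"]
    for kind, value in steps:
        lines.append(f"  - {kind}: {value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    story = render_story(
        "hotel rating request",
        [
            # Intent created in Studio earlier in this tutorial.
            ("intent", "hotel_request_rating"),
            # Hypothetical response; it must exist in domain.yml.
            ("action", "utter_hotel_rating"),
        ],
    )
    print(story)
```

The printed text can be pasted into data/stories.yml, adjusting the action name to match a response your bot actually defines.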
Train your bot:
And then test it:
Is your Rasa bot deployed?
If your Rasa project is deployed and generating conversational data, you can use our CLI to sync conversations from Rasa to HumanFirst. With imported conversations from Rasa, you can improve your bot with the same workflows described above. To learn more click here.