A while ago OpenAI has released the API for The LLM gpt-3.5-turbo, the same model used in ChatGPT. Additionally, the Whisper speech-to-text large-v2 model is available through an API for transcription.

GPT 3.5 Turbo & Chat Markup Language (ChatML)

The ChatGPT models are accessible via API, with gpt-3.5-turbo being used in the examples below. OpenAI also has a model named gpt-3.5-turbo-0301.

For up-to-date model information, the OpenAI model page is a great resource.

It is important to remember that OpenAI models are non-deterministic, meaning that the same input given at different times or even in a row can lead to different or varying results.

OpenAI states that setting the temperature to 0 will make the output mostly deterministic, but some variability may still exist. The ChatGPT web interface we are used to is very successful in managing conversational context.

ChatGPT web interface by OpenAI

It is important to note that the ChatGPT model (gpt-3.5-turbo) accessed through the API does not keep track of conversational context, as can be seen in the example below:

The ChatGPT model is able to maintain conversational context with a few-shot approach by buffering the prompts, similar to the way OpenAI managed context via the initial web interface.

An example of this few-shot learning prompt in action is shown below, with a very contextual and empathetic response from the ChatGPT model:

Chat Markup Language (ChatML) is an example JSON file that defines the roles of system, user, and assistant.

It is designed to protect against prompt injection attacks, which are the main security vulnerability and avenue of abuse for LLMs.

Below is a working example of a completion request sent to the gpt-3.5-turbo model using the ChatML file. The following Python code snippet can be run in a Colab Notebook:

Notice the role which is defined, the model detail which is gpt-3.5-turbo-0301, and other more in the output from the completion request below.

OpenAI Whisper large-v2 Model

Considering accessing the OpenAI Whisper AI via a Colab Notebook:

The result from uploading the MP3 audio file.

The lines of Python code to transcribe the audio:

And below is the output result…

I find it interesting that Whisper is able to detect the language of the recording before transcribing it.

According to the available Whisper models, languages and Word Error Rates (WER), Spanish has the best rate of 3, followed by Italian with a WER of 4, and English with a WER of 4.2. You can read more about it here.

I’m currently the Chief Evangelist @ HumanFirst. I explore and write about all things at the intersection of AI and language; ranging from LLMs, Chatbots, Voicebots, Development Frameworks, Data-Centric latent spaces and more.

Subscribe to HumanFirst Blog

Get the latest posts delivered right to your inbox