Back to blog
Articles
Articles
June 22, 2023
·
4 Minutes

LLM Prompt Injection Attacks & Testing Vulnerabilities With ChainForge

June 22, 2023
|
4 Minutes

Latest content

Tutorials
5 min read

Optimizing RAG with Knowledge Base Maintenance

How to find gaps between knowledge base content and real user questions.
April 23, 2024
Tutorials
4 min read

Scaling Quality Assurance with HumanFirst and Google Cloud

How to use HumanFirst with Vertex AI to test, improve, and trust agent performance.
March 14, 2024
Announcements
2 min read

Full Circle: HumanFirst Welcomes Maeghan Smulders as COO

Personal and professional history might not repeat, but it certainly rhymes. I’m thrilled to join the team at HumanFirst, and reconnect with a team of founders I not only trust, but deeply admire.
February 13, 2024
Tutorials
4 min read

Accelerating Data Analysis with HumanFirst and Google Cloud

How to use HumanFirst with CCAI-generated data to accelerate data analysis.
January 24, 2024
Tutorials
4 min read

Exploring Contact Center Data with HumanFirst and Google Cloud

How to use HumanFirst with CCAI-generated data to streamline topic modeling.
January 11, 2024
Articles
5 min

Building In Alignment: The Role of Observability in LLM-Led Conversational Design

Building In Alignment: The Role of Observability in LLM-Led Conversational Design
December 6, 2023
Articles
5 min read

Rivet Is An Open-Source Visual AI Programming Environment

Rivet is suited for building complex agents with LLM Prompts, and it was Open Sourced recently.
September 27, 2023
Articles
6 min read

What Is The Future Of Prompt Engineering?

The skill of Prompt Engineering has been touted as the ultimate skill of the future. But, will prompt engineering be around in the near future? In this article I attempt to decompose how the future LLM interface might look like…considering it will be conversational.
September 26, 2023
Articles
4 min read

LLM Drift

A recent study coined the term LLM Drift. LLM Drift is definite changes in LLM responses and behaviour, over a relatively short period of time.
September 25, 2023
Tutorials
5 min read

Optimizing RAG with Knowledge Base Maintenance

How to find gaps between knowledge base content and real user questions.
April 23, 2024
Tutorials
4 min read

Scaling Quality Assurance with HumanFirst and Google Cloud

How to use HumanFirst with Vertex AI to test, improve, and trust agent performance.
March 14, 2024
Announcements
2 min read

Full Circle: HumanFirst Welcomes Maeghan Smulders as COO

Personal and professional history might not repeat, but it certainly rhymes. I’m thrilled to join the team at HumanFirst, and reconnect with a team of founders I not only trust, but deeply admire.
February 13, 2024

Let your data drive.

Articles

LLM Prompt Injection Attacks & Testing Vulnerabilities With ChainForge

COBUS GREYLING
June 22, 2023
.
4 Minutes

Using the ChainForge IDE to batch test and measure prompt injection detection.

What Is Prompt Injection?

Riley Goodside, a data scientist at Copy.ai, was the first to report publicly about a new type of attack that involves getting large language models (LLMs) to disregard their intended programming by including malicious text such as “ignore your previous instructions” in user input.

This attack method was labeled “prompt injection” by Simon Willison.

A very good summary on prompt injection attacks was written by Carol Anderson.

Large Language Model Prompt Injection attacks (LLMPI) are a type of attack on natural language processing (NLP) algorithms.

The attackers can insert malicious prompts into the training phases of NLP models to create backdoor vulnerabilities.

An attacker can create malicious prompts that cause the target algorithms to output specific results.

This could be used to cause a system to mistake a malicious input for something that is benign when in reality the input could cause damage to the system, surface previous prompts or user requests.

Even confidential company information on the creation process of the LLM.

LLMPIs are particularly difficult to detect and mitigate since the malicious prompts are embedded in the training data and are indistinguishable from regular inputs.

ChatML makes explicit to the model the source of each piece of text, and particularly shows the boundary between human and AI text. And is a vital initiative from OpenAI in starting to solve for prompt injection. Read more about the malicious side of such attacks here.

ChainForge Prompt Injection Experiment

Below on the left are five prompts which will be submitted to the LLMs, with the malicious prompts to be injected on the right.

The intended prompts and the malicious prompts are on the left of the screen below. The template defines the intended prompts as {command} and the injections as {input} .

The prompts are run twice against two OpenAI models and the result is printed out to an inspect node. A Python script parses the LLM responses, with the results being displayed in both a graphic and an inspect node.

Below, the graphic is fully interactive and it’s clear that GPT4’s performance is significantly better than GPT3.5.

In Conclusion

This article only illustrates the basic principles of prompt injection and the LLM failing in some instances to distinguish between a legitimate request and an ill-intended or malicious request.

The real danger of prompt injections lies on a few fronts…the one is where a model is trained by user requests and behaviour, with the user behaviour skewing the model to be untruthful and nefarious in responses.

The second danger is for LLMs to be tricked into yielding company ways of work, code names, model training, previous LLM users and their data and more.

I’m currently the Chief Evangelist @ HumanFirst. I explore and write about all things at the intersection of AI and language; ranging from LLMs, Chatbots, Voicebots, Development Frameworks, Data-Centric latent spaces and more.

Subscribe to HumanFirst Blog

Get the latest posts delivered right to your inbox