Plan-And-Solve Prompting

August 7, 2023 · 5 min read

The notion of fine-tuning a Large Language Model (LLM) for a very specific generative use case is in most instances not feasible. However, due to the flexibility of LLMs, variations in prompt engineering can yield astounding results. This article covers a new prompting method which improves LLM results in accuracy and completeness.

Chain-Of-Thought (CoT) prompting is one of the most successful ways to query an LLM via a single zero-shot or few-shot prompt. CoT prompting performs particularly well on multi-step reasoning tasks.

As I have shown in the past, an LLM can solve multi-step reasoning tasks via a few-shot chain-of-thought (CoT) prompt which includes a few manually crafted step-by-step reasoning demonstrations, followed by the request or problem statement and the words: Let us think step by step.
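To make this concrete, here is a minimal sketch of a zero-shot CoT call using the legacy OpenAI completions API that was current when this article was written; the word problem is a hypothetical example of my own, not one from the study.

```python
import openai  # legacy openai-python (<1.0) completions interface

openai.api_key = "YOUR_API_KEY"

# A hypothetical word problem, used purely for illustration.
question = (
    "A shop sells pens at $2 each and notebooks at $5 each. "
    "If I buy 3 pens and 2 notebooks, how much do I spend in total?"
)

# Zero-shot CoT: append the trigger phrase to elicit step-by-step reasoning.
prompt = f"Q: {question}\nA: Let's think step by step."

response = openai.Completion.create(
    model="text-davinci-003",
    prompt=prompt,
    temperature=0,
    max_tokens=256,
)
print(response["choices"][0]["text"].strip())
```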

But a recent study found that CoT prompting fails in three areas:

  1. Calculations (7% failure rate in test examples)
  2. Missing steps in a sequence of events (12% failure rate in test examples)
  3. Semantic misunderstanding (27% failure rate in test examples)

These vulnerabilities are addressed by Plan-And-Solve (PS) prompting and by Plan-And-Solve prompting with more detailed instructions (PS+ prompting).

PS consists of two components: first, devising a plan to divide the entire task into smaller subtasks, and then carrying out the subtasks according to the plan.

In the image below, (a) shows a Zero-Shot-CoT prompt and (b) shows the Plan-And-Solve (PS) approach to prompting and answer extraction.

While Zero-shot-CoT encourages LLMs to generate multi-step reasoning with “Let’s think step by step”, it may still generate wrong reasoning steps when the problem is complex.

PS prompting, by contrast, first asks the LLM to devise a step-by-step plan for solving the problem, and then to carry out that plan to find the answer.
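The same call can be reworked into a PS prompt simply by swapping the trigger sentence. In the sketch below, the trigger wording follows the phrasing reported in the PS paper, while the helper function is my own illustration.

```python
# Plan-and-Solve (PS): replace the zero-shot CoT trigger with a
# plan-then-execute instruction (trigger wording as reported in the paper).
PS_TRIGGER = (
    "Let's first understand the problem and devise a plan to solve the "
    "problem. Then, let's carry out the plan and solve the problem step "
    "by step."
)

def build_ps_prompt(question: str) -> str:
    """Wrap a question in the Plan-and-Solve prompt format."""
    return f"Q: {question}\nA: {PS_TRIGGER}"

print(build_ps_prompt(
    "A shop sells pens at $2 each and notebooks at $5 each. "
    "If I buy 3 pens and 2 notebooks, how much do I spend in total?"
))
```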


Below, I submit the question to text-davinci-003 and get the correct answer. Over multiple requests I might get an incorrect answer, and in either case the LLM supplies no explanation or reasoning.

In the image below, where the CoT method is employed, the quality of the answer improves and the reasoning is surfaced. However, the PS example at the bottom is far superior in detail, segmenting the answer into a plan and a solution and then executing on that solution.

The example below is a comparison between Plan-And-Solve Prompting (PS) and Plan-And-Solve Prompting accompanied by more detailed instructions (PS+).

PS+ prompting greatly improves the quality of the generated reasoning process.
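In template form, PS+ keeps the same structure but extends the trigger with explicit instructions to extract variables and to attend to the calculations. The wording below is adapted from the PS+ variant described in the paper; the helper function is again my own sketch.

```python
# PS+: the PS trigger extended with instructions to extract variables and
# to pay attention to intermediate calculations (wording adapted from the
# PS+ variant described in the paper).
PS_PLUS_TRIGGER = (
    "Let's first understand the problem, extract relevant variables and "
    "their corresponding numerals, and devise a plan. Then, let's carry "
    "out the plan, calculate intermediate variables (pay attention to "
    "correct numerical calculation and commonsense), solve the problem "
    "step by step, and show the answer."
)

def build_ps_plus_prompt(question: str) -> str:
    """Wrap a question in the PS+ prompt format."""
    return f"Q: {question}\nA: {PS_PLUS_TRIGGER}"
```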


In the OpenAI playground example below, the question is asked via a very simple prompt, with no instruction or guidance for the LLM, and text-davinci-003 returns an incorrect answer.

Below, the PS methodology is followed, yielding the correct result and showing the plan and the solution before reaching a final conclusion.

In the image below, the PS+ prompting methodology is followed, producing an augmented and more detailed response.

Final Considerations

The number of tokens used for these detailed queries increases significantly, so there is a cost consideration.
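To make the cost point concrete, the sketch below (assuming the tiktoken tokenizer library) compares prompt token counts for a bare prompt, a zero-shot CoT prompt, and a PS+-style prompt; exact counts will vary with the question used.

```python
import tiktoken  # OpenAI's tokenizer library

enc = tiktoken.encoding_for_model("text-davinci-003")

question = "If I buy 3 pens at $2 and 2 notebooks at $5, what is the total?"

prompts = {
    "bare": f"Q: {question}\nA:",
    "zero-shot CoT": f"Q: {question}\nA: Let's think step by step.",
    "PS+": (
        f"Q: {question}\nA: Let's first understand the problem, extract "
        "relevant variables and their corresponding numerals, and devise a "
        "plan. Then, let's carry out the plan, calculate intermediate "
        "variables (pay attention to correct numerical calculation and "
        "commonsense), solve the problem step by step, and show the answer."
    ),
}

# The PS+ trigger alone adds a few dozen prompt tokens per request, and the
# longer, more detailed completions add to output-token cost as well.
for name, prompt in prompts.items():
    print(f"{name}: {len(enc.encode(prompt))} prompt tokens")
```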

Another consideration for PS, and especially PS+, is the additional overhead and effort of designing the prompt. The tests make clear how sensitive LLMs are to prompt wording and composition.

Lastly, PS and PS+ do address the calculation and missing-step vulnerabilities, but semantic misunderstanding still remains. I believe it is possible to address this by supplying a contextual reference within the prompt.

I’m currently the Chief Evangelist @ HumanFirst. I explore & write about all things at the intersection of AI & language; ranging from LLMs, Chatbots, Voicebots, Development Frameworks, Data-Centric latent spaces & more.
