The notion of fine-tuning a Large Language Model (LLM) for a very specific generative use-case is in most instances not feasible. However, due to the flexibility of LLMs, variations in prompt engineering can yield astounding results. This article covers a new prompting method which improves LLM results in accuracy and completeness.

Chain-of-Thought (CoT) prompting is one of the most successful ways to query an LLM via a zero- or few-shot, single prompt. CoT prompting does particularly well in solving multi-step reasoning tasks.

As I have shown in the past, multi-step reasoning can be elicited from the LLM via a few-shot chain-of-thought (CoT) prompt which includes a few manually crafted step-by-step reasoning demonstrations, followed by the request or problem statement and the words: Let's think step by step.
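A minimal sketch of what such a few-shot CoT prompt looks like when assembled in code. The demonstration and the new question below are illustrative placeholders, not taken from any benchmark dataset:

```python
# One manually crafted step-by-step reasoning demonstration (illustrative).
demonstration = (
    "Q: Roger has 5 tennis balls. He buys 2 cans of 3 tennis balls each. "
    "How many tennis balls does he have now?\n"
    "A: Roger started with 5 balls. 2 cans of 3 balls is 6 balls. "
    "5 + 6 = 11. The answer is 11.\n\n"
)

# The new problem statement (illustrative placeholder).
question = "A baker made 24 muffins and sold 15. How many muffins are left?"

# Demonstration, then the problem, then the step-by-step trigger phrase.
cot_prompt = demonstration + f"Q: {question}\nA: Let's think step by step."
print(cot_prompt)
```

The assembled string is what gets submitted to the LLM as a single prompt; the demonstration primes the model to answer the new question in the same step-by-step style.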

But a recent study found that CoT prompting fails in three areas:

  1. Calculations (7% failure rate in test examples)
  2. Missing steps in a sequence of events (12% failure rate in test examples)
  3. Semantic misunderstanding (27% failure rate in test examples)

These vulnerabilities are addressed by Plan-and-Solve (PS) prompting and Plan-and-Solve prompting with more detailed instructions (PS+ prompting).

PS consists of two components: first, devising a plan to divide the entire task into smaller subtasks, and then carrying out the subtasks according to the plan.

Considering the image below… (a) shows a Zero-Shot-CoT prompt and (b) shows the Plan-And-Solve (PS) approach for prompting and answer extraction.

Although Zero-shot-CoT encourages LLMs to generate multi-step reasoning with “Let’s think step by step”, it may still generate wrong reasoning steps when the problem is complex.

PS prompting, by contrast, first asks the LLM to devise a step-by-step plan to solve the problem, and then to carry out that plan to find the answer.
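A minimal sketch of a zero-shot PS prompt. The trigger sentence below follows the wording reported for PS prompting; the question is an illustrative placeholder:

```python
# Illustrative question; any reasoning problem can be substituted here.
question = "In a class of 30 learners, 40% are absent. How many are present?"

# PS replaces "Let's think step by step" with a plan-then-execute instruction.
ps_trigger = (
    "Let's first understand the problem and devise a plan to solve the "
    "problem. Then, let's carry out the plan and solve the problem step by step."
)

ps_prompt = f"Q: {question}\nA: {ps_trigger}"
print(ps_prompt)
```

Note that this is still a zero-shot prompt: no demonstrations are supplied, only the instruction to plan first and execute second.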


Below I submit the question to text-davinci-003 and get the correct answer. Across multiple requests I might get an incorrect answer, and there is no explanation or reasoning supplied by the LLM.

Moving on to the image below, where the CoT method is employed, there is an improvement in the quality of the answer and the surfaced reasoning. However, the PS example at the bottom is far superior in detail, segmenting the answer into a plan and a solution, and subsequently executing on that solution.

The example below is a comparison between Plan-And-Solve Prompting (PS) and Plan-And-Solve Prompting accompanied by more detailed instructions (PS+).

PS+ prompting greatly improves the quality of the generated reasoning process.
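PS+ extends the PS instruction with more detailed guidance on extracting variables and checking calculations. The trigger sentence below follows the more detailed PS+ instruction as reported; exact wording may differ slightly from the paper, and the question is again an illustrative placeholder:

```python
# Illustrative question (placeholder).
question = "In a class of 30 learners, 40% are absent. How many are present?"

# PS+ adds explicit instructions: extract variables, calculate carefully,
# and show the answer.
ps_plus_trigger = (
    "Let's first understand the problem, extract relevant variables and "
    "their corresponding numerals, and make a plan. Then, let's carry out "
    "the plan, calculate intermediate variables (pay attention to correct "
    "numerical calculation and commonsense), solve the problem step by "
    "step, and show the answer."
)

ps_plus_prompt = f"Q: {question}\nA: {ps_plus_trigger}"
print(ps_plus_prompt)
```

The extra detail targets exactly the calculation and missing-step failure modes listed earlier: extracting variables guards against missed steps, and the calculation reminder guards against arithmetic slips.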


In the OpenAI playground example below, the question is asked via a very simple prompt with no instruction or guidance for the LLM. The incorrect answer is returned by text-davinci-003.

And below the PS methodology is followed, yielding the correct result and showing the plan and the solution before reaching a final conclusion.

Considering the image below, the PS+ prompting methodology is followed with an augmented and detailed response.

Final Considerations

The number of tokens used for these detailed queries increases significantly, so there is a cost consideration.
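A rough sketch of that overhead, comparing the length of the three prompt styles. Word count is used here as a crude proxy for tokens; real token counts depend on the model's tokenizer:

```python
# Three prompt templates; {q} stands in for the question text.
zero_shot = "Q: {q}\nA:"
cot = "Q: {q}\nA: Let's think step by step."
ps_plus = (
    "Q: {q}\nA: Let's first understand the problem, extract relevant "
    "variables and their corresponding numerals, and make a plan. Then, "
    "let's carry out the plan, calculate intermediate variables, solve "
    "the problem step by step, and show the answer."
)

# Word count as a crude stand-in for token count.
for name, template in [("zero-shot", zero_shot), ("CoT", cot), ("PS+", ps_plus)]:
    print(f"{name}: {len(template.split())} words")
```

And this only counts the prompt side: the longer, plan-structured completions that PS and PS+ elicit add further output tokens on top of this.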

Another consideration for PS and especially PS+ is the additional overhead and effort to design the prompt. From the tests it is clear how sensitive LLMs are to prompt wording and composition.

Lastly, PS and PS+ do address the calculation and missing-step vulnerabilities, but semantic misunderstanding remains. I believe it is possible to address this by supplying a contextual reference within the prompt.

I’m currently the Chief Evangelist @ HumanFirst. I explore & write about all things at the intersection of AI & language; ranging from LLMs, Chatbots, Voicebots, Development Frameworks, Data-Centric latent spaces & more.
