Personal and professional history might not repeat, but it certainly rhymes. I’m thrilled to join the team at HumanFirst, and reconnect with a team of founders I not only trust, but deeply admire.
Prompt engineering has been touted as the ultimate skill of the future. But will prompt engineering still be around in the near future? In this article I attempt to decompose what the future LLM interface might look like, considering it will be conversational.
This study from August 2023 evaluates 10 different prompt techniques across six LLMs and six datasets.
The study compared 10 different zero-shot prompt reasoning strategies across six LLMs (davinci-002, davinci-003, GPT-3.5-turbo, GPT-4, Flan-T5-xxl and Cohere command-xlarge) on six QA datasets ranging from scientific to medical domains.
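To make the shape of such a comparison concrete, here is a minimal sketch of an evaluation grid: every prompt strategy is applied to every model and every dataset, and accuracy is recorded per combination. Everything in it is an assumption for illustration. The two templates are not the ten prompts from the study, `stub_model` is a placeholder that always answers "a" (so its scores mean nothing), `TOY_QA` is made-up data, and exact-match scoring is only one possible way to grade answers.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

Example = Tuple[str, str]      # (question, gold answer)
Model = Callable[[str], str]   # takes a prompt, returns an answer string

# Hypothetical zero-shot templates, NOT the prompts used in the study.
STRATEGIES: Dict[str, str] = {
    "direct": "{question}\nAnswer:",
    "zero_shot_cot": "{question}\nAnswer: Let's think step by step.",
}

def accuracy(model: Model, template: str, examples: List[Example]) -> float:
    """Fraction of examples where the model's answer exactly matches the gold answer."""
    if not examples:
        raise ValueError("examples must not be empty")
    correct = 0
    for question, gold in examples:
        prediction = model(template.format(question=question))
        correct += prediction.strip().lower() == gold.strip().lower()
    return correct / len(examples)

def evaluate_grid(
    models: Dict[str, Model],
    strategies: Dict[str, str],
    datasets: Dict[str, List[Example]],
) -> Dict[Tuple[str, str, str], float]:
    """Score every (model, strategy, dataset) combination."""
    scores = {}
    for (m_name, model), (s_name, template), (d_name, examples) in product(
        models.items(), strategies.items(), datasets.items()
    ):
        scores[(m_name, s_name, d_name)] = accuracy(model, template, examples)
    return scores

def mean_by_strategy(scores: Dict[Tuple[str, str, str], float]) -> Dict[str, float]:
    """Average the scores of each strategy over all models and datasets."""
    grouped: Dict[str, List[float]] = {}
    for (_, s_name, _), value in scores.items():
        grouped.setdefault(s_name, []).append(value)
    return {s_name: sum(vals) / len(vals) for s_name, vals in grouped.items()}

# Placeholder model: always answers "a". A real run would call an LLM API here.
def stub_model(prompt: str) -> str:
    return "a"

# Made-up multiple-choice items, only to exercise the code.
TOY_QA: List[Example] = [("Q1?", "b"), ("Q2?", "b"), ("Q3?", "a"), ("Q4?", "b")]

grid = evaluate_grid({"stub": stub_model}, STRATEGIES, {"toy": TOY_QA})
print(mean_by_strategy(grid))
```

Keeping the scores keyed by (model, strategy, dataset) makes it straightforward to slice the same results per model, per prompt technique, or per dataset, which is how the study's findings are presented.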
Some notable findings were:
As is visible in the graphed data below, some models are optimised for specific prompting strategies and data domains.
Gains from Chain-of-Thought (CoT) reasoning strategies hold across domains and LLMs.
GPT-4 has the best performance across data domains and prompt techniques.
The header image depicts the overall performance of each of the six LLMs used in the study.
The image below shows the 10 prompt techniques used in the study, with an example of each prompt, and the score achieved by each prompt technique. The scores shown here are specifically related to the GPT-4 model.
A further image shows the performance of each LLM on each of the six datasets. The toughest datasets for the LLMs to navigate were MedQA, MedMCQA and arguably OpenBookQA.
Throughout the study it is evident that GPT-4’s performance is stellar. Also noticeable is the good performance of Google’s Flan-T5-xxl on OpenBookQA.
I’m currently the Chief Evangelist @ HumanFirst. I explore & write about all things at the intersection of AI & language; ranging from LLMs, Chatbots, Voicebots, Development Frameworks, Data-Centric latent spaces & more.