ChainForge is an IDE for prompt engineering and a number of important improvements were made to the tool.

Recently I created a diagram to map out the LLM application tooling landscape.

The tools listed employ the principle of prompt engineering in various ways ranging from a no-code to a pro-code approach. Virtually all the tools listed are focused on improving the end-user experience or prompt management and augmentation.

ChainForge has a different focus, in that ChainForge can be used to test how robust prompts are by evaluating the response from different models.

This is a practical approach of prompting multiple LLMs and comparing the different responses in a practical and hands-on way.

ChainForge is described as a visual programming environment for prompt engineering.

ChainForge can now be run from the public available URL, with no user account or installation required. Alternatively ChainForge can be installed locally with:

pip install chainforge
chainforge serve

Considering the example below, which you can access by simply clicking here…the four LLMs (PaLM2, Claude, GPT4 & GPT3.5) are each given secret codes and is instructed not to divulge the secret codes.

A set of ambiguous prompts are then injected in order to trick the LLMs to include the codes in their response.

ChainForge has eight flow nodes to make use of, with a handy import and export feature. Export/import is convenient for sharing and storing flows and a flow can be instantly shared via the Share button.

The New Flow button initiates a new flow by discarding the existing flow. It would work much better if the New Flow button created a tab within the IDE. Where multiple flows can be open simultaneously and managed via tabs within ChainForge. A dark mode option will also be handy.

The LLMs available via ChainForge are OpenAI, HuggingFace, Google PaLM and OpenAI via Azure. You will need to have access to your own API keys for each LLM.

Something that makes the Vercel playground so attractive, is that you get to interact with LLMs without having your own log in credentials and access. As some of the LLMs still have very restricted access.

Below you can see to what level specific models can be selected and what settings are available.

In Closing…

The power of LLMs are primarily being harnessed via prompt chaining and autonomous agents. These two disciplines are supported by prompt engineering tools to fine-tune, store, share, etc. prompts.

These tools have made it possible for LLM Applications (also known as Generative Apps) to make use of multiple LLMs within one application.

Also with continuous LLM model updates, deprecation and new models being released, rapid and benchmarked testing of LLM responses to prompts are becoming increasingly important.

Considering these factors, ChainForge is unique in the sense that it is a no-code to low-code IDE for rapid LLM response testing by making use of a flow approach.

The flow approach also makes it possible to simulate portions of an existing chain.

I’m currently the Chief Evangelist @ HumanFirst. I explore & write about all things at the intersection of AI & language; ranging from LLMs, Chatbots, Voicebots, Development Frameworks, Data-Centric latent spaces & more.

Subscribe to HumanFirst Blog

Get the latest posts delivered right to your inbox