How agents acquire capabilities
Currently, LLM-powered autonomous agents can solve simple tasks such as sending an email, filling in blanks in a spreadsheet, or even making phone calls. However, as soon as a task gets more complex, such as implementing a whole database from scratch, controlling a physical robot, or operating arbitrary websites in the browser, current agents fall short.
One of the main reasons is limited reasoning and planning capabilities, but another is the more basic ability to use tools reliably.
There are numerous cases on the internet where someone claims GPT-4 can't do something, only for someone else to make it work later. So how can an LLM-powered autonomous agent acquire new capabilities it doesn't have today?
A recent survey about LLM-powered autonomous agents provides an excellent overview:
Fine-tuning
With Human-Annotated Datasets: The fine-tuning dataset is constructed from human feedback, which is turned into natural language and used directly to fine-tune the model. An example is the WebShop dataset, for which researchers set up an artificial webshop, had humans complete shopping tasks in it, and collected around 1,600 human demonstrations (a sketch of turning such demonstrations into a fine-tuning file follows this list).
With LLM-Generated Datasets: Since recruiting humans to annotate data can be time-intensive and costly, a promising alternative is to automate dataset generation. An example is the recently released Strategic Game Dataset from LAION, which includes 3.2 billion chess games, 236 billion Rubik's Cube moves, and 39 billion maze moves.
With Real-World Datasets: Besides setting up artificial labeling processes, collecting data from real-world applications can also be very powerful. This can be real-world product usage, which has no explicit human labeling component but yields enough data to build a dataset. An example is the Mind2Web dataset, which aims to be a generalist dataset for web tasks.
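To make the fine-tuning path concrete, here is a minimal sketch of how human demonstrations could be serialized into a chat-style JSONL fine-tuning file. The demonstration content and field names are illustrative, not the actual WebShop or Mind2Web schema.

```python
import json

# Hypothetical human demonstrations, each pairing a task instruction with
# the action trace a human annotator produced (in the spirit of WebShop).
demonstrations = [
    {
        "instruction": "Buy a red cotton t-shirt under $20",
        "actions": ["search[red cotton t-shirt]", "click[item_3]", "click[buy now]"],
    },
]

# Serialize into chat-style JSONL, a format many fine-tuning APIs accept:
# the instruction becomes the user turn, the action trace the target output.
with open("agent_finetune.jsonl", "w") as f:
    for demo in demonstrations:
        record = {
            "messages": [
                {"role": "user", "content": demo["instruction"]},
                {"role": "assistant", "content": "\n".join(demo["actions"])},
            ]
        }
        f.write(json.dumps(record) + "\n")
```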
Prompt Engineering
One can include a few worked examples in the prompt (few-shot prompting) to improve the LLM's planning or reasoning. Some agent implementations also use prompts that describe the agent's beliefs and state of mind.
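Here is a minimal sketch of few-shot prompting for an agent's planning step. `call_llm` is a placeholder for whatever completion API you use, and the example tasks are made up.

```python
def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your model provider here")

# Two worked examples shown to the model before the real task.
FEW_SHOT_EXAMPLES = """\
Task: Send the quarterly report to Alice.
Plan:
1. Locate the quarterly report file.
2. Draft an email to alice@example.com with the file attached.
3. Send the email.

Task: Schedule a meeting with the design team next week.
Plan:
1. Check the design team's shared calendar for free slots.
2. Create a calendar invite for the first open slot.
3. Send the invite.
"""

def plan(task: str) -> str:
    # The examples prime the model to produce a plan in the same format.
    prompt = f"{FEW_SHOT_EXAMPLES}\nTask: {task}\nPlan:"
    return call_llm(prompt)
```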
Mechanism Engineering
Trial and Error: The agent performs an action and is then invoked again to judge the action it just performed. If the action is deemed unsatisfactory, the feedback is incorporated and the agent tries again (a minimal sketch follows this list).
Crowd-sourcing: The same prompt is delegated to several agents. If their answers are inconsistent, each (sub-)agent incorporates the other agents' solutions to update its own response, improving the overall outcome (see the second sketch after this list).
Experience Accumulation: The agent looks up whether it has seen a similar task or prompt before and uses that prior experience to improve the result (see the third sketch after this list).
Self-driven Evolution: The agent sets its own goals and adjusts its approach based on feedback from the environment and a reward function, gradually moving toward the overall goal. In this way, the agent can acquire new capabilities in a self-driven manner (see the final sketch below).
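A minimal sketch of the trial-and-error loop: one call proposes an action, a second call critiques it, and the critique is folded into the next attempt. As before, `call_llm` is a placeholder for your model provider, and the prompt wording is illustrative.

```python
def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your model provider here")

def act_with_retries(task: str, max_iterations: int = 3) -> str:
    feedback = ""
    action = ""
    for _ in range(max_iterations):
        # Propose an action, conditioned on any critique from earlier rounds.
        action = call_llm(f"Task: {task}\n{feedback}Propose the next action:")
        # Invoke the model again to judge the action just performed.
        verdict = call_llm(
            f"Task: {task}\nProposed action: {action}\n"
            "Reply OK if the action solves the task, otherwise explain the problem:"
        )
        if verdict.strip().startswith("OK"):
            return action
        # Incorporate the feedback and iterate.
        feedback = f"Previous attempt: {action}\nCritique: {verdict}\n"
    return action
```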
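A minimal sketch of the crowd-sourcing idea: several independent agent calls answer the same question, and on disagreement each one revises its answer after seeing the others'. The revision prompt and the majority-vote fallback are illustrative choices, not a prescribed protocol.

```python
def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your model provider here")

def crowd_answer(question: str, n_agents: int = 3, rounds: int = 2) -> str:
    # Each (sub-)agent answers independently first.
    answers = [call_llm(f"Question: {question}\nAnswer:") for _ in range(n_agents)]
    for _ in range(rounds):
        if len(set(answers)) == 1:
            break  # the agents already agree
        # On disagreement, every agent sees the others' solutions and revises.
        answers = [
            call_llm(
                f"Question: {question}\nYour answer: {answers[i]}\n"
                f"Other agents' answers: {[a for j, a in enumerate(answers) if j != i]}\n"
                "Revise your answer if warranted:"
            )
            for i in range(n_agents)
        ]
    # Fall back to a majority vote over the final answers.
    return max(set(answers), key=answers.count)
```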
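A minimal sketch of experience accumulation using a simple in-memory store. `difflib` keeps the example dependency-free; real agents would typically retrieve prior experiences by embedding similarity instead.

```python
import difflib

def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your model provider here")

memory: list[tuple[str, str]] = []  # accumulated (task, solution) pairs

def solve(task: str) -> str:
    # Look up whether a similar task was seen before.
    past_tasks = [t for t, _ in memory]
    similar = difflib.get_close_matches(task, past_tasks, n=1, cutoff=0.6)
    hint = ""
    if similar:
        prior_solution = dict(memory)[similar[0]]
        hint = (
            "A similar task was solved before.\n"
            f"Task: {similar[0]}\nSolution: {prior_solution}\n"
        )
    # Use the prior experience (if any) to improve the result.
    solution = call_llm(f"{hint}Task: {task}\nSolution:")
    memory.append((task, solution))
    return solution
```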
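Finally, a minimal sketch of self-driven evolution: the agent proposes its own subgoals and keeps whichever approach scores best against an environment reward. `environment_reward` is a placeholder for whatever signal your environment actually provides.

```python
def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your model provider here")

def environment_reward(action: str) -> float:
    raise NotImplementedError("score the action in your environment")

def evolve(overall_goal: str, steps: int = 5) -> str:
    best_reward = float("-inf")
    approach = "start with the simplest plausible plan"
    for _ in range(steps):
        # The agent sets its own subgoal given the overall goal.
        subgoal = call_llm(
            f"Overall goal: {overall_goal}\nCurrent approach: {approach}\n"
            "Propose the next subgoal:"
        )
        action = call_llm(f"Subgoal: {subgoal}\nPropose a concrete action:")
        # Adjust the approach based on the reward signal, keeping what works.
        reward = environment_reward(action)
        if reward > best_reward:
            best_reward = reward
            approach = subgoal
    return approach
```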