One of the most powerful concepts in AI agents: Voyager
Not even half a year ago, a revolutionary paper came out in autonomous-agent research that, in my opinion, hasn’t gotten enough attention given the impact it is likely to have. I’m talking about "Voyager: An Open-Ended Embodied Agent with Large Language Models".
The authors, including Jim Fan, come from NVIDIA, Caltech, UT Austin, Stanford, and UW Madison. They built an agent that can teach itself to play Minecraft. How does it work?
The agent explores the world, and whenever it finds itself in a new situation, it reasons about that situation and proposes a tool to build or use.
The agent tries to build or use the tool. This step is iterative: the agent writes code for the task, runs it, and retries with the feedback it gets, self-verifying after each attempt whether the tool was created or the action performed. An example of a task the agent sets for itself: Reasoning: "Since you have a wooden pickaxe and some stones, it would be beneficial to upgrade your pickaxe to a stone pickaxe for better efficiency." Task: "Craft 1 stone pickaxe."
Once the agent verifies that it performed the task successfully, it saves the working solution to a skill library.
The next time the agent faces a similar situation, it can take the skill from the library instead of having to create it again.
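To make the loop concrete, here is a minimal Python sketch of the steps above. It is not the authors' implementation. The names SkillLibrary, attempt_task, write_code and run_and_verify are made up for illustration, the LLM and the game are replaced by stub functions, and the retry limit is arbitrary. The library here matches tasks by exact (normalized) text, which is a simplification: the paper describes looking skills up by embedding similarity.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Tuple


@dataclass
class SkillLibrary:
    """Toy skill store (hypothetical): maps a task description to working code."""
    skills: Dict[str, str] = field(default_factory=dict)

    @staticmethod
    def _key(task: str) -> str:
        # Normalize case and whitespace so trivially different phrasings match.
        return " ".join(task.lower().split())

    def add(self, task: str, code: str) -> None:
        self.skills[self._key(task)] = code

    def retrieve(self, task: str) -> Optional[str]:
        # Assumption: exact match stands in for a real similarity search.
        return self.skills.get(self._key(task))


def attempt_task(
    task: str,
    library: SkillLibrary,
    write_code: Callable[[str, str], str],
    run_and_verify: Callable[[str, str], Tuple[bool, str]],
    max_tries: int = 4,  # arbitrary limit chosen for this sketch
) -> Optional[str]:
    """Return code that solves `task`, or None if every attempt fails."""
    cached = library.retrieve(task)
    if cached is not None:
        return cached  # reuse a stored skill instead of creating it again

    feedback = ""
    for _ in range(max_tries):
        code = write_code(task, feedback)           # stand-in for an LLM call
        ok, feedback = run_and_verify(task, code)   # stand-in for game + self-check
        if ok:
            library.add(task, code)                 # only verified skills are saved
            return code
    return None


# --- Stubs so the sketch runs without an LLM or a game ---
def fake_write_code(task: str, feedback: str) -> str:
    # First attempt has a typo; the retry "fixes" it once feedback is available.
    return "craft('stone_pickaxe')" if feedback else "craft('stone_pickaxee')"


def fake_run_and_verify(task: str, code: str) -> Tuple[bool, str]:
    ok = code == "craft('stone_pickaxe')"
    return ok, ("" if ok else "unknown item name")


library = SkillLibrary()
first = attempt_task("Craft 1 stone pickaxe", library, fake_write_code, fake_run_and_verify)
second = attempt_task("craft 1 stone pickaxe", library, fake_write_code, fake_run_and_verify)
print(first, second == first, len(library.skills))
```

The first call fails once, retries with the error message, and stores the working code. The second call never reaches the code-writing step because the library already holds the skill. That is the whole trick: failures cost retries, but each success is paid for only once.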
Voyager uses GPT-4 as its reasoning engine. With this simple mechanism, it beat the prior state of the art by a wide margin: the paper reports 3.3x more unique items collected, 2.3x longer distances traveled, and key tech-tree milestones unlocked up to 15.3x faster. The agent explores the world on its own, with no need for human intervention.
Now the obvious question is: where else can one apply the Voyager approach? Robotics, healthcare, education, environmental tracking, and smart home assistants all come to mind, and the list does not stop there.
And the team has a track record here: their earlier Minecraft work, MineDojo, won an Outstanding Paper Award at NeurIPS 2022.