The Rise of BabyAGI

Harnessing GPT-4 for Task-Driven Autonomous AI Agents

I recently came across discussions on autonomous AI agents, such as BabyAGI, a GPT-4-based agent developed by venture capitalist Yohei Nakajima.

Here are the main components of the system:

  1. User Input: The user provides input to the system, which includes task descriptions and the overall objective.

  2. Task Creation Agent: This agent is responsible for generating new tasks based on the user's input and the result of a previously executed task.

  3. Task Prioritization Agent: This agent takes the generated tasks and prioritizes them, ensuring that the most relevant tasks are executed first.

  4. Execution Agent: This agent is responsible for executing the tasks based on the provided objective and context from the memory (Pinecone).

  5. Pinecone (Memory): Pinecone serves as the memory of the system, storing the results of executed tasks and their associated context. The Execution Agent retrieves relevant context from Pinecone when needed.

These components interact with each other in a continuous loop, with the Task Creation Agent generating new tasks and the Task Prioritization Agent organizing them. The Execution Agent then executes the tasks and stores the results in Pinecone. The system continues this process until there are no more tasks left to complete.
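
To make that loop concrete, here is a minimal sketch of the orchestration in Python. The agent functions are stubbed out (the real BabyAGI script backs them with OpenAI API calls and stores results in Pinecone), and the objective string is just an example; only the control flow is meant to mirror the description above.

from collections import deque
from typing import Dict, List

OBJECTIVE = "Write a weather report for Paris"  # example objective

# Stubs standing in for the real LLM-backed agents and the Pinecone store;
# the actual script implements each of these with OpenAI and Pinecone calls.
def execution_agent(objective: str, task: str) -> str:
    return f"(result of '{task}')"

def task_creation_agent(objective: str, result: Dict,
                        task_description: str, task_list: List[str]) -> List[Dict]:
    return []  # the real agent asks the LLM for non-overlapping follow-up tasks

def prioritization_agent(objective: str, tasks: List[Dict]) -> List[Dict]:
    return tasks  # the real agent asks the LLM to reorder the queue

def store_result(task: Dict, result: str) -> None:
    pass  # the real script embeds the result and upserts it into Pinecone

task_list = deque([{"task_id": 1, "task_name": "Develop a task list"}])
task_id_counter = 1

while task_list:
    # 1. Pull the next task and execute it against the objective.
    task = task_list.popleft()
    result = execution_agent(OBJECTIVE, task["task_name"])

    # 2. Store the result so later executions can retrieve it as context.
    store_result(task, result)

    # 3. Create new tasks from the result, skipping pending duplicates.
    for new_task in task_creation_agent(
        OBJECTIVE, {"data": result}, task["task_name"],
        [t["task_name"] for t in task_list],
    ):
        task_id_counter += 1
        task_list.append({"task_id": task_id_counter, **new_task})

    # 4. Reorder the queue so the most relevant tasks run first.
    task_list = deque(prioritization_agent(OBJECTIVE, list(task_list)))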

BabyAGI is thus a clever combination of statefulness and prompt engineering. Let's take a look at the task creation agent Python code to better understand what is going on under the hood:

from typing import Dict, List

def task_creation_agent(
    objective: str, result: Dict, task_description: str, task_list: List[str]
):
    # Build a prompt that gives the model the overall objective, the result of
    # the last completed task, that task's description, and the tasks still
    # pending, then ask for non-overlapping follow-up tasks.
    prompt = f"""
    You are a task creation AI that uses the result of an execution agent to create new tasks with the following objective: {objective},
    The last completed task has the result: {result}.
    This result was based on this task description: {task_description}. These are incomplete tasks: {', '.join(task_list)}.
    Based on the result, create new tasks to be completed by the AI system that do not overlap with incomplete tasks.
    Return the tasks as an array."""
    # openai_call is the script's helper that wraps the OpenAI completion API.
    response = openai_call(prompt)
    # One task per line; a single-line response becomes a one-element list.
    new_tasks = response.split("\n") if "\n" in response else [response]
    return [{"task_name": task_name} for task_name in new_tasks]

As you can see, the task_creation_agent function creates new tasks from four inputs: the overall objective, the result of the previously completed task, that task's description, and the current list of incomplete tasks.

A prompt is created using these parameters, which informs the AI that it is a task creation agent with a specific objective. It is provided with the result of the last completed task, the task description, and the list of incomplete tasks. The goal is to create new tasks that do not overlap with the existing incomplete tasks.

The prompt is sent to the OpenAI API using the openai_call() function. The response from the API is then processed to extract the new tasks. If the response contains multiple tasks separated by line breaks, they are split into an array. If there's only one task, it is wrapped in an array. Finally, the function returns a list of dictionaries, with each dictionary containing the name of a new task.
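
To illustrate that parsing step with a made-up model response (both task strings here are invented for the example):

# Suppose the model replied with two tasks separated by a newline:
response = "Scrape weather data for Paris\nSummarize the forecast"
new_tasks = response.split("\n") if "\n" in response else [response]
print([{"task_name": t} for t in new_tasks])
# [{'task_name': 'Scrape weather data for Paris'}, {'task_name': 'Summarize the forecast'}]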

This function plays a crucial role in the autonomous agent, as it helps generate new tasks for the AI system to complete while ensuring they do not conflict with the existing incomplete tasks.

This autonomous AI agent has generated considerable interest, as it showcases the potential capabilities of AI agents in the future. OpenAI's recent development of plugins that allow for interaction with various other applications and APIs through chat is also a promising development that leverages context (memory) and prompt engineering.

However, these autonomous AI agents have limitations. They rely heavily on the reasoning abilities of GPT-3 or GPT-4, which were not designed as reasoning engines and can produce falsehoods or make logical errors. This is a significant flaw, because in autonomous mode we cannot control the reasoning process or vet the output. Still, I am hopeful, because I think that, given enough context, GPT-4 can make far fewer logical errors than most human beings.

Task-driven autonomous agents are intriguing, particularly when connected to various APIs, enabling users to perform actions through chat interfaces. I recently wrote an article explaining why I think the chatbot is the next UI. However, we are not quite there yet, as we need to improve the reasoning abilities of AI agents and implement systems that check for logical reasoning and bias. Ideally, the AI models should reason logically even when answering questions outside their training sets.
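
One lightweight way to add such a check, sketched here as an illustration rather than anything BabyAGI actually implements, is a verification pass in which a second model call critiques a result before it is stored or acted upon (this reuses the script's openai_call helper; the prompt is my own):

def verify_result(objective: str, task: str, result: str) -> str:
    # Hypothetical self-check pass: ask the model to audit an agent's output
    # for factual and logical errors before the loop accepts the result.
    prompt = f"""
    You are a critical reviewer. The objective is: {objective}.
    The task was: {task}. The proposed result is: {result}.
    List any factual errors or logical flaws, or reply "OK" if there are none."""
    return openai_call(prompt)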

In summary, task-driven autonomous agents are promising, but there is still work to be done to improve their reliability and reasoning abilities. Ensuring these agents can catch and recover from their own mistakes is crucial if they are to scale. I'm looking forward to seeing future developments in this area.


What I am reading