OpenAI’s vision of creating artificial general intelligence (AGI) might still be futuristic, but today’s AI agents are already making a significant impact. AI-driven tools have evolved far beyond simple chat interfaces, integrating with external systems to automate tasks, analyze data, and even make decisions autonomously.
In this blogpost, we’ll cover everything you need to know about AI agents: what they are, how they work, and where they’re headed. Plus, we’ll get hands-on and show you how to build your own AI agent, without getting lost in the theory.
What are Agents?
LLM agents are AI systems that leverage large language models (LLMs) to autonomously perform tasks by processing natural language instructions. These agents combine the advanced language understanding and generation capabilities of LLMs with decision-making, planning, and tool-usage skills. Essentially, they act as "intelligent assistants" that can break down complex tasks into smaller steps, interact with external systems, and adapt their strategies based on user input or changing conditions.
From Standalone LLMs to Agents
LLMs are powerful but limited by the data they are trained on. They can generate text and summarize information, but they lack real-time knowledge or the ability to perform actions beyond their training.
Imagine you want to draft a financial report for a business meeting. A standalone LLM can draft general content for a financial report, but it lacks access to financial data like stock prices or transaction records. In contrast, an agent can integrate with financial databases or APIs, allowing it to draft the report while incorporating real and specific data directly into the document.
Key Capabilities of AI Agents
Reasoning: They can plan, think through problems, and adjust their approach if initial solutions don't work, making them effective for complex tasks.
Acting: Beyond understanding, they can perform actions using tools like web searches or database queries, enabling them to access real-time information.
Memory: They can retain information from past interactions, allowing for personalized responses and adaptation based on user history.
Real World Applications
Here are some key real-world applications where LLM agents can be employed and have a significant impact:
Customer Support and Virtual Assistants: LLM agents can provide automated support through chatbots, handling a wide range of customer inquiries, from simple FAQs to complex problem-solving. They can access and integrate with databases to provide real-time information, improving efficiency and response times.
Enterprise and “Talk to Your Data” Applications: These agents can help employees interact with their company’s knowledge base, simplifying the process of querying databases or interpreting structured data.
Programming and Code Generation: LLM agents can be used to assist developers by generating code snippets, automating repetitive coding tasks, and even debugging.
How do they work?
In an LLM-powered agent system, the LLM acts as the core "brain," enabling natural language understanding and reasoning. This brain is complemented by several key components that allow the agent to plan, learn, and interact effectively with its environment:
1. Planning
A complicated task usually involves many steps. An agent needs to know what they are and plan ahead.
Subgoal Identification and Task Decomposition: The agent breaks down complex objectives into smaller, more manageable subgoals. By focusing on smaller parts, the agent can adjust its strategy based on intermediate results. The model can be instructed to act this way by using prompt techniques such as Chain of Thought (CoT) or Tree of Thoughts (ToT), which we’ve covered in a previous blogpost.
Reflection and Iterative Improvement: The agent incorporates self-criticism and reflection, analyzing its past decisions and actions to identify areas for improvement. This feedback loop allows the agent to learn from mistakes, adjust its approach, and refine future actions.
A way to achieve this is through ReAct prompting (Yao et al. 2022), which integrates reasoning and acting within the LLM by extending the action space to a combination of task-specific discrete actions and the language space. The former enables the LLM to interact with the environment (e.g. use external tools), while the latter prompts the LLM to generate reasoning traces in natural language.
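To make this concrete, here is a minimal sketch of what a ReAct-style prompt and trace might look like. The wording and the weather tool are illustrative assumptions, not the exact format from the paper:

# A minimal, illustrative ReAct-style prompt; the tool and the
# Thought/Action/Observation wording are assumptions for demonstration only.
react_example = """Answer the question by interleaving Thought, Action and Observation steps.

Question: What is the weather in Lisbon right now?
Thought: I need up-to-date data, so I should call the weather tool.
Action: weather_tool("Lisbon")
Observation: 21°C, clear skies.
Thought: I now have what I need to answer.
Final Answer: It is currently 21°C and clear in Lisbon."""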
2. Memory
The memory module is a crucial part of LLM agents, allowing them to store internal logs, including past thoughts, actions, and interactions with users. This module enhances the agent's ability to provide contextually relevant responses and maintain continuity over time. In the literature, two primary types of memory are identified:
Short-term Memory: Short-term memory captures information about the agent's immediate context. It functions through in-context learning, where the agent uses the recent dialogue or task data to maintain continuity within a conversation or task. However, this type of memory is constrained by the context window of the LLM, meaning that only a limited amount of recent information can be recalled at any given time.
Long-Term Memory: Long-term memory allows the agent to retain information about past interactions and behaviors over longer durations. This is often achieved through external vector databases, which enable the agent to store and retrieve vast amounts of data.
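As a rough sketch of the short-term side, an agent can simply keep the last few messages in a list and prepend them to every prompt; this is essentially what the agent we build later in this post does. The class below is illustrative, not a specific framework's API:

# Illustrative short-term memory: keep only the most recent messages so the
# prompt stays within the model's context window.
class ShortTermMemory:
    def __init__(self, max_messages: int = 10):
        self.max_messages = max_messages
        self.messages = []

    def add(self, role: str, text: str):
        self.messages.append(f"{role}: {text}")
        # Drop the oldest messages once the buffer is full
        self.messages = self.messages[-self.max_messages:]

    def as_context(self) -> str:
        return "\n".join(self.messages)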
3. Tool Use
Tools are a crucial part of the agent system, enabling it to interact with the outside world and access information beyond its pre-trained knowledge. The agent can call external APIs to gather real-time data, like the latest news, weather updates, or proprietary datasets, making it much more dynamic.
Imagine you’re planning a trip and ask an LLM travel assistant agent, "Can you find me a flight to Italy?" The agent doesn’t have a direct answer, because it doesn’t store real-time flight data. However, it knows which tools to use to find that information.
Choosing the tool: When setting up our agent, we include in the prompt a description of all the tools available. The agent understands that to find a flight, it needs access to a flight booking service/API.
Using the tool: The agent communicates with the tool by providing the details of your request, like the destination and the dates you want to travel. The tool returns the available flight options, just as a travel website would show you a list of flights.
When to stop using the tool: Once the agent has gathered all the flight information, like airlines, departure times, and prices, it stops using the tool. Now, it organizes the information into a clear response for you, like suggesting the cheapest or fastest one.
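In practice, this loop often boils down to the model emitting a structured tool call that the surrounding code executes. For the flight example, that output could look something like the dictionary below; the tool name and argument fields are hypothetical, but the "action"/"args" shape mirrors the format we use in the tutorial later in this post:

# Hypothetical tool call the agent might emit for the flight request
# (tool name and argument fields are made up for illustration)
tool_call = {
    "action": "Flight Search Tool",
    "args": {"destination": "Italy", "dates": "2025-06-10 to 2025-06-17"},
}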
Let’s try a simple example!
By asking a model directly through the API, we can see that it cannot answer basic questions like the current time.
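For instance, with LLMStudio (introduced below) a direct query looks like this; this is a minimal sketch that assumes GOOGLE_API_KEY is set, and the exact refusal text will vary by model:

from llmstudio import LLM

# Ask the model directly, with no tools attached
llm = LLM("vertexai/gemini-1.5-pro-latest", api_key=GOOGLE_API_KEY)
answer = llm.chat("What time is it right now in Lisbon?").choices[0].message.content
print(answer)  # Typically something like "I don't have access to real-time information."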
So, let’s build a simple agent that will be capable of answering questions about the time and also doing mathematical operations.
Before we begin, make sure you have the following:
Python installed
API key for the model you want to use
If you want to seamlessly access and switch between different models, be sure to also install LLMStudio
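For reference, LLMStudio can typically be installed with pip install llmstudio; check its documentation for the current package name and any provider-specific extras.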
Designing the Agent Architecture
For this simple agent, let's focus on the basics we talked about before: planning, memory, and tools.
For this, we need:
A set of tools that the model will have at its disposal
A model with a basic reasoning loop that will decide when and how to use the different tools
A way to keep track of previous interactions, in this simple case just a short-term, in-context memory
Implementing the Tools
Our agent will have a modular design, allowing us to add or remove tools easily. We'll use an abstract base class called Tool that defines the interface for all tools.
from abc import ABC, abstractmethod

class Tool(ABC):
    @abstractmethod
    def name(self) -> str:
        pass

    @abstractmethod
    def description(self) -> str:
        pass

    @abstractmethod
    def use(self, *args, **kwargs):
        pass
For the time tool, we need something like this:
import datetime
from zoneinfo import ZoneInfo

class TimeTool(Tool):
    def name(self):
        return "Time Tool"

    def description(self):
        return ("""Gives the current time for a given city's timezone like Europe/Lisbon, America/New_York etc. If no timezone is provided, it returns the local time.""")

    def use(self, *args, **kwargs):
        format = "%Y-%m-%d %H:%M:%S %Z%z"
        current_time = datetime.datetime.now()
        input_timezone = args[0] if args else None
        if input_timezone:
            try:
                current_time = current_time.astimezone(ZoneInfo(input_timezone))
            except Exception:
                return f"Invalid timezone: {input_timezone}"
        return f"The current time is {current_time.strftime(format)}."
For adding calculator capabilities:
import math

class CalculatorTool(Tool):
    def name(self):
        return "Calculator Tool"

    def description(self):
        return "Evaluates simple mathematical expressions. Supports operations like addition, subtraction, multiplication, and division."

    def use(self, *args, **kwargs):
        expression = args[0]
        try:
            # Evaluate the expression safely
            result = self.safe_eval(expression)
            return f"The result of '{expression}' is {result}."
        except Exception as e:
            return f"Sorry, I couldn't evaluate the expression '{expression}'. Error: {str(e)}"

    def safe_eval(self, expression):
        # Names (functions and constants) the expression is allowed to use
        allowed_names = {
            'abs': abs,
            'round': round,
            'min': min,
            'max': max,
            'pow': pow,
            'sqrt': math.sqrt,
            'log': math.log,
            'sin': math.sin,
            'cos': math.cos,
            'tan': math.tan,
            'pi': math.pi,
            'e': math.e
        }
        # Letters are allowed so that names like 'sqrt' or 'pi' pass this check;
        # any name not present in allowed_names is rejected below.
        allowed_chars = "0123456789+-*/()., abcdefghijklmnopqrstuvwxyz_"
        if any(char not in allowed_chars for char in expression.lower()):
            raise ValueError("Invalid characters in expression.")
        code = compile(expression, "<string>", "eval")
        for name in code.co_names:
            if name not in allowed_names:
                raise ValueError(f"Use of '{name}' is not allowed.")
        return eval(code, {"__builtins__": None}, allowed_names)
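And a quick check of the calculator; only the whitelisted names above are accepted, so anything else is rejected:

calc = CalculatorTool()
print(calc.use("2 * (3 + 4)"))       # The result of '2 * (3 + 4)' is 14.
print(calc.use("sqrt(16) + pi"))     # Uses the whitelisted math functions
print(calc.use("__import__('os')"))  # Rejected by the character/name checks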
You can add more tools to enhance other capabilities and even connect to external APIs or databases.
Building the Agent Class
The Agent class manages the interaction between the user, the LLM, and the tools.
Key Responsibilities:
Managing Tools: Keeps track of available tools.
Memory Management: Maintains a conversation history.
Processing Input: Interprets user inputs and decides on actions.
Interacting with the LLM: Sends prompts and receives responses.
import json
import re

from llmstudio import LLM
class Agent:
    def __init__(self):
        self.tools = []
        self.memory = []
        self.max_memory = 10

    def add_tool(self, tool: Tool):
        self.tools.append(tool)

    def json_parser(self, input_string):
        try:
            # Remove code block markers if present
            code_block_pattern = r"```json\s*(\{.*?\})\s*```"
            match = re.search(code_block_pattern, input_string, re.DOTALL)
            if match:
                json_str = match.group(1)
            else:
                # If no code block, try to match any JSON object in the string
                json_object_pattern = r"(\{.*?\})"
                match = re.search(json_object_pattern, input_string, re.DOTALL)
                if match:
                    json_str = match.group(1)
                else:
                    raise ValueError("No JSON object found in the LLM response.")
            # Parse the JSON string
            json_dict = json.loads(json_str)
            if isinstance(json_dict, dict):
                return json_dict
            raise ValueError("The LLM response is not a JSON object.")
        except json.JSONDecodeError as e:
            print(f"JSON parsing error: {e}")
            print(f"LLM response was: {input_string}")
            raise ValueError("Invalid JSON response from LLM.")
    def process_input(self, user_input):
        self.memory.append(f"User: {user_input}")
        self.memory = self.memory[-self.max_memory:]
        context = "\n".join(self.memory)
        tool_descriptions = "\n".join(
            [f"- {tool.name()}: {tool.description()}" for tool in self.tools]
        )
        prompt = f"""You are an assistant that helps process user requests by determining the appropriate action and arguments based on the user's input.

Context:
{context}

Available tools:
{tool_descriptions}

Instructions:
- Decide whether to use a tool or respond directly to the user.
- If you choose to use a tool, output a JSON object with "action" and "args" fields.
- If you choose to respond directly, set "action": "respond_to_user" and provide your response in "args".
- **Important**: Provide the response **only** as a valid JSON object. Do not include any additional text or formatting.
- Ensure that the JSON is properly formatted without any syntax errors.

Response Format:
{{"action": "<action_name>", "args": "<arguments>"}}

Example Responses:
- Using a tool: {{"action": "Time Tool", "args": "Asia/Tokyo"}}
- Responding directly: {{"action": "respond_to_user", "args": "I'm here to help!"}}

User Input: "{user_input}"
"""
        response = self.query_llm(prompt)
        self.memory.append(f"Agent: {response}")
        response_dict = self.json_parser(response)

        # Handle the tool or response
        if response_dict["action"] == "respond_to_user":
            return response_dict["args"]
        else:
            # Find and use the appropriate tool
            for tool in self.tools:
                if tool.name().lower() == response_dict["action"].lower():
                    return tool.use(response_dict["args"])
        return "I'm sorry, I couldn't process your request."
    def query_llm(self, prompt):
        # GOOGLE_API_KEY must be set to your own Vertex AI / Google API key
        gemini = LLM("vertexai/gemini-1.5-pro-latest", api_key=GOOGLE_API_KEY)
        response = gemini.chat(prompt).choices[0].message.content.strip()
        return response
    def run(self):
        print("LLM Agent: Hello! How can I assist you today?")
        while True:
            user_input = input("You: ").strip()
            if user_input.lower() in ["exit", "bye", "close"]:
                print("Agent: See you later!")
                break
            response = self.process_input(user_input)
            print(f"Agent: {response}")
Putting it all together
if __name__ == "__main__":
    agent = Agent()
    agent.add_tool(TimeTool())
    agent.add_tool(CalculatorTool())
    agent.run()
Let’s Test It
We can see that our agent can now answer questions about the time!
And it can also do math operations:
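For reference, a session with this agent might look something like this (illustrative; the exact timestamps and phrasing will vary):

LLM Agent: Hello! How can I assist you today?
You: What time is it in Tokyo?
Agent: The current time is 2025-01-01 21:00:00 JST+0900.
You: What is 15 * 12 + 30?
Agent: The result of '15 * 12 + 30' is 210.
You: exit
Agent: See you later!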
Despite being a very simple system, this agent already adds value to a vanilla LLM, but the possibilities to extend this are endless!
Conclusion
To sum it all up, LLM agents are the future of LLM applications and they are changing how we interact with technology. They can handle complex tasks by combining reasoning, memory, and tool usage, making them far more capable than simple chatbots. As we saw, even a basic agent can transform an LLM into a smart assistant that can fetch real-time info or perform calculations.
The possibilities for AI agents are endless, and building them is getting easier every day. Whether you're looking to automate tasks, create personal assistants, or explore new tech, now’s the time to dive in and start experimenting.
If you want insight into how agents will shape the future of AI, check out our previous blogpost.