Building an “AI agent” sounds exciting—visions of an autonomous entity that can plan, execute, learn, and adapt on its own. Add a few dynamic graphs and a name like “Dynamic Modular Agent for Adaptive Workflows,” and it starts to feel groundbreaking. In reality, though, most of these “agents” boil down to classic software design patterns we’ve been using for years. They’re essentially long-running jobs with some added intelligence and a structured approach to managing state.
In this post, we’ll demystify agents from a systems architecture perspective. We’ll see how frameworks like OpenAI’s Assistants API or Anthropic’s recommended workflows can be viewed as variations on job processing systems. We’ll also explore why you might build your own agent-based solution using basic composable patterns rather than adopting a large, specialized “agent framework.” The goal is to empower you to make simpler, more robust decisions when integrating AI-driven processes into your technology stack.
Why Agents Feel So Complex
The hype around “AI agents” is everywhere. Whether from marketing copy or open-source frameworks, the concept often conjures up an image of a near-sentient helper that can dynamically plan and execute tasks.
But from a practical standpoint:
Agents execute tasks in steps (much like a job scheduler).
Agents often rely on a queue or “run” concept to manage concurrency, statuses, and outputs.
Agents typically need external tools—database lookups, code execution, or messaging APIs—to accomplish tasks, similar to how orchestrators like Airflow operate.
All of these elements map neatly onto standard concurrency or batch-processing patterns. The main difference? An AI model decides the next step rather than a hard-coded set of instructions—so some parts of the infrastructure stay exactly the same while others change. Let’s see how this plays out in a few cases.
The Assistants API: A Classic Job Lifecycle
To ground the concept, let’s look at OpenAI’s Assistants API (Beta) as a prime example. An Assistant can be treated as a “smart job worker” with some special advantages, built around concepts such as:
Threads (akin to conversations or job contexts)
Runs (similar to job invocations)
Run Steps (each step in the job’s execution process)
A Quick Tour of the Run Lifecycle
When you “run” an Assistant:
queued: Your run request enters the queue, much like a background job being scheduled.
in_progress: The system is actively executing the job. During this period, your AI agent may call external tools or generate output.
requires_action: The job is paused, waiting on some external input (like function arguments you must provide). If the input isn’t given in time, the run may transition to expired.
completed: The job finished successfully. You can now look at the final “result” (in the Assistants API, the final messages appended to the Thread).
failed: Something went wrong—inspect the logs and error details.
cancelled: You pulled the plug before it completed.
This is essentially the same lifecycle we see in typical asynchronous or batch job managers (e.g., Celery, Sidekiq, Airflow). The difference is that these jobs call LLMs at each step—opening up design patterns you won’t see in a traditional Airflow project.
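For concreteness, here’s a minimal sketch of driving that lifecycle with OpenAI’s Python SDK. This is a sketch, not a production client: the assistant ID is a placeholder, the prompt is invented, and error handling is omitted.

```python
import time

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# A Thread is the job context; Messages are its inputs.
thread = client.beta.threads.create()
client.beta.threads.messages.create(
    thread_id=thread.id, role="user", content="Summarize our Q3 incident reports."
)

# A Run is the job invocation against a given Assistant.
run = client.beta.threads.runs.create(
    thread_id=thread.id, assistant_id="asst_..."  # placeholder: your Assistant's ID
)

# Poll the run exactly as you would poll any background job.
while run.status in ("queued", "in_progress"):
    time.sleep(1)
    run = client.beta.threads.runs.retrieve(thread_id=thread.id, run_id=run.id)

if run.status == "completed":
    # The "result" is the new messages appended to the Thread.
    messages = client.beta.threads.messages.list(thread_id=thread.id)
    print(messages.data[0].content[0].text.value)
elif run.status == "requires_action":
    print("Run paused: the Assistant is waiting on tool outputs.")
else:
    print(f"Run ended with status: {run.status}")
```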
Long-Running Job, Meet Intelligent Worker
The fundamental shift is that an AI agent can interpret tasks dynamically, but that doesn’t change the underlying architecture much. If you’ve ever built an application with:
Job queues (RabbitMQ, SQS, Kafka)
Workflow managers (Airflow, Dagster, Jenkins)
Distributed computing (Spark, Ray)
Microservices calling each other via orchestrators
…then you have most of the mental models and systems in place for building agents. The big difference: the AI system can spontaneously decide which path of the DAG to follow, or even create new steps, rather than following a fixed script.
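To make that difference concrete, here’s a deliberately bare-bones sketch of a model choosing which branch of a pipeline runs next. The `call_llm` helper is a hypothetical stand-in for whatever chat-completion client you use, and the handlers are toy examples.

```python
# Hypothetical stand-in: wire this to your chat-completion client of choice.
def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your LLM client here")

def summarize(ticket: str) -> str:
    return "summary: " + ticket[:60]

def escalate(ticket: str) -> str:
    return "escalated to on-call engineer"

def auto_resolve(ticket: str) -> str:
    return "closed with canned fix"

# The "DAG" nodes are ordinary functions; the model picks the edge to follow.
STEPS = {"summarize": summarize, "escalate": escalate, "auto_resolve": auto_resolve}

def route(ticket: str) -> str:
    choice = call_llm(
        f"Answer with exactly one of {sorted(STEPS)} for this ticket:\n\n{ticket}"
    ).strip()
    return STEPS.get(choice, escalate)(ticket)  # unknown answers fall back safely
```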
Anthropic’s Agent Patterns: “Smart DAGs” That Expand the Possibilities
In a great article, Anthropic describes a few design patterns that arise because an LLM is in the driver’s seat—dynamically choosing paths and interacting with each component in more flexible ways.
This blend of deterministic structure and adaptive intelligence creates “smart DAGs,” unlocking possibilities that rigid pipelines can’t always handle. For instance:
An AI might decide to retry a failed step with a different approach rather than simply failing the job.
The system could generate new sub-steps on the fly based on context.
Each path in the “graph” can adapt to user input or new data rather than sticking to a predetermined route.
Below are some of Anthropic’s core patterns—each can be understood as a standard pipeline or DAG “supercharged” by the fact that an AI model is making decisions at runtime:
| Workflow | Description | Similar Traditional Pattern | Smart DAG Angle |
| --- | --- | --- | --- |
| Prompt Chaining | Breaks a task into discrete steps, each step feeding into the next | Multi-stage pipeline in data engineering | The model can adapt how it composes prompts at each step if intermediate results aren’t aligning with user needs |
| Routing | Uses a classifier (often an LLM) to direct a request to specialized sub-pipelines | Event router or enterprise service bus | The AI might discover new routing categories or refine existing ones over time, especially as it sees more edge cases |
| Parallelization | Divides a job into chunks of work that run independently, then merges results | “Map-reduce” or standard concurrency patterns | The LLM can spawn additional sub-tasks mid-process if it determines the problem can be broken down further |
| Orchestrator-Workers | A central orchestration node decides on subtasks to distribute to multiple worker nodes | “Master-worker” or “supervisor-worker” patterns in distributed systems | The orchestrator can adaptively reassign tasks or reorder steps based on partial results or updated inputs—beyond what typical DAGs do without explicit coding |
| Evaluator-Optimizer | One system instance judges the output of another, and they iterate until results are satisfactory | Automated QA loop layered onto a standard pipeline | The evaluator can adjust optimization parameters or generate new constraints, effectively altering the job flow if quality metrics aren’t met |
| Autonomous Agent | Continually loops through planning, execution, and reflection until it meets a stopping condition | “Long-running job” + advanced state machine or feedback loops | Instead of a single pass, the DAG keeps “reinventing” itself with each iteration, leveraging real-time reasoning and external signals to alter its execution path |
In short, a lot of the architecture is familiar—we still rely on concurrency, locking, error-handling, and scheduling. Yet the intelligence of an LLM can create new paths on the fly, choose among them, or retry processes based on changing objectives. That’s what makes these workflows fundamentally more powerful than purely deterministic DAGs.
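For instance, the evaluator-optimizer row above is little more than a retry loop in which the retry policy itself is model-driven. A minimal sketch, again assuming a hypothetical `call_llm` helper:

```python
# Hypothetical stand-in, as before: wrap your chat-completion client here.
def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your LLM client here")

def generate_with_review(task: str, max_rounds: int = 3) -> str:
    """Evaluator-optimizer: one call drafts, a second call critiques,
    and the critique is fed back in—until it passes or we hit a budget."""
    draft = call_llm(f"Complete this task:\n{task}")
    for _ in range(max_rounds):
        verdict = call_llm(
            f"Task: {task}\nDraft: {draft}\n"
            "Reply PASS if the draft fully satisfies the task, "
            "otherwise list the problems."
        )
        if verdict.strip().startswith("PASS"):
            break
        # Unlike a fixed pipeline, each "retry" carries new constraints.
        draft = call_llm(
            f"Task: {task}\nFix these issues:\n{verdict}\nDraft:\n{draft}"
        )
    return draft
```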
Why Simplicity Still Wins
AI agents introduce complexity in exchange for more dynamic responses. But as both Anthropic and OpenAI caution, don’t pile on complexity without need. Often, a single LLM call with in-context retrieval suffices. If you can solve your problem with a direct question-and-answer approach, do it—less can indeed be more.
When you do need an agent:
Start as small as possible. Use direct API calls and minimal scaffolding. Observe exactly how the model’s “planning” logic works.
Keep your lines of sight. If you adopt a framework, ensure you can trace each prompt and response—abstractions that bury the chain of events are hard to debug (see the tracing sketch after this list).
Document your tools well. Both Anthropic’s blog and OpenAI’s tools documentation emphasize the importance of well-designed “agent-computer interfaces (ACIs).” Provide clear parameters, usage examples, and constraints to reduce error.
Add concurrency and orchestration patterns only when they help. Think of them like design patterns in any large software project—use them sparingly and intentionally.
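On the “lines of sight” point, the cheapest insurance is a single choke point that every prompt and response passes through. A minimal sketch—`traced_llm_call` is illustrative, not any framework’s API:

```python
import json
import logging
import time

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("agent.trace")

def traced_llm_call(client_fn, prompt: str, **kwargs) -> str:
    """Wrap any LLM client function so every exchange is logged verbatim."""
    start = time.monotonic()
    response = client_fn(prompt, **kwargs)
    log.info(json.dumps({
        "prompt": prompt,
        "response": response,
        "latency_s": round(time.monotonic() - start, 3),
    }))
    return response
```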
Real-World Use Cases
For instance, the TensorOps MDClone case study—a medical AI assistant we implemented, inspired by Microsoft TaskWeaver—illustrates how agent-based approaches work in high-stakes domains like healthcare.
Key Principles for Building Effective (Yet Simple) AI Agents
Embrace the Job Analogy: Whether it’s called an “agent,” “workflow,” or “process,” treat it like a job. That means:
Manage concurrency with proven patterns.
Use a queue or event bus for status tracking and job orchestration.
Log each step for auditing and debugging.
Manage State Thoughtfully
Like any long-running job, store context in a robust way (e.g., a “Thread” in OpenAI’s parlance, or a conversation ID in your own system).
Keep an eye on memory or context-window limits—similar to chunking data in normal large-scale processing.
Provide a Good Tooling Interface
Make function calls or external APIs easy to invoke.
If your agent needs to pass arguments, define them meticulously with typed fields, descriptions, and usage constraints (see the example below).
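As an illustration, here’s what such a definition might look like in OpenAI’s function-calling format—`lookup_order` and its fields are invented for this example:

```python
# A hypothetical tool definition in OpenAI's function-calling format.
# Typed fields, descriptions, and constraints give the model a clear ACI.
lookup_order_tool = {
    "type": "function",
    "function": {
        "name": "lookup_order",
        "description": (
            "Fetch a single order by its ID. Use this before answering "
            "any question about shipping status or refunds."
        ),
        "parameters": {
            "type": "object",
            "properties": {
                "order_id": {
                    "type": "string",
                    "description": "Order identifier, e.g. 'ORD-12345'.",
                },
                "include_history": {
                    "type": "boolean",
                    "description": "Also return the status-change history.",
                },
            },
            "required": ["order_id"],
        },
    },
}
```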
Watch for Error Recovery
Agents can fail just like any job. Plan for incomplete runs, “stuck” states, or timeouts (a minimal guard is sketched below).
Design your system so you can gracefully revert or roll back partial steps.
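A minimal sketch of such a guard—bounded retries plus a wall-clock budget, so a stuck step fails loudly instead of hanging forever:

```python
import time

def run_step_with_recovery(step_fn, *, retries: int = 2, timeout_s: float = 60.0):
    """Run one agent step with bounded retries and a wall-clock timeout."""
    deadline = time.monotonic() + timeout_s
    last_error = None
    for attempt in range(retries + 1):
        if time.monotonic() > deadline:
            raise TimeoutError(f"step exceeded {timeout_s}s budget")
        try:
            return step_fn()
        except Exception as exc:
            last_error = exc
            if attempt < retries:
                time.sleep(2 ** attempt)  # exponential backoff between attempts
    raise RuntimeError(f"step failed after {retries + 1} attempts") from last_error
```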
Iterate with Observability
Just as you measure time, CPU, or memory consumption for standard pipelines, measure cost, latency, and success metrics for agent-based flows.
Evaluate how effectively the agent’s decisions improve outcomes versus simpler approaches.
Final Thoughts
We may call them “agents” or “autonomous workflows,” but at heart, they’re simply long-running jobs with dynamic decision-making. By focusing on the fundamentals—solid architecture, thorough documentation, iterative design, and a strong understanding of how LLMs fit into your existing system—you’ll build solutions that are both powerful and maintainable.
Don’t let the glitzy naming distract you from the timeless principles of distributed systems and concurrency patterns. When used judiciously, AI-driven agents can unlock incredible value by tackling problems we haven’t been able to automate before. When used unwisely, they can become just another layer of overengineering. The sweet spot is combining tried-and-true job architecture with the flexible reasoning of large language models.
By staying mindful of these patterns and principles, CTOs and system architects can deliver high-performing, easy-to-maintain AI features without building unwieldy “agents” that are more hype than help.