AI Agent System Architecture - Possible Strategies
If you're building AI-powered systems, you probably know that a single prompt is rarely enough to deliver a meaningful business feature. The real work begins when designing agent workflows. Lately I've been deep in the topic of design patterns and I keep seeing everything boil down to three approaches: single agent, sequential, and parallel. Each has its own pitfalls worth talking about honestly.
Single agent strategy - agents get confused by too much logic
The single-agent approach is the easiest. You write one big prompt, attach some tools (e.g., a search engine), and hope the model figures out the logic on its own. The problem is that models are non-deterministic. With complex tasks, the prompt simply bloats and the model starts losing track of instructions buried halfway through the text. You lose any control over what's happening under the hood.
Before we get into the patterns, we need to configure the API connection (in this case via OpenRouter):
require 'ruby_llm'
RubyLLM.configure do |config|
config.openrouter_api_key = ENV["OPENROUTER_API_KEY"] # keep the key out of source code
end
MODEL = "arcee-ai/trinity-large"
And sample usage:
# Example of a simple query
response = RubyLLM.chat(model: MODEL).ask("Say 'hello' in one word.")
puts "Response: #{response.content}"
In a fuller single-agent setup, the chat also accepts tools, and each tool's "execute" method does the actual work while the model decides which one to call. This works for simple things, but if you need a guarantee that the system will follow a specific procedure, a single agent won't give you that.
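To make that tool-choice step concrete, here's a hand-rolled, self-contained sketch (no API calls). The fake_route method below is a hypothetical stand-in for the model's non-deterministic decision about which tool to invoke; a real agent would let the LLM pick based on the prompt.

```ruby
# Hypothetical sketch of single-agent tool dispatch (no API calls).
# fake_route stands in for the LLM's tool choice.
TOOLS = {
  "search"     => ->(q) { "search results for: #{q}" },
  "calculator" => ->(q) { "42" }
}.freeze

def fake_route(query)
  # A real model makes this choice probabilistically, which is
  # exactly where complex single agents become unreliable
  query.include?("calculate") ? "calculator" : "search"
end

def execute(query)
  tool = fake_route(query)
  TOOLS.fetch(tool).call(query)
end

puts execute("find the latest Ruby news")
puts execute("calculate 6 * 7")
```

With one or two tools this routing is manageable; the point above is that as the prompt and the tool list grow, the model's implicit version of fake_route starts drifting.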
Sequential agents strategy - provide actual reliability
When a process needs to be repeatable, a sequence works better. The output from one agent goes straight to the next. This gives great peace of mind, because you know exactly which stage you're at. Agents communicate through a shared session state, which is essentially their short-term working memory. The downside? This setup is rigid. If something unexpected happens that you didn't account for in the code, the system will crash because it can't jump to a different step.
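Stripped of the LLM calls, the pattern is just a chain of steps passing a shared state hash along. Here is a minimal stubbed sketch (the step bodies are placeholders, not real model output) that also shows the rigidity: steps run in a fixed order with no way to branch or skip ahead.

```ruby
# Stubbed sequential pipeline (no API calls). Each step reads the
# shared state and merges its own result in, like agents sharing
# a session's short-term memory.
STEPS = [
  ->(state) { state.merge(fact: "stubbed fact about Ruby") },
  ->(state) { state.merge(translation: "stubbed translation of: #{state[:fact]}") }
].freeze

def run_pipeline(steps, state = {})
  # reduce threads the state through every step in fixed order;
  # an unexpected failure in any step halts the whole chain
  steps.reduce(state) { |acc, step| step.call(acc) }
end

result = run_pipeline(STEPS)
puts result[:fact]
puts result[:translation]
```

The same shape with real agents looks like this: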
chat = RubyLLM.chat(model: MODEL)
# Step 1: Fetching a fact
fact = chat.ask("Give me one fact about the Ruby language in one sentence.")
puts "Step 1 (Fact): #{fact.content}"
# Step 2: Translating the fact (using the same session/chat)
translation = chat.ask("Translate this to Spanish: #{fact.content}")
puts "Step 2 (Translation): #{translation.content}"
Parallel agents strategy - crush latency issues
If you have three independent things to do (e.g., checking flights, hotels, and weather), doing them one by one is a waste of the user's time.
prompts = [
"What flights are available to Warsaw tomorrow?",
"What hotels are available in Warsaw tomorrow?",
"What is the weather in Warsaw tomorrow?"
]
# Fire off queries in separate threads
threads = prompts.map do |prompt|
Thread.new do
RubyLLM.chat(model: MODEL).ask(prompt)
end
end
# Collect the results
responses = threads.map(&:value)
responses.each_with_index do |res, i|
puts "Question '#{prompts[i]}': #{res.content}"
end
# You can also use orchestrator logic instead of raw threads
The parallel pattern lets you fire up multiple agents at once. Latency drops dramatically, but costs go up (you pay for multiple calls in the same second) and you need to write the glue code that assembles all those results into something coherent at the end.
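The fan-out/fan-in shape, including that glue code, can be sketched without any API key. Here fake_agent is a hypothetical stand-in for RubyLLM.chat(model: MODEL).ask(prompt), with a sleep to simulate a network round trip so the latency win is visible.

```ruby
# Stubbed parallel fan-out (no API calls).
def fake_agent(prompt)
  sleep 0.1 # simulate a slow network round trip
  "answer for: #{prompt}"
end

prompts = ["flights to Warsaw", "hotels in Warsaw", "weather in Warsaw"]

started = Time.now
# First map starts all threads, second map joins and collects values
answers = prompts.map { |p| Thread.new { fake_agent(p) } }.map(&:value)
elapsed = Time.now - started

# The glue code: assemble independent answers into one coherent result
summary = prompts.zip(answers).map { |p, a| "#{p} -> #{a}" }.join("\n")
puts summary
puts format("wall time: %.2fs (vs ~%.2fs sequentially)", elapsed, 0.1 * prompts.size)
```

Total wall time stays close to the slowest single call instead of the sum of all three, which is the whole point of the pattern.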
Summary
Building on AI is a constant search for a balance between speed and precision. Choosing the right pattern is the first important decision you'll make in a project when building agents.
Happy agenting!