Agent architectures in practice: patterns, platforms, and use cases

By Jhony Vidal
December 24, 2025
4 min read

Agents are not “chatbots with a longer prompt”.

An agent is a system that can work toward a goal using a loop:

  1. Understand the goal
  2. Plan
  3. Use tools and data
  4. Check results
  5. Repeat (or hand off to a human)
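The five steps above can be sketched as a loop. This is a minimal illustration only: `plan`, `act`, `check`, and `hand_off_to_human` are toy stand-ins for your own planner, tools, and verifier.

```python
# Toy stand-ins so the loop runs end to end; replace with real logic.
def plan(goal: str, state: list[str]) -> str:
    return f"search: {goal}"

def act(step: str) -> str:
    return f"result of {step}"

def check(goal: str, state: list[str]) -> str:
    return "done" if state else "continue"

def hand_off_to_human(goal: str, state: list[str]) -> str:
    return "escalated"

def run_agent(goal: str, max_steps: int = 5) -> str:
    """Goal -> plan -> act -> check, repeating until done or out of budget."""
    state: list[str] = []                  # 1. understand the goal, track progress
    for _ in range(max_steps):
        step = plan(goal, state)           # 2. decide the next action
        state.append(act(step))            # 3. use a tool or data source
        verdict = check(goal, state)       # 4. verify the result
        if verdict == "done":
            return state[-1]
        if verdict == "handoff":
            return hand_off_to_human(goal, state)
    return hand_off_to_human(goal, state)  # 5. budget exhausted: escalate
```

Note that the escalation path is part of the loop itself, not an afterthought: running out of steps is a normal outcome, not an error.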

This post is a practical, easy-to-read guide to:

  • The core building blocks of an agent
  • The most common multi-agent orchestration patterns (with diagrams)
  • How production teams implement agents using Azure AI Foundry, Microsoft Agent Framework, and the OpenAI SDK

The core agent loop (the only diagram you really need)

flowchart TD
classDef step fill:#111827,stroke:#6366f1,color:#eef2ff;
classDef tool fill:#0b1220,stroke:#38bdf8,color:#e0f2fe;
classDef guard fill:#2d1b0b,stroke:#f59e0b,color:#fffbeb;
classDef out fill:#052e1a,stroke:#22c55e,color:#dcfce7;
Goal["Goal / request"]:::step --> Plan["Plan"]:::step
Plan --> Act["Act (use tools)"]:::step
Act --> Check["Check result"]:::guard
Check -->|good| Done["Done"]:::out
Check -->|needs more| Plan
Check -->|risk/unclear| Human["Human review / handoff"]:::guard --> Done
subgraph Tools["Tools / data access"]
DB["Datastores"]:::tool
API["APIs"]:::tool
Search["Search / retrieval"]:::tool
Workflow["Workflows (Logic Apps, etc.)"]:::tool
end
Act --> DB
Act --> API
Act --> Search
Act --> Workflow

If you build everything around this loop, you’ll avoid most “agent failures”:

  • “It answers confidently but wrong” → missing checks and grounding
  • “It can’t finish tasks” → missing tools or state
  • “It did something dangerous” → missing guardrails + approval gates

What makes an agent production-ready (simple checklist)

  • State: what step it’s on, what it already tried, what it learned.
  • Tools: limited, well-defined actions (read-first, write later).
  • Grounding: retrieval, citations, and “show your sources”.
  • Identity + permissions: least privilege and per-agent identity.
  • Observability: traces, logs, and the ability to replay failures.
  • Evaluation: a small “golden set” you run after changes.
  • Human-in-the-loop: approvals for risky actions.
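As a concrete illustration of the "State" item, here is one small, assumed shape for tracking what an agent has done so far; the field names are illustrative, not a standard.

```python
from dataclasses import dataclass, field

@dataclass
class AgentState:
    """Illustrative state record: current step, what was tried, what was learned."""
    step: int = 0
    attempted_tools: list[str] = field(default_factory=list)
    findings: list[str] = field(default_factory=list)
    needs_human: bool = False

    def record(self, tool: str, finding: str) -> None:
        # One entry per loop iteration keeps failures replayable.
        self.step += 1
        self.attempted_tools.append(tool)
        self.findings.append(finding)
```

Keeping state explicit like this is what makes the "replay failures" item on the checklist possible: you can log and serialize the whole record.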

Multi-agent orchestration patterns (the ones that actually show up in real apps)

Different tasks need different coordination. Here are the patterns you’ll see most often.

flowchart TB
classDef box fill:#0b1220,stroke:#334155,color:#e5e7eb;
classDef hub fill:#111827,stroke:#6366f1,color:#eef2ff;
classDef good fill:#052e1a,stroke:#22c55e,color:#dcfce7;
subgraph Sequential["Sequential"]
S1["Agent A"]:::box --> S2["Agent B"]:::box --> S3["Agent C"]:::box
end
subgraph Concurrent["Concurrent"]
C0["Coordinator"]:::hub --> C1["Agent A"]:::box
C0 --> C2["Agent B"]:::box
C0 --> C3["Agent C"]:::box
C1 --> C0
C2 --> C0
C3 --> C0
end
subgraph Handoff["Handoff"]
H0["Router"]:::hub --> H1["Specialist A"]:::box --> H2["Specialist B"]:::box --> H3["Specialist C"]:::box
end
subgraph GroupChat["Group chat (debate + converge)"]
G0["Facilitator"]:::hub --> G1["Worker"]:::box
G0 --> G2["Reviewer"]:::box
G1 --> G2 --> G0
end
subgraph Workflow["Workflow process (DAG)"]
W1["Step 1"]:::box --> W2["Step 2"]:::box --> W3["Step 3"]:::box
W2 --> W4["Parallel branch"]:::box --> W3
end
Sequential --> Concurrent --> Handoff --> GroupChat --> Workflow

When to use each pattern

  • Sequential: clean pipeline tasks (extract → transform → write).
  • Concurrent: parallel research (3 sources at once) then merge.
  • Handoff: routing to specialists (billing vs tech vs legal).
  • Group chat: “worker + critic” loops to reduce mistakes.
  • Workflow/DAG: repeatable business processes with checkpoints.
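The concurrent pattern maps naturally onto `asyncio.gather`: a coordinator fans one query out to several specialists and merges their results. A sketch, where `research` stands in for a real agent or tool call:

```python
import asyncio

async def research(source: str, query: str) -> str:
    """Stand-in for one specialist agent querying a single source."""
    await asyncio.sleep(0)  # a real agent would await model/tool calls here
    return f"{source}: findings on {query}"

async def concurrent_research(query: str) -> str:
    # Coordinator fans out to three specialists, then merges the results.
    results = await asyncio.gather(
        research("docs", query),
        research("web", query),
        research("tickets", query),
    )
    return "\n".join(results)

merged = asyncio.run(concurrent_research("churn drivers"))
```

The merge step is where you would add deduplication and reranking before synthesis.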

Platforms and tools: who does what?

You can build agents purely in code, purely on a platform, or (most common) as a hybrid.

Microsoft Agent Framework (build/orchestrate in code)

Microsoft describes Agent Framework as an open-source engine for building agentic apps, with durability, safety hooks, and paths to production. See the overview: Introducing Microsoft Agent Framework.

Azure AI Foundry Agent Service (operate agents as a managed platform)

Azure AI Foundry provides a managed place to run agents with enterprise features (deployment, governance, identity, monitoring). Product entry point: Azure AI Foundry Agent Service.

For SDK and endpoints, see: Get started with Foundry SDKs and endpoints.

Multi-model reality (including Anthropic models in Foundry)

In many teams, the “agent architecture” stays the same, but you swap models based on:

  • latency vs reasoning
  • cost vs quality
  • tool-use reliability
  • governance and deployment constraints

Microsoft has positioned Foundry as a multi-model layer, including Anthropic Claude models in the catalog: Claude models in Microsoft Foundry.

OpenAI SDK (model + tool calling from your app)

If you prefer to own orchestration in your repo, you can build the loop in code and call models via the OpenAI SDK. In production, this usually means:

  • strict tool schemas
  • structured outputs where possible
  • robust retries and timeouts
  • tracing across model calls and tool calls
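For instance, a strict tool schema in the common JSON-Schema function-calling shape; the tool name and fields here are illustrative, not part of any specific product's catalog:

```python
# A strict tool schema: narrow, typed parameters and no extra fields allowed.
SEARCH_KB_TOOL = {
    "type": "function",
    "function": {
        "name": "search_knowledge_base",
        "description": "Search the internal knowledge base and return top matches.",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "Plain-text search query."},
                "top_k": {"type": "integer", "minimum": 1, "maximum": 10, "default": 3},
            },
            "required": ["query"],
            "additionalProperties": False,  # reject fields the tool doesn't know
        },
    },
}
```

The tighter the schema, the fewer malformed tool calls you have to handle at runtime.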

Use cases (and how to map them to patterns)

1) Support triage agent (handoff + workflow)

  • Input: user ticket + logs
  • Tools: knowledge base search, ticket system API
  • Pattern: router decides “billing vs tech”, then a workflow runs the steps

2) Incident response agent (concurrent + human approval)

  • Input: alert payload
  • Tools: log query, runbook lookup, safe remediation actions
  • Pattern: parallel diagnostics, then a human approves remediation

3) Enterprise “analyst” agent (hybrid retrieval + semantic layer)

  • Input: questions like “why did churn increase?”
  • Tools: BI semantic model, vector search over docs, SQL access via approved views
  • Pattern: concurrent retrieval + reranking + cited summary

How to implement this in practice (3 realistic paths)

Path A: Azure AI Foundry-first (platform-led)

Best when you want a managed “run and operate” layer.

flowchart LR
classDef sys fill:#0b1220,stroke:#334155,color:#e5e7eb;
classDef step fill:#111827,stroke:#6366f1,color:#eef2ff;
classDef guard fill:#2d1b0b,stroke:#f59e0b,color:#fffbeb;
classDef out fill:#052e1a,stroke:#22c55e,color:#dcfce7;
U["User / trigger"]:::step --> A["Foundry Agent Service"]:::step
A --> T["Built-in tools + connectors\n(files, search, workflows)"]:::sys
A --> ID["Identity + RBAC\n(Entra, least privilege)"]:::guard
A --> Obs["Tracing + logs + eval runs"]:::sys
A --> H["Human approval gate\n(optional)"]:::guard
H --> O["Action / answer"]:::out
A --> O
  • Define the job: one workflow, one success metric.
  • Pick a model deployment and decide which tools are allowed.
  • Connect data using approved connectors and enforce access control.
  • Add guardrails: redaction, approvals, rate limits.
  • Add observability + evals before you scale.
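The optional human approval gate in the diagram can be as simple as a policy check that runs before any mutating action. A sketch; the action kinds and defaults are assumptions:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ProposedAction:
    kind: str         # e.g. "read", "write", "delete"
    description: str  # shown to the human reviewer

RISKY_KINDS = frozenset({"write", "delete"})

def requires_approval(action: ProposedAction) -> bool:
    """Simple policy: anything that mutates state goes through a human gate."""
    return action.kind in RISKY_KINDS
```

Starting with a blunt rule like "all writes need approval" is usually safer than trying to enumerate risky cases up front.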

Start with the SDK entry point: Foundry SDK overview.

Path B: Agent Framework-first (code-led, deploy to Foundry)

Best when you want an SDK/workflow engine in your repo, but still want platform operations.

flowchart TB
classDef sys fill:#0b1220,stroke:#334155,color:#e5e7eb;
classDef step fill:#111827,stroke:#6366f1,color:#eef2ff;
classDef guard fill:#2d1b0b,stroke:#f59e0b,color:#fffbeb;
Repo["Your repo\n(agent code + tests + evals)"]:::sys
AF["Agent Framework\n(router + specialists + reviewer)"]:::step
Tools["Tools as code\n(OpenAPI clients, DB adapters)"]:::sys
Foundry["Azure AI Foundry runtime\n(host + identity + monitoring)"]:::step
Repo --> AF --> Tools
AF --> Foundry --> Guard["RBAC + approvals + policies"]:::guard
  • Model your agents as roles: router, specialist(s), reviewer.
  • Make tools explicit (and small). Prefer read-first.
  • Run locally with traces and a golden dataset.
  • Deploy to a managed runtime and wire identity/permissions.
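"Make tools explicit (and small)" can be enforced with a tiny registry that marks each tool as read-only or writing. The tool names here are illustrative; real tools would wrap OpenAPI clients or DB adapters.

```python
from typing import Callable

# Explicit tool registry: each entry declares whether it can write.
TOOLS: dict[str, tuple[Callable[[str], str], bool]] = {}

def register_tool(name: str, can_write: bool = False):
    def wrap(fn: Callable[[str], str]):
        TOOLS[name] = (fn, can_write)
        return fn
    return wrap

@register_tool("lookup_order")  # read-only by default
def lookup_order(order_id: str) -> str:
    return f"order {order_id}: shipped"

@register_tool("issue_refund", can_write=True)  # gated write action
def issue_refund(order_id: str) -> str:
    return f"refund issued for order {order_id}"

def call_tool(name: str, arg: str, allow_writes: bool = False) -> str:
    fn, can_write = TOOLS[name]
    if can_write and not allow_writes:
        raise PermissionError(f"tool {name!r} writes; approval required")
    return fn(arg)
```

Because the registry is plain data, "which tools can this agent use, and which of them write?" becomes a one-line audit instead of a code review.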

Agent Framework overview: Microsoft Agent Framework blog.

Microsoft Learn also provides Agent Framework docs (example agent types): Azure AI Foundry agent type.

Path C: OpenAI SDK-first (own the loop in your app)

Best when you want maximum portability and you already have strong engineering around reliability.

flowchart LR
classDef sys fill:#0b1220,stroke:#334155,color:#e5e7eb;
classDef step fill:#111827,stroke:#6366f1,color:#eef2ff;
classDef guard fill:#2d1b0b,stroke:#f59e0b,color:#fffbeb;
classDef out fill:#052e1a,stroke:#22c55e,color:#dcfce7;
UI["API / UI"]:::sys --> Orchestrator["Your agent loop\n(plan → tool → check)"]:::step
Orchestrator --> Model["Model via OpenAI SDK"]:::step
Orchestrator --> Tooling["Tools\n(DB, HTTP, search)"]:::sys
Orchestrator --> Safety["Policies\nbudgets, allow-lists"]:::guard
Orchestrator --> Result["Answer / action"]:::out
  • Implement the loop: plan → tool → check
  • Add structured tool schemas
  • Add a safe “check step” (critique, constraints, citations)
  • Add rate limits, budgets, and caching
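A minimal sketch of the budget idea: charge every tool call against a wall-clock deadline and a call count. The limits here are arbitrary examples; real budgets would also cover tokens and dollars.

```python
import time

class Budget:
    """Illustrative per-workflow budget: wall-clock time and tool-call count."""

    def __init__(self, max_seconds: float, max_tool_calls: int):
        self.deadline = time.monotonic() + max_seconds
        self.tool_calls_left = max_tool_calls

    def charge_tool_call(self) -> None:
        # Check the cheap limits before every tool invocation.
        if time.monotonic() > self.deadline:
            raise TimeoutError("workflow time budget exhausted")
        if self.tool_calls_left <= 0:
            raise RuntimeError("tool-call budget exhausted")
        self.tool_calls_left -= 1
```

Raising on exhaustion (rather than silently stopping) forces the orchestrator to take the escalation path deliberately.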

If you want a vendor-neutral convention for “agent instructions”, OpenAI has pushed AGENTS.md via the Agentic AI Foundation: OpenAI: Agentic AI Foundation.


A clean Python starter (dependencies + a small, maintainable agent loop)

This is a compact, “production-shaped” baseline: typed inputs/outputs, one router, a couple of tools, retries, and clear boundaries.

Dependencies (pick what you need)

pip install "pydantic>=2.7" "httpx>=0.27" "tenacity>=8.2" "structlog>=24.1" "python-dotenv>=1.0"

Optional but common in real teams:

pip install "opentelemetry-api>=1.26" "opentelemetry-sdk>=1.26" "pytest>=8.0"

Example: router → tool → verify (minimal, readable)

from __future__ import annotations

import os
from dataclasses import dataclass
from typing import Literal

import httpx
import structlog
from pydantic import BaseModel, Field
from tenacity import retry, stop_after_attempt, wait_exponential_jitter

log = structlog.get_logger()

# -----------------------------
# Types (clear contracts)
# -----------------------------
Department = Literal["billing", "tech", "general"]

class Ticket(BaseModel):
    id: str
    subject: str
    body: str
    customer_tier: Literal["free", "pro", "enterprise"] = "free"

class RoutedTask(BaseModel):
    department: Department
    reason: str = Field(..., max_length=240)

class ToolResult(BaseModel):
    ok: bool
    summary: str
    source: str

class FinalAnswer(BaseModel):
    answer: str
    citations: list[str] = Field(default_factory=list)
    needs_human: bool = False

# -----------------------------
# Tools (small, testable units)
# -----------------------------
@retry(stop=stop_after_attempt(3), wait=wait_exponential_jitter(initial=0.2, max=2.0))
def fetch_billing_policy(query: str) -> ToolResult:
    # Example tool: replace with your KB/search
    base_url = os.getenv("POLICY_API_BASE_URL", "https://example.internal")
    with httpx.Client(timeout=5.0) as client:
        resp = client.get(f"{base_url}/policies/search", params={"q": query})
        resp.raise_for_status()
        data = resp.json()
    return ToolResult(ok=True, summary=data["top_match"]["summary"], source=data["top_match"]["url"])

def fetch_runbook_snippet(query: str) -> ToolResult:
    # Example tool: replace with your runbooks/CMDB
    return ToolResult(ok=True, summary=f"Runbook hint for: {query}", source="runbook://oncall")

# -----------------------------
# Orchestration (the loop)
# -----------------------------
@dataclass(frozen=True)
class AgentConfig:
    max_steps: int = 4
    allow_write_actions: bool = False  # start read-only

def route(ticket: Ticket) -> RoutedTask:
    text = f"{ticket.subject}\n{ticket.body}".lower()
    if any(k in text for k in ["invoice", "billing", "refund", "payment"]):
        return RoutedTask(department="billing", reason="Billing keywords detected")
    if any(k in text for k in ["error", "crash", "timeout", "bug", "api"]):
        return RoutedTask(department="tech", reason="Technical issue keywords detected")
    return RoutedTask(department="general", reason="Default route")

def verify(answer: str, tool_results: list[ToolResult]) -> tuple[bool, str]:
    """
    Cheap deterministic verification:
    - ensure we actually used tool evidence
    - prevent risky claims when evidence is missing
    """
    if not tool_results:
        return False, "No tool evidence retrieved"
    if len(answer.strip()) < 20:
        return False, "Answer too short"
    return True, "OK"

def handle_ticket(ticket: Ticket, cfg: AgentConfig = AgentConfig()) -> FinalAnswer:
    routed = route(ticket)
    log.info("routed_ticket", ticket_id=ticket.id, department=routed.department, reason=routed.reason)
    tool_results: list[ToolResult] = []
    for step in range(cfg.max_steps):
        if routed.department == "billing":
            tool_results.append(fetch_billing_policy(ticket.subject))
        elif routed.department == "tech":
            tool_results.append(fetch_runbook_snippet(ticket.subject))
        else:
            tool_results.append(fetch_runbook_snippet("support triage basics"))
        # Minimal “answer synthesis” (swap with an LLM call later)
        citations = [r.source for r in tool_results if r.ok]
        answer = (
            f"Here’s what I found for ticket {ticket.id}:\n"
            f"- Route: {routed.department} ({routed.reason})\n"
            f"- Evidence: {tool_results[-1].summary}\n"
            f"Next step: confirm details and apply the documented policy/runbook."
        )
        ok, why = verify(answer, tool_results)
        log.info("verify", ticket_id=ticket.id, ok=ok, why=why, step=step)
        if ok:
            return FinalAnswer(answer=answer, citations=citations, needs_human=False)
    return FinalAnswer(
        answer="I couldn’t verify a safe answer automatically. Please review the evidence and decide next steps.",
        citations=[r.source for r in tool_results if r.ok],
        needs_human=True,
    )

Why this is a good baseline:

  • Small tools, easy to unit test
  • Typed request/response models (Pydantic)
  • Retries with backoff on flaky dependencies
  • A verify step (the “safety valve”)
  • Starts read-only by default

Best practices for messy, real-world enterprises (many silos)

This is where agent projects succeed or fail.

  • Don’t centralize everything first: connect sources incrementally, start with one domain.
  • Treat permissions as first-class: store ACLs and enforce at query time.
  • Prefer hybrid retrieval: lexical search + vector search + reranking.
  • Avoid “one giant agent”: use a router and specialists to reduce confusion.
  • Log decisions and tool calls: make failures debuggable.
  • Add budgets: time, tokens, tool calls, and dollars per workflow.
  • Make write actions rare: start read-only, then add approvals for writes.
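For the hybrid-retrieval point, one common way to merge lexical and vector result lists is reciprocal rank fusion; the document IDs below are made up for illustration.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists (e.g. lexical + vector results) via RRF."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes 1/(k + rank); items ranked high anywhere win.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)  # type: ignore[arg-type]

lexical = ["doc-a", "doc-b", "doc-c"]   # e.g. BM25 results
vector  = ["doc-b", "doc-d", "doc-a"]   # e.g. embedding search results
fused = reciprocal_rank_fusion([lexical, vector])
```

RRF needs no score calibration between the two retrievers, which is why it is a popular first step before a learned reranker.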
