Agents are not “chatbots with a longer prompt”.
An agent is a system that can work toward a goal using a loop: plan, act with tools, check the result, then finish, iterate, or hand off to a human.
This post is a practical, easy read on that loop, the coordination patterns built on top of it, and the three main ways to build: platform-managed, hybrid, and code-first.
```mermaid
flowchart TD
    classDef step fill:#111827,stroke:#6366f1,color:#eef2ff;
    classDef tool fill:#0b1220,stroke:#38bdf8,color:#e0f2fe;
    classDef guard fill:#2d1b0b,stroke:#f59e0b,color:#fffbeb;
    classDef out fill:#052e1a,stroke:#22c55e,color:#dcfce7;
    Goal["Goal / request"]:::step --> Plan["Plan"]:::step
    Plan --> Act["Act (use tools)"]:::step
    Act --> Check["Check result"]:::guard
    Check -->|good| Done["Done"]:::out
    Check -->|needs more| Plan
    Check -->|risk/unclear| Human["Human review / handoff"]:::guard --> Done
    subgraph Tools["Tools / data access"]
        DB["Datastores"]:::tool
        API["APIs"]:::tool
        Search["Search / retrieval"]:::tool
        Workflow["Workflows (Logic Apps, etc.)"]:::tool
    end
    Act --> DB
    Act --> API
    Act --> Search
    Act --> Workflow
```
If you build everything around this loop, you’ll avoid most common “agent failures”: runaway loops, unverified answers, and risky actions that never reach a human.
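The loop is small enough to sketch directly. All names here are illustrative, not a real SDK: `plan`, `act`, and `check` are plain callables you supply, and the budget plus human handoff are baked into the loop itself.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class LoopResult:
    answer: str
    needs_human: bool = False
    steps: int = 0

def run_agent(
    goal: str,
    plan: Callable,   # decides the next action from goal + evidence
    act: Callable,    # executes the action (tool call, model call, ...)
    check: Callable,  # cheap verification gate: (ok, verdict)
    max_steps: int = 4,
) -> LoopResult:
    """Generic plan → act → check loop with a hard step budget."""
    evidence: list = []
    for step in range(1, max_steps + 1):
        action = plan(goal, evidence)
        evidence.append(act(action))
        ok, verdict = check(goal, evidence)
        if ok:
            return LoopResult(answer=verdict, steps=step)
    # Budget exhausted without a verified answer: escalate, don't guess.
    return LoopResult(answer="budget exhausted", needs_human=True, steps=max_steps)
```

The point of keeping `plan`/`act`/`check` as injected callables is that the loop itself stays model- and tool-agnostic; you can unit-test it with stubs before wiring in an LLM.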
Different tasks need different coordination. Here are the patterns you’ll see most often.
```mermaid
flowchart TB
    classDef box fill:#0b1220,stroke:#334155,color:#e5e7eb;
    classDef hub fill:#111827,stroke:#6366f1,color:#eef2ff;
    classDef good fill:#052e1a,stroke:#22c55e,color:#dcfce7;
    subgraph Sequential["Sequential"]
        S1["Agent A"]:::box --> S2["Agent B"]:::box --> S3["Agent C"]:::box
    end
    subgraph Concurrent["Concurrent"]
        C0["Coordinator"]:::hub --> C1["Agent A"]:::box
        C0 --> C2["Agent B"]:::box
        C0 --> C3["Agent C"]:::box
        C1 --> C0
        C2 --> C0
        C3 --> C0
    end
    subgraph Handoff["Handoff"]
        H0["Router"]:::hub --> H1["Specialist A"]:::box --> H2["Specialist B"]:::box --> H3["Specialist C"]:::box
    end
    subgraph GroupChat["Group chat (debate + converge)"]
        G0["Facilitator"]:::hub --> G1["Worker"]:::box
        G0 --> G2["Reviewer"]:::box
        G1 --> G2 --> G0
    end
    subgraph Workflow["Workflow process (DAG)"]
        W1["Step 1"]:::box --> W2["Step 2"]:::box --> W3["Step 3"]:::box
        W2 --> W4["Parallel branch"]:::box --> W3
    end
    Sequential --> Concurrent --> Handoff --> GroupChat --> Workflow
```
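To make the difference concrete, here is a hedged sketch of the two simplest patterns above, with agents modeled as plain functions (in practice each would wrap a model plus tools):

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

Agent = Callable[[str], str]

def sequential(agents: list[Agent], task: str) -> str:
    """Each agent refines the previous agent's output (A → B → C)."""
    for agent in agents:
        task = agent(task)
    return task

def concurrent(agents: list[Agent], task: str) -> list[str]:
    """A coordinator fans the same task out and collects every result."""
    with ThreadPoolExecutor(max_workers=len(agents)) as pool:
        return list(pool.map(lambda agent: agent(task), agents))

# Illustrative "agents" — stand-ins for model-backed specialists.
draft = lambda t: f"draft({t})"
review = lambda t: f"review({t})"

print(sequential([draft, review], "ticket"))   # review(draft(ticket))
print(concurrent([draft, review], "ticket"))   # ['draft(ticket)', 'review(ticket)']
```

Handoff and group chat are variations on these: handoff is sequential routing chosen at runtime, and group chat is concurrent agents plus a facilitator that loops until they converge.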
You can build agents purely in code, purely on a platform, or (most common) as a hybrid.
Microsoft describes Agent Framework as an open-source engine for building agentic apps, with durability, safety hooks, and paths to production. See the overview: Introducing Microsoft Agent Framework.
Azure AI Foundry provides a managed place to run agents with enterprise features (deployment, governance, identity, monitoring). Product entry point: Azure AI Foundry Agent Service.
For SDK and endpoints, see: Get started with Foundry SDKs and endpoints.
In many teams, the “agent architecture” stays the same, but you swap models as cost, latency, and capability requirements change.
Microsoft has positioned Foundry as a multi-model layer, including Anthropic Claude models in the catalog: Claude models in Microsoft Foundry.
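One way to keep the architecture fixed while swapping models is to push model choice into configuration. A hedged sketch (the deployment names and env-var convention here are made up, not a Foundry API):

```python
import os

# Illustrative deployment names — substitute your actual model deployments.
MODEL_BY_TASK = {
    "routing":   "small-fast-model",    # cheap, low-latency classification
    "synthesis": "large-capable-model", # higher-quality final answers
}

def pick_model(task: str) -> str:
    """Resolve the model for a task, with an env-var override so ops can
    swap models (e.g. to a Claude model in the catalog) without a deploy."""
    return os.getenv(f"MODEL_{task.upper()}", MODEL_BY_TASK[task])

print(pick_model("routing"))  # small-fast-model, unless overridden
```

The agent code only ever asks `pick_model("routing")`; which vendor's model answers is an operational decision, not an architectural one.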
If you prefer to own orchestration in your repo, you can build the loop in code and call models via the OpenAI SDK. In production, this usually means owning the policies yourself: retries, budgets, tool allow-lists, and observability.
Best when you want a managed “run and operate” layer.
```mermaid
flowchart LR
    classDef sys fill:#0b1220,stroke:#334155,color:#e5e7eb;
    classDef step fill:#111827,stroke:#6366f1,color:#eef2ff;
    classDef guard fill:#2d1b0b,stroke:#f59e0b,color:#fffbeb;
    classDef out fill:#052e1a,stroke:#22c55e,color:#dcfce7;
    U["User / trigger"]:::step --> A["Foundry Agent Service"]:::step
    A --> T["Built-in tools + connectors\n(files, search, workflows)"]:::sys
    A --> ID["Identity + RBAC\n(Entra, least privilege)"]:::guard
    A --> Obs["Tracing + logs + eval runs"]:::sys
    A --> H["Human approval gate\n(optional)"]:::guard
    H --> O["Action / answer"]:::out
    A --> O
```
Start with the SDK entry point: Foundry SDK overview.
Best when you want an SDK/workflow engine in your repo, but still want platform operations.
```mermaid
flowchart TB
    classDef sys fill:#0b1220,stroke:#334155,color:#e5e7eb;
    classDef step fill:#111827,stroke:#6366f1,color:#eef2ff;
    classDef guard fill:#2d1b0b,stroke:#f59e0b,color:#fffbeb;
    Repo["Your repo\n(agent code + tests + evals)"]:::sys
    AF["Agent Framework\n(router + specialists + reviewer)"]:::step
    Tools["Tools as code\n(OpenAPI clients, DB adapters)"]:::sys
    Foundry["Azure AI Foundry runtime\n(host + identity + monitoring)"]:::step
    Repo --> AF --> Tools
    AF --> Foundry --> Guard["RBAC + approvals + policies"]:::guard
```
Agent Framework overview: Microsoft Agent Framework blog.
Microsoft Learn also provides Agent Framework docs (example agent types): Azure AI Foundry agent type.
Best when you want maximum portability and you already have strong engineering around reliability.
```mermaid
flowchart LR
    classDef sys fill:#0b1220,stroke:#334155,color:#e5e7eb;
    classDef step fill:#111827,stroke:#6366f1,color:#eef2ff;
    classDef guard fill:#2d1b0b,stroke:#f59e0b,color:#fffbeb;
    classDef out fill:#052e1a,stroke:#22c55e,color:#dcfce7;
    UI["API / UI"]:::sys --> Orchestrator["Your agent loop\n(plan → tool → check)"]:::step
    Orchestrator --> Model["Model via OpenAI SDK"]:::step
    Orchestrator --> Tooling["Tools\n(DB, HTTP, search)"]:::sys
    Orchestrator --> Safety["Policies\nbudgets, allow-lists"]:::guard
    Orchestrator --> Result["Answer / action"]:::out
```
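In this setup the model sits behind a seam you own. A minimal, hedged sketch: the `ModelClient` protocol and the budget check are illustrative, and the comment only shows roughly where an OpenAI SDK call would plug in behind that seam.

```python
from typing import Protocol

class ModelClient(Protocol):
    """Narrow interface the orchestrator depends on — swap vendors here."""
    def complete(self, prompt: str) -> str: ...

# A real implementation would wrap the OpenAI SDK, roughly:
#   from openai import OpenAI
#   client = OpenAI()
#   resp = client.chat.completions.create(
#       model="gpt-4o-mini",
#       messages=[{"role": "user", "content": prompt}],
#   )
#   return resp.choices[0].message.content

def answer_with_budget(model: ModelClient, question: str, max_question_chars: int = 2000) -> str:
    """Policy decisions (budgets, allow-lists) live outside the model call,
    in code you own and can test deterministically."""
    if len(question) > max_question_chars:
        raise ValueError("question exceeds budget")
    return model.complete(question)
```

Because the orchestrator only knows `ModelClient`, you can test the whole loop with a stub and keep the vendor SDK at the edge of your system.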
If you want a vendor-neutral convention for “agent instructions”, OpenAI has pushed AGENTS.md via the Agentic AI Foundation: OpenAI: Agentic AI Foundation.
This is a compact “production-shaped” baseline: typed inputs/outputs, one router, a couple of tools, retries, and clear boundaries.
```shell
pip install "pydantic>=2.7" "httpx>=0.27" "tenacity>=8.2" "structlog>=24.1" "python-dotenv>=1.0"
```
Optional but common in real teams:
```shell
pip install "opentelemetry-api>=1.26" "opentelemetry-sdk>=1.26" "pytest>=8.0"
```
```python
from __future__ import annotations

import os
from dataclasses import dataclass
from typing import Literal, Optional

import httpx
import structlog
from pydantic import BaseModel, Field
from tenacity import retry, stop_after_attempt, wait_exponential_jitter

log = structlog.get_logger()

# -----------------------------
# Types (clear contracts)
# -----------------------------
Department = Literal["billing", "tech", "general"]

class Ticket(BaseModel):
    id: str
    subject: str
    body: str
    customer_tier: Literal["free", "pro", "enterprise"] = "free"

class RoutedTask(BaseModel):
    department: Department
    reason: str = Field(..., max_length=240)

class ToolResult(BaseModel):
    ok: bool
    summary: str
    source: str

class FinalAnswer(BaseModel):
    answer: str
    citations: list[str] = Field(default_factory=list)
    needs_human: bool = False

# -----------------------------
# Tools (small, testable units)
# -----------------------------
@retry(stop=stop_after_attempt(3), wait=wait_exponential_jitter(initial=0.2, max=2.0))
def fetch_billing_policy(query: str) -> ToolResult:
    # Example tool: replace with your KB/search
    base_url = os.getenv("POLICY_API_BASE_URL", "https://example.internal")
    with httpx.Client(timeout=5.0) as client:
        resp = client.get(f"{base_url}/policies/search", params={"q": query})
        resp.raise_for_status()
        data = resp.json()
    return ToolResult(ok=True, summary=data["top_match"]["summary"], source=data["top_match"]["url"])

def fetch_runbook_snippet(query: str) -> ToolResult:
    # Example tool: replace with your runbooks/CMDB
    return ToolResult(ok=True, summary=f"Runbook hint for: {query}", source="runbook://oncall")

# -----------------------------
# Orchestration (the loop)
# -----------------------------
@dataclass(frozen=True)
class AgentConfig:
    max_steps: int = 4
    allow_write_actions: bool = False  # start read-only

def route(ticket: Ticket) -> RoutedTask:
    text = f"{ticket.subject}\n{ticket.body}".lower()
    if any(k in text for k in ["invoice", "billing", "refund", "payment"]):
        return RoutedTask(department="billing", reason="Billing keywords detected")
    if any(k in text for k in ["error", "crash", "timeout", "bug", "api"]):
        return RoutedTask(department="tech", reason="Technical issue keywords detected")
    return RoutedTask(department="general", reason="Default route")

def verify(answer: str, tool_results: list[ToolResult]) -> tuple[bool, str]:
    """Cheap deterministic verification:
    - ensure we actually used tool evidence
    - prevent risky claims when evidence is missing
    """
    if not tool_results:
        return False, "No tool evidence retrieved"
    if len(answer.strip()) < 20:
        return False, "Answer too short"
    return True, "OK"

def handle_ticket(ticket: Ticket, cfg: AgentConfig = AgentConfig()) -> FinalAnswer:
    routed = route(ticket)
    log.info("routed_ticket", ticket_id=ticket.id, department=routed.department, reason=routed.reason)
    tool_results: list[ToolResult] = []
    for step in range(cfg.max_steps):
        if routed.department == "billing":
            tool_results.append(fetch_billing_policy(ticket.subject))
        elif routed.department == "tech":
            tool_results.append(fetch_runbook_snippet(ticket.subject))
        else:
            tool_results.append(fetch_runbook_snippet("support triage basics"))

        # Minimal “answer synthesis” (swap with an LLM call later)
        citations = [r.source for r in tool_results if r.ok]
        answer = (
            f"Here’s what I found for ticket {ticket.id}:\n"
            f"- Route: {routed.department} ({routed.reason})\n"
            f"- Evidence: {tool_results[-1].summary}\n"
            f"Next step: confirm details and apply the documented policy/runbook."
        )
        ok, why = verify(answer, tool_results)
        log.info("verify", ticket_id=ticket.id, ok=ok, why=why, step=step)
        if ok:
            return FinalAnswer(answer=answer, citations=citations, needs_human=False)

    return FinalAnswer(
        answer="I couldn’t verify a safe answer automatically. Please review the evidence and decide next steps.",
        citations=[r.source for r in tool_results if r.ok],
        needs_human=True,
    )
```
Why this is a good baseline:

- Typed contracts (Pydantic models) make inputs and outputs explicit and testable.
- Tools are small, deterministic units you can mock, with retries (tenacity) at the edge.
- A cheap `verify` step gates every answer on actual tool evidence.
- Failure escalates explicitly via `needs_human` instead of guessing.
This is where agent projects succeed or fail.
