Semantic Kernel and the Microsoft Agent Framework: what they are and how they power AI agents

September 27, 2025
9 min read

Why this matters

Note: This post was updated to include the Microsoft Agent Framework and how it relates to Semantic Kernel and Azure AI Foundry.

If you’re building apps with large language models, you’ll quickly need a way to organise prompts, call tools, track state, and work with more than one “agent”. Semantic Kernel (SK) is a practical SDK from Microsoft that helps you do exactly that.

Below is a quick, hands-on guide.


How Semantic Kernel is structured (at a glance)

At the centre is the Kernel, which your app code uses directly, optionally through one or more agents (single or multi-agent). The Kernel brings together:

  • Model services: chat and embedding models
  • Tools / plugins: code and prompt functions
  • State & memory: a database or vector store
  • Processes / workflows: optional, for longer-running flows

From there it calls out to external systems such as HTTP APIs (imported via OpenAPI specs) and datastores.

More context: why Semantic Kernel, and what else could you use?

If you just want to call a single model with a fixed prompt, you don’t need a framework. But as soon as you add tools, multiple roles, state, and safety checks, an orchestration layer helps.

Why Semantic Kernel

  • First‑class tool calling (functions as code or prompts)
  • Simple plugin model; easy to wrap your own APIs
  • Works with OpenAI and Azure OpenAI out of the box
  • Import OpenAPI specs to turn HTTP APIs into tools quickly
  • Designed to be used as “library code” inside your app (not a hosted platform)

Other options (at a glance)

  • LangChain: very rich ecosystem of integrations and chains; heavier abstractions
  • LlamaIndex: great for retrieval/RAG pipelines and document loaders
  • AutoGen: focus on multi‑agent conversations with explicit role scripting
  • OpenAI Assistants: hosted orchestration with tools and vector store built‑in
  • CrewAI/LangGraph: graph‑style workflows and agent teams

Where the Microsoft Agent Framework and Azure AI Foundry fit

There are two separate questions people often mix together:

  • How do I write the agent logic (tools, routing, orchestration)?
  • Where do I run and operate the agent (deployment, auth, tracing, evaluation)?

Semantic Kernel mostly answers the first question.

Azure AI Foundry is closer to the second question: models, agent runtimes/services, evaluation, and operational tooling.

The Microsoft Agent Framework sits in the same “agent logic” space as Semantic Kernel, but it focuses on providing a consistent way to build agent applications, patterns, and samples. In practice, teams often mix these:

  • Use an SDK (Semantic Kernel or Agent Framework) for orchestration in code
  • Use Azure AI Foundry for model deployments, evaluation, and operations

High-level map:

  • Azure AI Foundry (platform): model deployments (OpenAI / Azure OpenAI), agent services/runtime, evaluation and safety checks, tracing/monitoring
  • Your codebase (SDKs): Semantic Kernel (plugins, agents, orchestration) and the Microsoft Agent Framework (app patterns, samples, orchestration)

What changed in 2025 (and why it matters in practice)

If you learned SK earlier and haven’t looked at it in a while, a few things are worth knowing:

Why Microsoft introduced the Agent Framework (alongside Semantic Kernel)

The simplest way to think about it: Semantic Kernel is an SDK you use inside your app to connect models to tools and orchestrate work. The Microsoft Agent Framework is a more opinionated set of agent application patterns and samples.

Microsoft’s own SK roadmap also signals a broader direction: integrations across agent runtimes and frameworks, and smoother interoperability (for example with services such as Azure AI Foundry).
https://devblogs.microsoft.com/semantic-kernel/semantic-kernel-roadmap-h1-2025-accelerating-agents-processes-and-integration/

Technical angle (ML/AI/data)

  • Tool calling: expose capabilities with clear schemas. Models decide when to call them
  • Structured outputs: prefer JSON schemas or typed validators (e.g. Pydantic) to keep outputs reliable (see the sketch after this list)
  • State: keep turn history short and persist long‑term facts in a store (DB, vector index)
  • Observability: log prompts, tool calls, latency, and cost per turn. Needed for debugging and safety
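
As a small illustration of the structured-outputs point, here is a minimal sketch that validates a model reply against a typed schema before it is used; the AccountSummary model and its fields are made up for this example.

import json
from pydantic import BaseModel, ValidationError

# Hypothetical schema for a structured reply about an account
class AccountSummary(BaseModel):
    account_id: str
    balance: float
    currency: str

def parse_reply(raw: str) -> AccountSummary | None:
    """Validate the model's JSON output; return None so the caller can retry or fall back."""
    try:
        return AccountSummary(**json.loads(raw))
    except (json.JSONDecodeError, ValidationError):
        return None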

1) Semantic Kernel fundamentals

At heart, SK gives you:

  • A Kernel: where you plug in your model(s) and your tools (called plugins)
  • Functions: either prompt-based or “native” code that the model can call
  • Memory and planners: optional helpers for context and task breakdown

Think of the Kernel as your app’s brain. You register a chat model, add a few useful functions (e.g. search, maths, time), and then ask the model to solve tasks using those functions.

Minimal set‑up (Python) — banking assistant

import asyncio
import os

from semantic_kernel import Kernel
from semantic_kernel.connectors.ai.open_ai import (
    AzureChatCompletion,
    OpenAIChatCompletion,
    OpenAIChatPromptExecutionSettings,
)
from semantic_kernel.contents import ChatHistory


async def main() -> None:
    kernel = Kernel()

    # Pick one provider
    # chat = OpenAIChatCompletion(ai_model_id="gpt-4o", api_key=os.environ["OPENAI_API_KEY"])
    chat = AzureChatCompletion(
        deployment_name="gpt-4o",
        endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
        api_key=os.environ["AZURE_OPENAI_API_KEY"],
    )
    kernel.add_service(chat)

    history = ChatHistory()
    history.add_user_message("Hello, I'd like to check my account options.")

    # Ask the chat service to respond (argument names can vary slightly between SK versions)
    reply = await chat.get_chat_message_content(history, OpenAIChatPromptExecutionSettings())
    print(reply)


asyncio.run(main())

Notes

  • Keep secrets in env vars: AZURE_OPENAI_ENDPOINT, AZURE_OPENAI_API_KEY (or OPENAI_API_KEY).
  • For banking assistants, avoid echoing personal data back to the user.

2) Plugins and “auto function calling”

Plugins are simply collections of functions. A function can be:

  • Native: plain Python function (great for calling an API, reading a file, etc.)
  • Prompt: an LLM prompt wrapped as a callable function

With tool/function calling, the model can decide when to call your functions. “Auto function calling” lets the model pick and run the right function by itself.

Example: a tiny native plugin (Python) — banking tools

import asyncio
import os

from semantic_kernel import Kernel
from semantic_kernel.connectors.ai import FunctionChoiceBehavior
from semantic_kernel.connectors.ai.open_ai import (
    AzureChatCompletion,
    OpenAIChatPromptExecutionSettings,
)
from semantic_kernel.contents import ChatHistory
from semantic_kernel.functions import kernel_function

# A fake in-memory ledger just for demo purposes
_ACCOUNTS = {
    "ACC123": {"currency": "GBP", "balance": 1284.55, "owner": "J. Doe"},
    "ACC456": {"currency": "GBP", "balance": 72.10, "owner": "J. Doe"},
}
_TX = {
    "ACC123": [
        {"date": "2025-09-20", "desc": "Coffee", "amount": -3.4},
        {"date": "2025-09-19", "desc": "Salary", "amount": 2100.0},
    ],
    "ACC456": [
        {"date": "2025-09-21", "desc": "Transport", "amount": -6.2},
    ],
}


class BankingPlugin:
    @kernel_function(description="Get the available balance for an account id")
    def get_balance(self, account_id: str) -> str:
        if account_id not in _ACCOUNTS:
            return "Account not found"
        acct = _ACCOUNTS[account_id]
        return f"{acct['balance']} {acct['currency']}"

    @kernel_function(description="List the last N transactions for an account id")
    def list_transactions(self, account_id: str, n: int = 5) -> str:
        items = _TX.get(account_id, [])[:n]
        if not items:
            return "No transactions found"
        lines = [f"{t['date']} | {t['desc']} | {t['amount']:+.2f}" for t in items]
        return "\n".join(lines)


async def main() -> None:
    kernel = Kernel()
    chat = AzureChatCompletion(
        deployment_name="gpt-4o",
        endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
        api_key=os.environ["AZURE_OPENAI_API_KEY"],
    )
    kernel.add_service(chat)

    # Register our banking plugin under a friendly name
    kernel.add_plugin(BankingPlugin(), plugin_name="bank")

    history = ChatHistory()
    history.add_system_message(
        "You are a helpful banking assistant. Use tools to answer precisely."
    )
    history.add_user_message(
        "What's the balance of account ACC123, and show my last transaction?"
    )

    # Enable automatic function calling so the model can call bank.get_balance and
    # bank.list_transactions. Recent SK Python versions use FunctionChoiceBehavior;
    # older releases used tool_choice="auto" on the execution settings.
    settings = OpenAIChatPromptExecutionSettings(
        function_choice_behavior=FunctionChoiceBehavior.Auto()
    )
    reply = await chat.get_chat_message_content(history, settings, kernel=kernel)
    print(reply)


asyncio.run(main())

What’s happening?

  • We declared two tool functions with @kernel_function.
  • We enabled automatic function calling (FunctionChoiceBehavior.Auto(); older SK releases used tool_choice="auto"), so the model can call them when it needs to.

3) Import an API using an OpenAPI spec

You don’t need to hand‑code every client. Given an OpenAPI (Swagger) document, SK can import it as a plugin so the model can call those endpoints.

Tips

  • Start by importing read‑only endpoints; then add writes if you trust the agent.
  • Describe the plugin purpose in your system message so the model knows when to use it.

Example: import an OpenAPI plugin (Python) — core banking API

import asyncio
import os

from semantic_kernel import Kernel
from semantic_kernel.connectors.ai import FunctionChoiceBehavior
from semantic_kernel.connectors.ai.open_ai import (
    AzureChatCompletion,
    OpenAIChatPromptExecutionSettings,
)
from semantic_kernel.contents import ChatHistory


async def main() -> None:
    kernel = Kernel()
    chat = AzureChatCompletion(
        deployment_name="gpt-4o",
        endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
        api_key=os.environ["AZURE_OPENAI_API_KEY"],
    )
    kernel.add_service(chat)

    # Import the OpenAPI document as a plugin (can also point at a local YAML/JSON file).
    # Recent SK Python releases expose this as kernel.add_plugin_from_openapi; the exact
    # helper name can vary by version, and the URL below is a placeholder for your own spec.
    kernel.add_plugin_from_openapi(
        plugin_name="corebank",
        openapi_document_path="https://api.examplebank.com/openapi.yaml",
    )

    history = ChatHistory()
    history.add_system_message(
        "You can call 'corebank' to fetch balances and transactions. Avoid returning raw PII."
    )
    history.add_user_message("Show my latest 2 transactions for ACC123, please.")

    settings = OpenAIChatPromptExecutionSettings(
        function_choice_behavior=FunctionChoiceBehavior.Auto()
    )
    answer = await chat.get_chat_message_content(history, settings, kernel=kernel)
    print(answer)


asyncio.run(main())

Notes

  • Start with endpoints like GET /accounts/{id} and GET /accounts/{id}/transactions.
  • Add guardrails in your system message (e.g. don’t expose full PANs; redact PII).

4) A simple multi‑agent conversation (with Azure AI Foundry)

Often, you’ll want more than one agent. For example: a Researcher that gathers facts, and a Writer that turns those facts into a short post. Azure AI Foundry supplies the model and deployment; SK coordinates the chat.

This is deliberately simple: no memory store, no planner, just roles handing text back and forth. You can grow it by adding plugins (search, data), guardrails, or a router that decides who speaks next.

A simple multi‑agent banking assistant (Python)

Roles

  • Orchestrator: decides which specialist should act.
  • Teller: answers balance/transactions using the banking plugin.
  • Risk: flags unusual patterns and suggests limits.
  • Compliance: checks responses for sensitive data before sending to user.

import asyncio
import os
from typing import Optional

from semantic_kernel import Kernel
from semantic_kernel.connectors.ai import FunctionChoiceBehavior
from semantic_kernel.connectors.ai.open_ai import (
    AzureChatCompletion,
    OpenAIChatPromptExecutionSettings,
)
from semantic_kernel.contents import ChatHistory

# Reuse the BankingPlugin from the earlier example


async def main() -> None:
    kernel = Kernel()
    chat = AzureChatCompletion(
        deployment_name="gpt-4o",
        endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
        api_key=os.environ["AZURE_OPENAI_API_KEY"],
    )
    kernel.add_service(chat)
    kernel.add_plugin(BankingPlugin(), plugin_name="bank")

    plain = OpenAIChatPromptExecutionSettings()
    auto = OpenAIChatPromptExecutionSettings(
        function_choice_behavior=FunctionChoiceBehavior.Auto()
    )

    # Define specialist system prompts
    teller_sys = "You are a bank teller. Answer with concise facts. Use tools to get balances and transactions."
    risk_sys = "You are a risk analyst. Identify anomalies or affordability risks. Keep it factual."
    comp_sys = "You are compliance. Redact PII (card numbers, full addresses). Ensure tone is professional and warm."
    orch_sys = (
        "You decide which specialist acts next based on the user request and conversation state. "
        "Return one of: TELLER, RISK, COMPLIANCE."
    )

    teller = ChatHistory(); teller.add_system_message(teller_sys)
    risk = ChatHistory(); risk.add_system_message(risk_sys)
    comp = ChatHistory(); comp.add_system_message(comp_sys)
    orch = ChatHistory(); orch.add_system_message(orch_sys)

    user_query = "Could you show the last two transactions for ACC123 and check if there's anything unusual?"
    orch.add_user_message(f"User asked: {user_query}\nConversation just started.")

    # Initialise so no branch ever references an unset value
    teller_answer = ""
    risk_answer = ""
    final_answer: Optional[str] = None

    for turn in range(4):
        step = await chat.get_chat_message_content(orch, plain)
        decision = (step.content or "").strip().upper()
        if "TELLER" in decision:
            teller.add_user_message(user_query)
            teller_answer = await chat.get_chat_message_content(teller, auto, kernel=kernel)
            orch.add_user_message(f"Teller replied: {teller_answer}")
        elif "RISK" in decision:
            risk.add_user_message(f"Data to assess: {teller_answer}")
            risk_answer = await chat.get_chat_message_content(risk, plain)
            orch.add_user_message(f"Risk replied: {risk_answer}")
        elif "COMPLIANCE" in decision:
            comp.add_user_message(
                f"Draft to review and redact if needed.\n{teller_answer}\n{risk_answer}"
            )
            comp_answer = await chat.get_chat_message_content(comp, plain)
            final_answer = comp_answer.content
            break
        else:
            # Fallback to the teller if the routing answer is unclear
            teller.add_user_message(user_query)
            teller_answer = await chat.get_chat_message_content(teller, auto, kernel=kernel)
            orch.add_user_message(f"Teller (fallback) replied: {teller_answer}")

    print("\nFinal (to user):\n", final_answer or teller_answer)


asyncio.run(main())

This pattern is pragmatic and extendable. Add routing rules, memory, or an approval step before the final message is sent.
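
For instance, an approval step can be a small gate in front of the final message. The sketch below is illustrative only: requires_approval and queue_for_review are hypothetical stand-ins for whatever rules and review process you use.

def queue_for_review(draft: str) -> None:
    """Stand-in for your review process (ticket, queue message, dashboard, ...)."""
    print("Queued for human review:", draft[:80])

def requires_approval(draft: str) -> bool:
    """Hypothetical rule: flag drafts that mention money movement or limit changes."""
    sensitive = ("transfer", "payment", "limit increase")
    return any(word in draft.lower() for word in sensitive)

def finalise(draft: str) -> str:
    """Gate the assistant's final message behind an approval check."""
    if requires_approval(draft):
        queue_for_review(draft)
        return "A colleague will confirm this action with you shortly."
    return draft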

Agent orchestration: who talks to whom?

  1. User → Orchestrator: request (last 2 transactions + unusual activity check)
  2. Orchestrator → Teller: route request
  3. Teller → BankingPlugin: get_balance(ACC123), list_transactions(ACC123, 2)
  4. BankingPlugin → Teller: balances + transaction list
  5. Teller → Orchestrator: draft answer with facts
  6. Orchestrator → Risk: "Assess anomalies"
  7. Risk → Orchestrator: risk notes
  8. Orchestrator → Compliance: "Review and redact PII"
  9. Compliance → Orchestrator: final compliant answer
  10. Orchestrator → User: reply

Agent orchestration patterns (quick primer)

These are the patterns you’ll see in practice; pick the simplest one that solves your need:

  • Router (dispatcher): one orchestrator routes turns to the right specialist. Great default for banking assistants.
  • Supervisor–worker (hub and spoke): a manager assigns tasks to workers and reviews outputs.
  • Plan and execute: a planner drafts steps; an executor runs them (often calling tools) and reports back (see the sketch below)
  • Critic/editor (debate/reflect): a “writer” drafts, a “critic” reviews, possibly a judge picks a final.
  • Blackboard (shared memory): agents read/write to a common scratchpad and act when relevant facts appear.
  • Graph/DAG workflow: deterministic nodes with guards and retries (nice for approvals and audits).
A typical layout: a router/orchestrator in front of the Teller, Risk and Compliance specialists, with an optional supervisor and critic review step, all reading and writing shared memory/state.
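
As a concrete example of the plan-and-execute pattern, here is a minimal, framework-agnostic sketch of the executor half; the plan itself would come from a planner prompt that returns JSON, and the tool names are hypothetical.

from typing import Callable

def execute_plan(
    plan: list[dict],                      # e.g. [{"tool": "bank.get_balance", "args": {"account_id": "ACC123"}}]
    tools: dict[str, Callable[..., str]],  # tool name -> callable
) -> list[str]:
    """Run each planned step in order and collect results for a final summary prompt."""
    results = []
    for step in plan:
        tool = tools.get(step["tool"])
        if tool is None:
            results.append(f"Unknown tool: {step['tool']}")
            continue
        results.append(tool(**step.get("args", {})))
    return results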

Banking safety tips

  • Treat all account identifiers as sensitive; avoid printing full names or IDs.
  • Keep tools read‑only at first. Introduce money‑moving endpoints only with strong safeguards.
  • Log tool calls and redact logs.

Practical notes

  • Keep prompts short and specific. State the role and how the agent should answer.
  • Prefer native plugins for anything that touches your systems (APIs, files, DBs).
  • Start with read‑only permissions. Add writes only after you trust behaviour.
  • Log tool calls and responses for debugging and safety.

Using SK with MCP and Azure services

Model Context Protocol (MCP) defines a standard way for capabilities (tools, resources, and prompts) to be discovered and called by models/agents. You can expose your internal systems as MCP servers and let SK (or another orchestrator) call them via a thin client plugin.

How it fits

  • MCP server: wraps a capability (e.g. Accounts API, Payments API)
  • SK plugin: a client that forwards tool calls to the MCP server
  • Agent: decides when to call which MCP tool via SK’s auto function calling

Azure examples

  • Azure Key Vault: credentials/secret access for downstream tools
  • Azure Functions/APIM: host MCP servers or REST endpoints for bank services
  • Azure Cognitive Search: document and product catalogue search as a tool
  • Azure Storage/Table/SQL: state and logs for conversations and tool outputs
  • Azure Event Grid/Service Bus: async workflows (e.g. payment approvals)

Sketch (Python)

# Pseudo-client that forwards calls to an MCP server
from semantic_kernel.functions import kernel_function

class MCPClientPlugin:
    @kernel_function(description="Run an MCP tool by name with JSON args")
    def call(self, tool: str, args_json: str) -> str:
        # Send to the MCP server (over stdio/websocket/http, depending on your setup)
        # and return the JSON result string for the model
        ...

# Register on a kernel set up as in the earlier examples
kernel.add_plugin(MCPClientPlugin(), plugin_name="mcp")
# Now the agent can do: mcp.call(tool="accounts.get_balance", args_json="{...}")

Tip: use Azure API Management to front internal services with policies (auth, quotas, masking) before exposing them to agents.


Where RAG fits (and alternatives)

Retrieval‑Augmented Generation (RAG) fetches relevant context from a store (often a vector index) and feeds it to the model. With SK you can:

  • Add a retrieval tool that queries Azure Cognitive Search or a vector DB (see the sketch after this list)
  • Keep answers grounded: cite sources in tool outputs and prompts
  • Trim history: rely on retrieval instead of long chat context
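
A retrieval tool is just another plugin. The sketch below assumes a hypothetical search_client (for example an Azure AI Search client) whose search method returns documents with content and source fields.

from semantic_kernel.functions import kernel_function

class RetrievalPlugin:
    def __init__(self, search_client):
        self._client = search_client  # hypothetical client (e.g. Azure AI Search)

    @kernel_function(description="Search policy and product documents for a query")
    def search_docs(self, query: str, top: int = 3) -> str:
        """Return the top matching chunks with their sources so answers can cite them."""
        results = self._client.search(query, top=top)  # assumed client API
        return "\n\n".join(f"[{r['source']}] {r['content']}" for r in results)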

Alternatives/complements

  • Function/tool calling only: if data is in structured systems, skip vectors and query APIs directly
  • Fine‑tuning: train a smaller domain model for style/format; still combine with tools
  • Structured pipelines: use LangGraph/CrewAI for complex branching; call SK tools from nodes

When to prefer RAG

  • You have lots of unstructured text (policies, product sheets)
  • You need citations and up‑to‑date facts
  • You want to minimise hallucinations without heavy fine‑tuning

Visual comparison: RAG vs fine‑tuning vs tools

RAG

User question → Retriever → Context chunks → Generator → Answer + citations

Fine‑tuning

Domain data → Fine-tune base model → Specialised model, then: User question → Answer in trained style

Tool‑calling only (no vector store)

User question → Orchestrator → Tool calls (API, DB query) → Grounded answer

Evaluation and benchmarking

Why: agent systems evolve. You need repeatable checks for accuracy, safety, latency and cost.

What to measure

  • Task success rate: does the agent produce the expected structured output?
  • Groundedness: are claims supported by retrieved or tool data?
  • Safety/PII: no leakage of sensitive fields
  • Latency and cost: per turn and end‑to‑end

Useful tools

  • OpenAI Evals / custom judge prompts for pairwise comparisons
  • Ragas / Ragas‑like metrics for RAG (context precision/recall, faithfulness)
  • DeepEval, Promptfoo: define tests as YAML and run in CI
  • Azure AI Foundry evals: managed runs and dashboards
  • Tracing: OpenTelemetry, Arize Phoenix, Langfuse for spans and prompt/tool logs

How to run it

  1. Define tasks and gold outputs (or a judge prompt) for your banking flows
  2. Build a skinny harness that calls your SK agents with fixed seeds and inputs
  3. Record tool calls, deltas to gold, and judge scores
  4. Fail the build on regressions (accuracy drops, safety violations)
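
A minimal harness along those lines might look like this sketch, where run_agent is a placeholder for whatever invokes your SK agent and the golden cases are made up.

import asyncio

# Hypothetical golden set: input -> expected substring in the answer
GOLDEN = [
    {"input": "Balance for ACC123?", "expect": "1284.55"},
    {"input": "Last transaction for ACC456?", "expect": "Transport"},
]

async def evaluate(run_agent) -> float:
    """Run each golden case and return the pass rate."""
    passed = 0
    for case in GOLDEN:
        answer = await run_agent(case["input"])  # your SK agent entry point
        passed += case["expect"] in answer
    return passed / len(GOLDEN)

# Example gate in CI:
# score = asyncio.run(evaluate(run_agent))
# assert score >= 0.95, f"Eval regression: {score:.0%}"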

Where it sits in the workflow

  • Dev: unit tests for tools and prompts; focused golden sets
  • Pre‑prod: sandbox end‑to‑end evals with synthetic and real‑like data
  • Prod: shadow evaluation and tracing, weekly scorecards

Machine Learning workflow: 7 stages and 6 practical steps

Seven stages (simple view)

  1. Problem definition — what’s the outcome and constraints?
  2. Data collection — sources, access, consent
  3. Data preparation — cleaning, joins, labelling
  4. Data visualisation — explore patterns and leakage risks
  5. ML modelling — pick a baseline and iterate
  6. Feature engineering — derive signals from raw data
  7. Model deployment — ship, observe, and improve

Six practical steps we’ll actually run

  1. Problem definition — turn the business question into an ML/agent task and success metric
  2. Data — list what you have (structured/unstructured; batch/stream); map to the task
  3. Evaluation — choose a metric and sign‑off threshold (e.g. 95% exact match on statements)
  4. Features — decide which fields matter; add derived features carefully
  5. Modelling — pick a model or agent pattern; compare baselines vs tool‑augmented agents
  6. Experimentation — try variants, measure, and feed results back into the loop

How agents and SK fit

  • SK handles orchestration, tool calling, and structured outputs.
  • Retrieval (RAG) supplies fresh facts; tools fetch system truth; outputs are validated.
  • You still need a proper evaluation loop and deployment hygiene around the agent.

Azure mapping (one way to wire it)

  • Problem definition: Azure Boards or design docs
  • Data collection: Data Factory, Event Hubs, ADLS Gen2 or Blob Storage
  • Data preparation: Azure Databricks, Azure Synapse, AML Pipelines
  • Data visualisation: Power BI, Databricks
  • Feature engineering: Feature Store (AML or Databricks)
  • ML modelling: Azure Machine Learning, Azure AI Foundry model deployments
  • Model deployment: Container Apps or AKS, API Management, Key Vault, Monitor/Insights
  • Orchestration and evaluation: Semantic Kernel, plugins/OpenAPI, Cognitive Search, Azure AI Foundry (prompt flow / evaluation)

Guardrails, validation, and traceability

Guardrails

  • System prompts that forbid PII echoing, risky actions and unapproved tools
  • Tool scopes: start read‑only; require approvals for payments or data export
  • Rate limits/quotas: per user, per agent, per tool
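
A per-user, per-tool quota can be as simple as a counter over a rolling window; the limits below are made up for illustration.

import time
from collections import defaultdict

# Hypothetical quota: max calls per (user, tool) within a rolling window
LIMIT = 20
WINDOW_SECONDS = 60
_calls: dict[tuple[str, str], list[float]] = defaultdict(list)

def allow_call(user_id: str, tool: str) -> bool:
    """Return True if this user may call this tool now, False once the quota is used up."""
    now = time.time()
    recent = [t for t in _calls[(user_id, tool)] if now - t < WINDOW_SECONDS]
    _calls[(user_id, tool)] = recent
    if len(recent) >= LIMIT:
        return False
    recent.append(now)
    return True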

Validation

  • Pydantic/typed schemas for tool inputs and model outputs
  • Redaction filters for logs and responses (e.g. mask IBAN/PAN patterns; see the sketch after this list)
  • Policy checks: allow/deny lists by user role and time of day
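
For the redaction point, a simple filter can mask card-number-like and IBAN-like strings before anything is logged or returned; the patterns below are illustrative, not exhaustive.

import re

# Illustrative patterns only: 13-19 digit card numbers (PANs) and IBAN-like strings
_PAN = re.compile(r"\b\d{13,19}\b")
_IBAN = re.compile(r"\b[A-Z]{2}\d{2}[A-Z0-9]{10,30}\b")

def redact(text: str) -> str:
    """Mask likely PANs and IBANs before logging or sending a response."""
    text = _PAN.sub("[REDACTED-PAN]", text)
    return _IBAN.sub("[REDACTED-IBAN]", text)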

Traceability

  • Correlate every turn with a trace id; include model, temperature, tool list (see the sketch after this list)
  • Store prompts, tool I/O, and decisions; keep only what you must, hashed where needed
  • Add human‑in‑the‑loop approvals for sensitive actions (limit increases, transfers)
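
Correlating turns is mostly a matter of attaching one id to everything; here is a minimal sketch, with arbitrary field names.

import json
import logging
import uuid

logger = logging.getLogger("agent")

def log_turn(model: str, temperature: float, tools: list[str], prompt: str, trace_id: str | None = None) -> str:
    """Log one turn under a correlating trace id and return the id for downstream calls."""
    trace_id = trace_id or str(uuid.uuid4())
    logger.info(json.dumps({
        "trace_id": trace_id,
        "model": model,
        "temperature": temperature,
        "tools": tools,
        "prompt_chars": len(prompt),  # log sizes rather than raw content where PII is a concern
    }))
    return trace_id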

Best practices, challenges, and tips

Best practices

  • Keep prompts short; prefer explicit formats and JSON outputs
  • Treat tools like APIs: version, monitor, and test them
  • Separate orchestration from business logic; keep plugins focused
  • Log everything important; sample if needed to control cost

Common challenges

  • Hallucinations when no tool fits: return “I don’t know” with a helpful next step
  • Context bloat: use retrieval and summaries; trim aggressively
  • Flaky tool calling: add retries with jitter and idempotency keys (see the sketch after this list)
  • Compliance: design for redaction and approvals from the start
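
For the flaky-tool-calling point, a small retry helper with jitter is usually enough; this is a generic sketch, not an SK API, and it assumes the downstream API accepts an idempotency_key argument.

import random
import time
import uuid

def call_with_retries(fn, *args, attempts: int = 3, base_delay: float = 0.5, **kwargs):
    """Retry a tool call with exponential backoff and jitter, reusing one idempotency key."""
    kwargs.setdefault("idempotency_key", str(uuid.uuid4()))  # assumed to be accepted by the API
    for attempt in range(attempts):
        try:
            return fn(*args, **kwargs)
        except Exception:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.25))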

Tips

  • Start with one or two high‑value tools and one agent role; grow from there
  • Use staging sandboxes and synthetic accounts to test safely
  • Prefer Azure services you already trust for identity, secrets, and networking

Keep exploring

Azure AI Foundry vs Microsoft Agent Framework (practical comparison)

This is not a “winner” table. It’s a choice of where you want the complexity to live.

Area | Azure AI Foundry | Microsoft Agent Framework
Primary job | Platform for deploying/operating AI apps and agents | SDK/patterns for building agent apps in code
Where your agent logic lives | Often split: some logic in code, some in managed services/config | Mostly in your repo (code-first), with your own tests and CI
Model access | Managed model deployments and governance (platform-led) | You bring the model client and credentials (code-led)
Tools/integrations | First-class platform connectors + managed patterns | You implement tools as code, wrap APIs, and choose your own integrations
Evaluation | Built for eval runs, dashboards, and safety checks | Usually you wire your own eval harness (or use platform tools)
Observability | Platform-level tracing/monitoring integrations | You choose tracing/logging (OpenTelemetry, vendor tools, etc.)
Best when | You want a managed path from prototype to operated agent | You want maximum control, portability, and code ownership
Trade-offs | You adopt platform conventions and services | You own more operational wiring unless you add a platform layer

In many real systems:

  • Foundry is the operational layer (models, evaluation, monitoring)
  • Agent Framework and/or Semantic Kernel are the orchestration layer (tools, routing, multi-agent logic)
