Running MCP Servers on Kubernetes
Summary
An MCP server is just an HTTP service — run it on Kubernetes as a Deployment and Service with probes, autoscaling, and Secrets Manager. The MCP-specific decisions explained.
Short answer: An MCP server is just an HTTP service, so on Kubernetes it is a normal Deployment + Service with health probes, autoscaling, and secrets — nothing exotic. The interesting parts are choosing the HTTP transport (not stdio) for cluster deployment, scaling stateless servers behind a Service, and keeping tool credentials in Secrets Manager.
Part 16 of the series. Previous: Kubernetes Concepts Every Staff Engineer Should Understand. For MCP fundamentals, see Model Context Protocol Tools, Resources, and Prompts Explained.
Introduction
If you have built an MCP server (see Building Your First MCP Server in C# and .NET), running it on Kubernetes is mostly applying the patterns from this series. This article highlights the few MCP-specific decisions.
The problem
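A locally run MCP server over stdio is fine for Cursor on your laptop: the client launches the server as a child process and talks to it over stdin/stdout. An enterprise, however, wants MCP capabilities available to many AI clients, with availability, scaling, and audit. Stdio does not fit a cluster because it offers no network endpoint to route or load balance; you need a networked, horizontally scalable deployment.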
Simple explanation
Treat the MCP server like any other API. Containerize it, run several replicas behind a Service, give it health probes so Kubernetes can restart and route around failures, and feed it credentials through Kubernetes secrets. The fact that its clients are AI agents does not change the operational model.
Official Kubernetes concept
The same objects you already know:
- Deployment for the stateless MCP server, with multiple replicas.
- Service (ClusterIP for in-cluster clients; a LoadBalancer Service or an Ingress for external clients).
- Liveness/readiness probes so only healthy servers receive calls.
- HPA to scale replicas with demand.
- Secrets (via Secrets Manager + IRSA) for downstream API keys.
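A minimal ClusterIP Service for the list above could look like the sketch below. The name `mcp-portfolio`, the `app: mcp-portfolio` label, and the port numbers are assumptions: the Deployment fragment in the C# example omits labels, so its Pod template would need to carry this label for the selector to match.

```yaml
# Sketch: internal-only Service in front of the MCP server replicas.
# Assumes Pods are labeled app: mcp-portfolio and listen on port 8080.
apiVersion: v1
kind: Service
metadata:
  name: mcp-portfolio
spec:
  type: ClusterIP          # reachable only from inside the cluster
  selector:
    app: mcp-portfolio     # must match the Pod template labels
  ports:
    - name: http
      port: 80             # port that in-cluster clients call
      targetPort: 8080     # container port the ASP.NET Core host listens on
```

The Service spreads requests across every ready Pod that matches the selector, which is why any replica must be able to serve any request.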
How it works
Run the MCP server with the HTTP transport so it is reachable over the network (stdio is for local single-process use; see Deploying MCP Servers: Stdio, HTTP, and Approval Gates). Keep the server stateless so any replica can handle any request; the Service load balances across them. The Pods reach finance backends (Portfolio, Trade, Risk) by their in-cluster Service names. Sensitive tool operations still go through approval gates and least-privilege credentials. Kubernetes does not replace those controls; it hosts them.
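To illustrate reaching backends by Service name, the container spec can pass each backend's cluster DNS name in as configuration. The namespace `finance`, the Service names `portfolio`, `trade`, and `risk`, and the variable names are hypothetical; Kubernetes DNS resolves names of the form `<service>.<namespace>.svc.cluster.local`.

```yaml
# Hypothetical container env fragment (under spec.template.spec.containers[*]).
# Service names, namespace, and variable names are illustrative assumptions.
env:
  - name: Backends__PortfolioUrl
    value: http://portfolio.finance.svc.cluster.local
  - name: Backends__TradeUrl
    value: http://trade.finance.svc.cluster.local
  - name: Backends__RiskUrl
    value: http://risk.finance.svc.cluster.local
```

ASP.NET Core configuration maps the double underscore in an environment variable name to the `:` section separator, so the server would read these as `Backends:PortfolioUrl` and so on. If the MCP server runs in the same namespace as the backends, the short name (for example `portfolio`) also resolves.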
Finance example
A "portfolio insights" MCP server exposes a read-only `get_portfolio` resource and a `search_equities` tool. You run 3 replicas on EKS behind an internal Service. During a busy period the HPA scales to 8 replicas; overnight it drops back to 2. The broker and market-data keys come from Secrets Manager via IRSA and are never baked into the image. If a replica hangs, its failing readiness probe removes it from the Service's endpoints and its liveness probe triggers a restart; if a Pod dies outright, the Deployment creates a replacement. Calls in flight on the failed replica may need a client retry, but new calls go only to healthy replicas.
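The scaling in this example could be expressed with an `autoscaling/v2` HorizontalPodAutoscaler like the sketch below. The 2 and 8 replica bounds come from the example; the Deployment name `mcp-portfolio` and the 70% CPU target are assumptions. CPU-based scaling also requires CPU requests on the container and a metrics server in the cluster.

```yaml
# Sketch: demand-driven scaling between an overnight floor and a peak ceiling.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: mcp-portfolio
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: mcp-portfolio          # assumed name of the MCP server Deployment
  minReplicas: 2                 # floor the fleet shrinks to when demand is low
  maxReplicas: 8                 # ceiling during the busiest periods
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70 # assumed target; tune from load tests
```

The HPA reacts to load, not the clock: the overnight drop happens because demand falls. If the server spends most of its time waiting on backend calls, CPU may understate load, and a custom metric such as in-flight requests could track demand better.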
C# example
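The MCP server is a containerized ASP.NET Core host with a health endpoint. This assumes the ModelContextProtocol.AspNetCore package and tool classes defined in the same assembly:
// Requires the ModelContextProtocol.AspNetCore NuGet package.
var builder = WebApplication.CreateBuilder(args);
builder.Services.AddMcpServer()
    .WithHttpTransport()        // network-reachable HTTP transport instead of stdio
    .WithToolsFromAssembly();   // registers [McpServerTool] methods in this assembly
var app = builder.Build();
app.MapGet("/healthz", () => Results.Ok()); // Kubernetes liveness/readiness
app.MapMcp();                               // MCP HTTP endpoint
app.Run();                                  // .NET 8+ container images listen on 8080 by default
A matching Deployment fragment (selector and labels omitted for brevity):
spec:
  replicas: 3
  template:
    spec:
      containers:
        - name: mcp-portfolio
          image: <account>.dkr.ecr.us-east-1.amazonaws.com/mcp-portfolio:1.0.0
          ports:
            - containerPort: 8080
          readinessProbe:
            httpGet: { path: /healthz, port: 8080 }
          livenessProbe:
            httpGet: { path: /healthz, port: 8080 }
AWS example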
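On EKS: image in ECR, downstream keys in Secrets Manager via IRSA (for example through the Secrets Store CSI Driver's AWS provider or the External Secrets Operator), external agent access through an ALB/Ingress with TLS, and logs in CloudWatch. If different teams consume the server, NetworkPolicies limit which namespaces can reach it, and RBAC limits who can change its Deployment or read its Secrets. This is the same platform you would build for any production .NET API.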
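One way to wire Secrets Manager through IRSA is the Secrets Store CSI Driver with its AWS provider. The sketch below assumes both are installed, with secret syncing enabled; the role ARN, the Secrets Manager secret names, and the Kubernetes Secret name are hypothetical placeholders.

```yaml
# ServiceAccount bound to an IAM role via IRSA. The role should allow only
# secretsmanager:GetSecretValue on the secrets this server needs.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: mcp-portfolio
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::<account>:role/mcp-portfolio-secrets  # hypothetical role
---
# Tells the CSI driver which Secrets Manager entries to fetch, and mirrors
# them into a Kubernetes Secret so the app can read them as env vars.
apiVersion: secrets-store.csi.x-k8s.io/v1
kind: SecretProviderClass
metadata:
  name: mcp-portfolio-secrets
spec:
  provider: aws
  parameters:
    objects: |
      - objectName: "mcp-portfolio-broker-key"        # hypothetical secret name
        objectType: "secretsmanager"
      - objectName: "mcp-portfolio-market-data-key"   # hypothetical secret name
        objectType: "secretsmanager"
  secretObjects:
    - secretName: mcp-portfolio-keys
      type: Opaque
      data:
        - objectName: "mcp-portfolio-broker-key"
          key: BROKER_API_KEY
        - objectName: "mcp-portfolio-market-data-key"
          key: MARKET_DATA_API_KEY
```

The driver fetches the secrets, and creates the synced Kubernetes Secret, only while a Pod mounts a `csi` volume with driver `secrets-store.csi.k8s.io` and `volumeAttributes.secretProviderClass: mcp-portfolio-secrets`. The Deployment therefore also needs that volume and `serviceAccountName: mcp-portfolio`.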
Production reality
Running MCP servers for real, regulated workloads surfaces concerns demos never do:
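- Stateless is a discipline, not a default. If a server caches session or auth state in memory, scaling to multiple replicas breaks subtly. Push shared state to Redis or a database so any replica can serve any request. The MCP Streamable HTTP transport can also tie a client session (identified by the `Mcp-Session-Id` header) to the replica that created it, so either run the server without per-session state or configure session affinity.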
- The MCP server is a new, privileged attack surface. It can call your Trade and Portfolio backends. Lock it down: NetworkPolicies so only authorized clients reach it, least-privilege IRSA for downstream calls, and audit logging on every tool invocation.
- Approval gates live in the app, not the cluster. Kubernetes will happily run a tool that wires money. State-changing tools need human-in-the-loop approval and idempotency — see Deploying MCP Servers.
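- Cost: an always-on MCP fleet is paid-for idle capacity overnight. Set a low HPA `minReplicas` floor so the fleet shrinks off-hours and grows during the trading day, and let the node autoscaler (Cluster Autoscaler or Karpenter) remove the freed nodes; fewer Pods alone do not lower the EC2 bill.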
- Observability: treat tool calls like API calls — structured logs, latency metrics, and rate limits — or you will not be able to explain what an agent did during an incident.
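The NetworkPolicy lock-down described above could start from the sketch below, which admits traffic to the MCP server Pods only from namespaces labeled as agent namespaces. The `app: mcp-portfolio` and `team: ai-agents` labels are hypothetical. On EKS the policy is enforced only if a network policy engine (such as the VPC CNI's network policy support or Calico) is enabled.

```yaml
# Sketch: only Pods in authorized client namespaces may call the MCP server.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: mcp-portfolio-ingress
spec:
  podSelector:
    matchLabels:
      app: mcp-portfolio          # applies to the MCP server Pods
  policyTypes:
    - Ingress                     # all other inbound traffic is denied
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              team: ai-agents     # hypothetical label on authorized namespaces
      ports:
        - protocol: TCP
          port: 8080              # the container port, not the Service port
```

If an ALB sends traffic directly to Pod IPs, that traffic also needs its own allowed `from` entry, or external agents will be blocked along with everyone else.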
Interview questions
- Why not use stdio transport on Kubernetes? Stdio is for local single-process use; a cluster deployment needs the HTTP transport to be reachable and load balanced.
- What Kubernetes objects does an MCP server need? Typically a Deployment, a Service, probes, an HPA, and secrets — the same as any stateless API.
- How do you scale an MCP server? Keep it stateless and let an HPA add replicas behind a Service.
- Where do tool credentials live? In Kubernetes secrets sourced from Secrets Manager via IRSA, not in the image.
- Does Kubernetes replace MCP approval gates? No. It hosts the server; authorization and approval gates remain in the application layer.
Key takeaways
- An MCP server on Kubernetes is a normal Deployment + Service with probes and HPA.
- Use the HTTP transport, keep servers stateless, and scale horizontally.
- Source credentials from Secrets Manager via IRSA; keep approval gates in-app.
- Everything you learned in this series applies directly to AI infrastructure.
Next article
Next: How AI Agents Run on Kubernetes — the final article in the series. Previous: Kubernetes Concepts Every Staff Engineer Should Understand.
Frequently asked questions
- Why not use the stdio transport for MCP on Kubernetes?
- Stdio is for local single-process use. A cluster deployment needs the HTTP transport so the server is reachable over the network and load balanced across replicas.
- What Kubernetes objects does an MCP server need?
- Typically a Deployment, a Service, liveness/readiness probes, an HPA, and secrets — the same as any stateless API.
- Does Kubernetes replace MCP approval gates?
- No. Kubernetes hosts and scales the server, but authorization and approval gates remain in the application layer.
Related reading
How AI Agents Run on Kubernetes
AI agents run on Kubernetes as ordinary workloads — Deployments, Services, HPAs — plus queues for long jobs and GPU nodes for self-hosted models. With finance examples.
Deploying MCP Servers: Stdio, HTTP, and Approval Gates
Platform engineering guide to MCP transport choice — stdio for Cursor, HTTP for production containers — plus human-in-the-loop approval before state-changing tools.
Kubernetes Concepts Every Staff Engineer Should Understand
Beyond YAML: the reconciliation model, resource requests, failure design, security, scaling, and cost trade-offs that staff engineers are expected to reason about.