Running MCP Servers on Kubernetes

Kubernetes, MCP Server, AI Platform Engineering, .NET AI, AWS
Diagram of an AI client calling MCP server pods on Kubernetes that reach finance backends

Summary

An MCP server is just an HTTP service: run it on Kubernetes as a Deployment and Service with probes, autoscaling, and credentials from Secrets Manager. This article explains the few MCP-specific decisions.

Short answer: An MCP server is just an HTTP service, so on Kubernetes it is a normal Deployment + Service with health probes, autoscaling, and secrets — nothing exotic. The interesting parts are choosing the HTTP transport (not stdio) for cluster deployment, scaling stateless servers behind a Service, and keeping tool credentials in Secrets Manager.

Part 16 of the series. Previous: Kubernetes Concepts Every Staff Engineer Should Understand. For MCP fundamentals, see Model Context Protocol Tools, Resources, and Prompts Explained.

Introduction

If you have built an MCP server (see Building Your First MCP Server in C# and .NET), running it on Kubernetes is mostly applying the patterns from this series. This article highlights the few MCP-specific decisions.

The problem

A locally-run MCP server over stdio is fine for Cursor on your laptop. But an enterprise wants MCP capabilities available to many AI clients, with availability, scaling, and audit. Stdio does not fit a cluster; you need a networked, horizontally-scalable deployment.

Simple explanation

Treat the MCP server like any other API. Containerize it, run several replicas behind a Service, give it health probes so Kubernetes can restart and route around failures, and feed it credentials through Kubernetes secrets. The fact that its clients are AI agents does not change the operational model.

Official Kubernetes concept

The same objects you already know:

  • Deployment for the stateless MCP server, with multiple replicas.
  • Service (ClusterIP internally, or Ingress/LoadBalancer for external clients).
  • Liveness/readiness probes so only healthy servers receive calls.
  • HorizontalPodAutoscaler (HPA) to scale replicas with demand.
  • Secrets for downstream API keys, sourced from AWS Secrets Manager; IRSA (IAM Roles for Service Accounts) grants the Pods access.

How it works

Run the MCP server with the HTTP transport so it is reachable over the network (stdio is for local single-process use — see Deploying MCP Servers: Stdio, HTTP, and Approval Gates). Keep the server stateless so any replica can handle any request; the Service load balances across them. The Pods reach finance backends (Portfolio, Trade, Risk) by their in-cluster Service names. Sensitive tool operations still go through approval gates and least-privilege credentials — Kubernetes does not replace those controls, it hosts them.
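A minimal sketch of the internal Service that fronts the replicas. The name `mcp-portfolio` and the `app: mcp-portfolio` pod label are illustrative assumptions, and the target port assumes the container listens on 8080:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: mcp-portfolio        # hypothetical name; also the in-cluster DNS name
spec:
  type: ClusterIP            # internal only; add an Ingress for external agents
  selector:
    app: mcp-portfolio       # assumed label on the MCP server Pods
  ports:
    - name: http
      port: 80               # port that clients call
      targetPort: 8080       # container port the MCP host listens on
```

In-cluster clients would then reach the server at `http://mcp-portfolio.<namespace>.svc.cluster.local`, and the Service spreads new connections across whichever replicas are currently ready.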

Finance example

A "portfolio insights" MCP server exposes a read-only `get_portfolio` resource and a `search_equities` tool. You start with 3 replicas on EKS behind an internal Service. During a busy period the HPA scales to 8 replicas; overnight it drops back to its floor of 2. The broker and market-data keys come from Secrets Manager via IRSA and are never baked into the image. If a replica crashes, it stops being ready, so the Service stops routing to it while Kubernetes restarts or replaces the Pod. Calls in flight on that replica fail and must be retried by the client, but new calls go to healthy replicas.
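The scaling behavior in this example could be expressed with an HPA along these lines. This is a sketch, not a tuned configuration: the Deployment name `mcp-portfolio` and the 70% CPU target are assumptions, while the floor of 2 and ceiling of 8 come from the scenario above.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: mcp-portfolio              # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: mcp-portfolio            # assumed Deployment name
  minReplicas: 2                   # overnight floor
  maxReplicas: 8                   # busy-period ceiling
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # assumed threshold; tune from real load
```

CPU utilization targets only work if the container declares CPU requests and the cluster runs a metrics source such as metrics-server. Once an HPA owns the replica count, it is usually cleaner to drop `replicas` from the Deployment manifest so that re-applying it does not reset the count.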

C# example

The MCP server is a containerized ASP.NET Core host with a health endpoint:

var builder = WebApplication.CreateBuilder(args);
builder.Services.AddMcpServer().WithHttpTransport();  // HTTP transport from ModelContextProtocol.AspNetCore
var app = builder.Build();

app.MapGet("/healthz", () => Results.Ok());  // Kubernetes liveness/readiness
app.MapMcp();                                // MCP HTTP endpoint
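app.Run();

The matching Deployment manifest (the `app: mcp-portfolio` label is illustrative):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: mcp-portfolio
spec:
  replicas: 3
  selector:
    matchLabels: { app: mcp-portfolio }
  template:
    metadata:
      labels: { app: mcp-portfolio }
    spec:
      containers:
        - name: mcp-portfolio
          image: <account>.dkr.ecr.us-east-1.amazonaws.com/mcp-portfolio:1.0.0
          ports:
            - containerPort: 8080
          readinessProbe:   # gates traffic from the Service
            httpGet: { path: /healthz, port: 8080 }
          livenessProbe:    # restarts a hung container
            httpGet: { path: /healthz, port: 8080 }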

AWS example

On EKS: image in ECR, downstream keys in Secrets Manager via IRSA, external agent access through an ALB/Ingress with TLS, logs in CloudWatch. If different teams consume the server, NetworkPolicies and RBAC keep access scoped. This is the same platform you would build for any production .NET API.
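As a sketch of the IRSA piece, the Pods run under a ServiceAccount annotated with an IAM role. The `eks.amazonaws.com/role-arn` annotation is the standard IRSA mechanism; the ServiceAccount name and the role name below are hypothetical.

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: mcp-portfolio          # hypothetical name
  annotations:
    # IRSA: Pods using this ServiceAccount receive credentials for this IAM role.
    # The role name is a placeholder, not a real role.
    eks.amazonaws.com/role-arn: arn:aws:iam::<account>:role/mcp-portfolio-secrets-read
```

The Deployment's pod spec would reference it with `serviceAccountName: mcp-portfolio`, and the IAM role would allow `secretsmanager:GetSecretValue` on only the secrets this server needs. How the value reaches the process is a separate choice: the application can call Secrets Manager through the AWS SDK, or a cluster add-on can sync it into a Kubernetes Secret or a mounted volume.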

Production reality

Running MCP servers for real, regulated workloads surfaces concerns demos never do:

  • Stateless is a discipline, not a default. If a server caches session or auth state in memory, scaling to multiple replicas breaks subtly. Push shared state to Redis or a database so any replica can serve any request.
  • The MCP server is a new, privileged attack surface. It can call your Trade and Portfolio backends. Lock it down: NetworkPolicies so only authorized clients reach it, least-privilege IRSA for downstream calls, and audit logging on every tool invocation.
  • Approval gates live in the app, not the cluster. Kubernetes will happily run a tool that wires money. State-changing tools need human-in-the-loop approval and idempotency — see Deploying MCP Servers.
  • Cost: an always-on MCP fleet means paying for idle capacity overnight. Scale to a low floor off-hours and let the HPA grow it during the trading day.
  • Observability: treat tool calls like API calls — structured logs, latency metrics, and rate limits — or you will not be able to explain what an agent did during an incident.
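The "only authorized clients reach it" point from the list above could look like the following NetworkPolicy. This is a sketch under assumptions: the `app: mcp-portfolio` label on the server Pods and the `mcp-client: "true"` label on authorized agent Pods are both hypothetical.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: mcp-portfolio-ingress      # hypothetical name
spec:
  podSelector:
    matchLabels:
      app: mcp-portfolio           # assumed label on the MCP server Pods
  policyTypes: [Ingress]
  ingress:
    - from:
        - podSelector:
            matchLabels:
              mcp-client: "true"   # assumed label on authorized agent Pods
      ports:
        - protocol: TCP
          port: 8080               # container port of the MCP host
```

A bare `podSelector` under `from` only matches Pods in the same namespace, so clients in other namespaces need a `namespaceSelector` as well. The policy has an effect only when the cluster's network plugin enforces NetworkPolicy, and traffic arriving through an ALB or Ingress controller needs its own allow rule.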

Interview questions

  • Why not use stdio transport on Kubernetes? Stdio is for local single-process use; a cluster deployment needs the HTTP transport to be reachable and load balanced.
  • What Kubernetes objects does an MCP server need? Typically a Deployment, a Service, probes, an HPA, and secrets — the same as any stateless API.
  • How do you scale an MCP server? Keep it stateless and let an HPA add replicas behind a Service.
  • Where do tool credentials live? In Kubernetes secrets sourced from Secrets Manager via IRSA, not in the image.
  • Does Kubernetes replace MCP approval gates? No. It hosts the server; authorization and approval gates remain in the application layer.

Key takeaways

  • An MCP server on Kubernetes is a normal Deployment + Service with probes and HPA.
  • Use the HTTP transport, keep servers stateless, and scale horizontally.
  • Source credentials from Secrets Manager via IRSA; keep approval gates in-app.
  • Everything you learned in this series applies directly to AI infrastructure.

Next article

Next: How AI Agents Run on Kubernetes — the final article in the series. Previous: Kubernetes Concepts Every Staff Engineer Should Understand.

Frequently asked questions

Why not use the stdio transport for MCP on Kubernetes?
Stdio is for local single-process use. A cluster deployment needs the HTTP transport so the server is reachable over the network and load balanced across replicas.

What Kubernetes objects does an MCP server need?
Typically a Deployment, a Service, liveness/readiness probes, an HPA, and secrets, the same as any stateless API.

Does Kubernetes replace MCP approval gates?
No. Kubernetes hosts and scales the server, but authorization and approval gates remain in the application layer.
