The Control Plane Your Agent Is Missing
Most builders hit the same wall at the same point. The agent works in testing. You deploy it. Two weeks later you're looking at an invoice you didn't expect, a model call that shouldn't have happened, and no way to reconstruct what the agent actually did. There's no log, no cost cap, nothing that stopped the runaway calls. The agent had no control plane.
An agent gateway is the infrastructure layer between your agent and everything it talks to: LLM providers, APIs, tools. Every call in and out passes through it. That single choke point is where you enforce policy, cap spend, log decisions, and handle failures cleanly. Without it, your agent is operating on trust: trust that the model won't loop, that costs won't spike, that nothing will go sideways in a way you can't see until it already has.
This is Part 1 of a three-part series. Today: what agent gateways are, what they actually do, and what's worth using right now.
Why This Exists Now
The problem is about 18 months old, which means the solutions are still consolidating. Two things made it concrete.
The first is MCP. The Model Context Protocol gave agents a structured, standard way to discover and invoke external tools via JSON-RPC calls. Before MCP, agent-to-tool connections were custom: each integration was its own code, its own auth, its own failure mode. Once MCP created a standardized call surface, it also created a standardized interception point. A gateway could now sit between the agent and all its tools without special-casing every integration.
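To make the interception point concrete, here is a minimal sketch of what a gateway-side check on an MCP-style `tools/call` request could look like. The `ALLOWED_TOOLS` set and the `intercept` function are hypothetical illustrations, not part of any real gateway's API; the JSON-RPC request shape follows MCP's `tools/call` convention.

```python
import json

# Hypothetical allowlist; a real gateway would load this from policy config.
ALLOWED_TOOLS = {"search_docs", "read_file"}

def intercept(raw_request: str) -> dict:
    """Inspect an MCP-style JSON-RPC tools/call before forwarding it."""
    req = json.loads(raw_request)
    if req.get("method") != "tools/call":
        return req  # pass non-tool traffic through unchanged
    tool = req.get("params", {}).get("name")
    if tool not in ALLOWED_TOOLS:
        raise PermissionError(f"tool {tool!r} blocked by gateway policy")
    return req

request = json.dumps({
    "jsonrpc": "2.0", "id": 1, "method": "tools/call",
    "params": {"name": "search_docs", "arguments": {"query": "gateways"}},
})
intercept(request)  # allowed; a request naming an unlisted tool would raise
```

Because every tool call arrives in the same shape, one function like this governs every integration at once; that is the standardized interception point MCP created.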
The second is agent-to-agent (A2A) communication. As teams started building multi-agent systems where one agent delegates to another, the surface area for ungoverned calls exploded. Every agent-to-agent handoff is another call that needs auth, logging, and policy enforcement. A2A protocols gave structure to those handoffs. Gateways are the enforcement layer.
Neither protocol has fully settled. But the call-intercept point is well-defined enough now that production-grade gateways are viable and, for anything running in production, necessary.
What a Gateway Actually Does
The five core functions, in the order you'll encounter them as a builder.
Auth and routing come first. The gateway is the authentication boundary: agents present credentials to the gateway, the gateway presents its own credentials to downstream providers. Your model API keys live in the gateway, not in application code or agent memory. Routing adds model fallback: if your primary provider is down or rate-limited, the gateway reroutes to a backup without application-level changes.
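A sketch of that credential translation, assuming invented names throughout (`VIRTUAL_KEYS`, `ROUTES`, `route`, and all the key values are hypothetical): the agent presents a virtual key, the gateway resolves it, and the first healthy provider in the route wins.

```python
# Hypothetical in-memory stores; a real gateway persists and rotates these.
VIRTUAL_KEYS = {"agent-key-123": "team-research"}   # agent credential -> team
PROVIDER_KEYS = {"openai": "sk-real-openai-key",    # real keys live here,
                 "anthropic": "sk-real-anthropic-key"}  # never in agent code
ROUTES = {"gpt-4o": [("openai", "gpt-4o"),          # primary
                     ("anthropic", "claude-sonnet-4")]}  # fallback
HEALTHY = {"openai": False, "anthropic": True}      # e.g. from health checks

def route(virtual_key: str, model: str) -> dict:
    """Authenticate the agent, then pick the first healthy provider."""
    team = VIRTUAL_KEYS.get(virtual_key)
    if team is None:
        raise PermissionError("unknown agent credential")
    for provider, provider_model in ROUTES[model]:
        if HEALTHY[provider]:
            return {"provider": provider, "model": provider_model,
                    "api_key": PROVIDER_KEYS[provider], "team": team}
    raise RuntimeError(f"no healthy provider for {model}")

print(route("agent-key-123", "gpt-4o")["provider"])  # falls back to anthropic
```

The point of the split: revoking an agent means deleting one virtual key, and rotating a provider key touches one table instead of every deployment.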
Budget controls are second, and they're the controls most builders configure last when they should configure them first. Per-agent, per-team, per-day spend caps enforced at the gateway level. Hard limits stop runaway loops before they become expensive. Soft alerts at 80% give you time to respond before you hit the wall.
Policy enforcement is the content layer: filtering, PII masking, prompt injection detection. The gateway inspects requests and responses before they reach the model, and before the model's output reaches downstream systems. You define what's allowed; the gateway blocks what isn't.
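As a toy illustration of the masking step only, here is a regex-based PII scrubber. The two patterns are deliberately minimal examples; production gateways use far richer detectors, and `mask_pii` is a hypothetical name, not any vendor's API.

```python
import re

# Illustrative patterns only; real detectors cover many more PII types.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def mask_pii(text: str) -> str:
    """Replace detected PII with typed placeholders before the model sees it."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label.upper()}]", text)
    return text

print(mask_pii("Contact jane@example.com, SSN 123-45-6789"))
# → Contact [EMAIL], SSN [SSN]
```

Typed placeholders (`[EMAIL]` rather than `***`) matter: the model keeps enough context to reason about the request without ever seeing the raw value.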
Logging and audit trails are what make agent behavior reconstructible after the fact. Every request, every response, every tool call, every token count, captured and queryable. Without this, a production incident is a blank wall.
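What "captured and queryable" means in practice is one structured record per round trip. A sketch using JSON Lines, with the field names and the `audit_record` helper invented for illustration:

```python
import json
import time
import uuid

def audit_record(agent_id, model, prompt_tokens, completion_tokens,
                 cost_usd, tool_calls):
    """One queryable JSON line per gateway round trip."""
    return json.dumps({
        "id": str(uuid.uuid4()),          # unique per request
        "ts": time.time(),                # when it happened
        "agent": agent_id,                # who made the call
        "model": model,                   # what it called
        "tokens": {"prompt": prompt_tokens, "completion": completion_tokens},
        "cost_usd": cost_usd,             # what it cost
        "tool_calls": tool_calls,         # what it touched
    })

line = audit_record("agent-7", "gpt-4o", 812, 164, 0.0113, ["search_docs"])
# append `line` to an audit log file or ship it to your log store
```

During an incident, filtering these records by `agent` and time window reconstructs exactly what the agent did and what it cost, call by call.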
Failover and load balancing close the list. Automatic rerouting when a provider returns errors or degrades. For agents running long tasks where a mid-task provider outage would otherwise require restarting from scratch, gateway-level failover is the difference between a recoverable error and a failed run.
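The failover logic itself is a simple ordered walk over providers. A sketch, with the function names and the `RuntimeError` stand-in for rate-limit/5xx responses both hypothetical:

```python
def call_with_failover(providers, request):
    """Try each provider in order; surface an error only if all of them fail."""
    last_error = None
    for call in providers:
        try:
            return call(request)
        except RuntimeError as e:  # stand-in for rate-limit or 5xx errors
            last_error = e         # remember why, then try the next provider
    raise RuntimeError("all providers failed") from last_error

def flaky_primary(req):
    raise RuntimeError("503 from primary")

def healthy_backup(req):
    return f"ok: {req}"

call_with_failover([flaky_primary, healthy_backup], "summarize task")
```

The agent sees a single successful response; the outage is visible only in the gateway's logs, which is exactly where you want it.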
Landscape Snapshot
The space is still consolidating, and the right pick depends heavily on where you're starting from.
LiteLLM is the open-source default for teams that want to self-host. It supports 100+ providers through a single OpenAI-compatible interface, with virtual keys, per-team budget controls, and integrations with Langfuse and other observability tools. The limitation: it starts showing latency issues past 300 RPS in practice, and the enterprise tier locks some governance features behind paid licenses. If you're under 300 RPS and want to own your infrastructure, it's the starting point.
Cloudflare AI Gateway is zero-config if you're already in the Cloudflare ecosystem. Route API calls through a URL prefix and you get logging, rate limiting, caching, and basic observability with no infrastructure to manage. Tradeoffs: exact-match caching only, 10-50ms latency overhead that compounds in agentic chains, governance features that don't scale well for multi-team environments. Right fit for solo builders and small teams.
Kong AI Gateway is the enterprise choice if you're already a Kong shop. Deepest feature set: semantic caching, token-based rate limiting, PII sanitization, automated RAG injection, strong RBAC and audit tooling. Also the heaviest and most expensive: enterprise contracts start above $50K annually. Worth evaluating if you're running Kong for API management already.
Bifrost by Maxim is the one worth watching. Go-based, fully open-source, built from the ground up for AI workloads. At 5,000 RPS it adds 11 microseconds of overhead per request. Ships with semantic caching, virtual keys with hierarchical budget controls, and automatic failover in the open-source tier. The ecosystem is young but the architecture is right.
OpenRouter is the lightest entry point: a managed API routing to 500+ models across 60+ providers through a single endpoint. Automatic failover, unified billing, immediate multi-model access with no infrastructure. Cost is a 5.5% platform fee on top of model costs, no self-hosting, limited governance for multi-team environments. Useful for prototyping; wrong tool for anything needing policy enforcement or audit trails.
Builder Tip
Before you do anything else with your agent: add a spend cap.
Find the budget controls in whichever gateway you're using and set a hard limit per agent per day. Something conservative: $5 for a dev agent, $20 for a production agent handling real traffic. Set an alert at 80% of that limit. Put it in before the agent talks to anything.
Runaway agent loops are the most common early production failure. An agent that makes 100 calls when it should have made 10, because it hit an error condition and retried without backoff, will exhaust a daily budget in minutes. The gateway catches it. Without the gateway, you find out on the invoice.
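On the application side, the cheap complement to the gateway cap is bounding retries. A minimal sketch of retry-with-backoff under a hard attempt cap (`MAX_ATTEMPTS` and `retry_with_backoff` are illustrative names, not a specific library's API):

```python
import time

MAX_ATTEMPTS = 5  # hard cap: the loop can never make more than 5 calls

def retry_with_backoff(call, base_delay=0.5):
    """Bounded retries with exponential backoff instead of a hot retry loop."""
    for attempt in range(MAX_ATTEMPTS):
        try:
            return call()
        except RuntimeError:  # stand-in for a retryable provider error
            if attempt == MAX_ATTEMPTS - 1:
                raise  # attempt budget exhausted; surface the error
            time.sleep(base_delay * 2 ** attempt)  # 0.5s, 1s, 2s, 4s
```

With both layers in place, a persistent error costs at most five calls and a few seconds of waiting, and the gateway's spend cap backstops anything this misses.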
Quick Hits
The "agent gateway" category name isn't fully settled yet. You'll also see these called LLM gateways, AI proxies, and model routers depending on which vendor is writing the documentation. They're mostly the same category.
Portkey is worth adding to the watchlist: strong compliance features (SOC2, GDPR, HIPAA) and PII detection make it a cleaner choice for regulated industries than most of the tools above.
MCP governance is the next frontier for most of these tools. Kong and LiteLLM have early MCP support; most others are still catching up. Part 2 of this series covers agent-to-tool architecture in depth, including where MCP governance fits.
The 10-50ms Cloudflare latency overhead sounds trivial until you chain 10 agent calls. That's 100-500ms of added latency per agentic task. For interactive applications, that matters.
Gartner has warned that a meaningful share of agentic AI projects will fail by 2027 due to poor governance and unclear ROI. A gateway that captures spend and audit data is the minimum viable governance layer.
This is Part 1 of 3. [Part 2: Your Gateway Is Configured. Is It Working?](/your-gateway-is-configured-is-it-working/) | [Part 3: When Your Gateway Has to Hold](/when-your-gateway-has-to-hold/)
Enjoying this? Subscribe to Gradient Push for practical AI and automation breakdowns — gradientpush.com