AI gateways have quickly become the front door to enterprise AI traffic. They sit between users, agents, and the LLMs, tools, and MCP servers behind them, handling routing, rate limits, caching, and centralized usage reporting.
What they don't handle well is authorization. Identity gets verified at the door, but the decision about what that identity is allowed to do once the request is through usually lives further inside the stack, in application code or agent logic, outside the systems your identity and security teams own.
That gap is what fine-grained authorization closes. This guide walks through why AI gateways need it, what it looks like in practice, and how Cerbos provides the policy layer that runs in front of every AI request your gateway routes.
TL;DR: AI gateways solve the routing problem. They don't solve the authorization problem. Here's what fine-grained authorization changes for every AI request your gateway sees.
Why AI gateways need fine-grained authorization
AI gateways unify access to LLM providers behind a single endpoint. They authenticate every request, verify tokens and claims, and pass the caller's identity through to downstream routing. On top of that, they add semantic caching, token-based rate limits, prompt guards, centralized usage and cost reporting, and - increasingly - model allowlists, tool filtering, MCP proxying with per-method ACLs, and inline if/else rules.
The identity is there. The authorization engine is not. The gateway knows who the caller is, but its built-in authorization primitives cannot reason across the caller's attributes, the resource's attributes, and the context of the request. They can say "this user can call this model", not "this user can call this model only on resources their team owns, only under this amount, and only in this environment".
- Authorization primitives are coarse. Allowlists, regex routes, and inline rules can enforce role- or claim-based decisions but cannot express the attribute-based, relationship-aware policies that AI workloads actually need.
- Impersonation instead of delegation. Agents often act as the user rather than on behalf of the user. The principal chain is flattened: audit logs record the action as the user's, the agent disappears from access reviews, and the delegation that justified the action is not captured.
- Delegation without attenuation. When an agent delegates to another agent, tokens are passed downstream unchanged. A sub-agent inherits the full authority of its parent and can do anything the parent could, even when the task justified only a narrower grant.
- Resource and workflow context are missing. The gateway sees the identity and the request payload. It does not know which tenant owns the resource, which data classification applies, which workflow state is current, or what the caller's per-action limits are. Those attributes have to be fetched and combined with the identity to produce a useful decision.
- Authorization drifts into agent code. When the gateway cannot express the rule, fine-grained logic lands inside agents, MCP servers, and downstream services. Decisions are not centralized, not independently testable, and not consistent across systems.
What's missing is a policy layer the gateway can call on every AI request. That's what Cerbos provides.
How Cerbos adds fine-grained authorization to your AI gateway
Cerbos extends the AI gateway with fine-grained, contextual authorization. The gateway routes traffic. Cerbos decides which traffic is allowed.
- Consistent access control everywhere. The same policies govern every model, tool, MCP method, and agent-to-agent call the gateway routes, and extend to the APIs and data systems behind it.
- Powerful policy lifecycle management. Define, test, and validate authorization policies before deployment. Manage them through Git and CI/CD. Roll out updates to all enforcement points without redeploying agents or gateway configuration.
- Zero Trust at runtime. Every AI request is authorized at request time against fresh context. Prior decisions are not reused as implicit grants. Revoked access takes effect on the next call.
- Compliance-ready visibility. Every allow or deny decision is recorded with the full request context and the policy version that produced it, so AI authorization behavior is fully traceable across all identities.
How AI gateway authorization works with Cerbos
Identity systems establish who someone is. Cerbos decides what they are allowed to do, at the moment the gateway enforces it.
- Every action is checked. When a user invokes an agent, an agent calls a tool, or a service requests a model completion, the gateway sends the request to Cerbos before routing it upstream. The gateway acts as the Policy Enforcement Point through its native authorization hook, a pre-request plugin, or an external authorization filter.
- The request is evaluated. Cerbos reviews who is making the request, which agent or service they are acting through, what they want to do, which model or tool they want to use, and the surrounding context - resource attributes, data classification, environment, and delegation chain.
- The decision is enforced. Cerbos returns ALLOW or DENY, optionally with a reason the gateway can pass back to the agent. The gateway enforces the decision before the call reaches the provider. For high-capability agents, the gateway is configured to fail closed when the PDP is unreachable, with a defined safe-degradation path.
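The check, evaluate, enforce loop above can be sketched in a few lines of Python. This is a minimal illustration, not the Cerbos SDK or any gateway's plugin API: `CheckRequest`, `Decision`, `enforce`, and `route_upstream` are hypothetical names, and the PDP client is modeled as a plain callable that raises `ConnectionError` when the PDP cannot be reached.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Optional


@dataclass
class CheckRequest:
    principal_id: str          # human or service on whose behalf the call is made
    agent_id: Optional[str]    # agent acting for the principal, if any
    action: str                # e.g. "invoke"
    resource_kind: str         # e.g. "model" or "tool"
    resource_id: str
    context: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Decision:
    allowed: bool
    reason: str = ""


# Hypothetical PDP client: takes a CheckRequest, returns a Decision,
# and raises ConnectionError when the PDP is unreachable.
PdpClient = Callable[[CheckRequest], Decision]


def enforce(request: CheckRequest, pdp: PdpClient) -> Decision:
    """Ask the PDP for a decision; deny if the PDP cannot be reached."""
    try:
        return pdp(request)
    except ConnectionError:
        # Fail closed: no decision means no access.
        return Decision(allowed=False, reason="pdp_unreachable")


def route_upstream(
    request: CheckRequest,
    pdp: PdpClient,
    upstream: Callable[[CheckRequest], Any],
) -> Dict[str, Any]:
    """Enforce the decision before the call reaches the provider."""
    decision = enforce(request, pdp)
    if not decision.allowed:
        return {"status": 403, "reason": decision.reason}
    return {"status": 200, "body": upstream(request)}
```

The important property is ordering: `upstream` is never invoked unless a decision was obtained and it was an allow.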
AI gateway authorization policies in practice
Cerbos policies are attribute-based. Decisions depend on the attributes of the principal, the resource, and the context of the request.
Scenario 1: Role-based model access
The frontier reasoning model is callable only by principals in the research or platform-ai groups, and only from production environments. This scopes the cost and compliance footprint of high-capability models to approved teams.
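To make the rule concrete, here is the same condition written as a plain Python predicate. It is a sketch of the logic only, not Cerbos policy syntax; the function name and the `"production"` environment label are assumptions.

```python
from typing import Set

# Groups named in the scenario above.
APPROVED_GROUPS = frozenset({"research", "platform-ai"})


def can_call_frontier_model(groups: Set[str], environment: str) -> bool:
    """Allow only approved groups, and only from production."""
    in_approved_group = bool(APPROVED_GROUPS & set(groups))
    return in_approved_group and environment == "production"
```

Both conditions must hold: a member of an approved group calling from a non-production environment is denied, as is a production caller outside those groups.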
Scenario 2: Data-residency routing
Requests from EU-resident users, or requests carrying EU-classified data, may only reach LLM endpoints hosted in approved EU regions. Regulated data is demonstrably constrained to approved geographies and providers.
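As a rough sketch of that routing rule in Python (not Cerbos policy syntax): the region identifiers below are placeholders, and the sketch assumes requests that are not EU-bound are left unconstrained by this particular rule.

```python
# Placeholder region identifiers; substitute your own approved list.
APPROVED_EU_REGIONS = frozenset({"eu-region-a", "eu-region-b"})


def may_route(
    user_is_eu_resident: bool,
    data_is_eu_classified: bool,
    endpoint_region: str,
) -> bool:
    """EU-bound requests may only reach endpoints in approved EU regions."""
    eu_bound = user_is_eu_resident or data_is_eu_classified
    if not eu_bound:
        # Assumption: this rule places no constraint on other traffic.
        return True
    return endpoint_region in APPROVED_EU_REGIONS
```

Either trigger is sufficient: an EU-resident user with unclassified data and a non-EU user carrying EU-classified data are both held to the approved regions.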
Scenario 3: Tool-level allowlists per agent
An agent acting for a support user can call lookup_order and create_ticket. The issue_refund tool is enabled only when the user holds a refund_approver role and the amount is under their approval limit. Spending controls and business policies are enforced at the gateway, not reconstructed in each agent.
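The refund rule combines a role check with a per-user limit, which is the kind of condition allowlists alone cannot express. A minimal Python sketch of the logic follows. The `support` role name and the function signature are assumptions, and "under their approval limit" is read as strictly less than the limit.

```python
from typing import Optional, Set

BASE_TOOLS = frozenset({"lookup_order", "create_ticket"})


def tool_allowed(
    tool: str,
    roles: Set[str],
    amount: Optional[float] = None,
    approval_limit: float = 0.0,
) -> bool:
    """Decide whether an agent acting for this user may call the tool."""
    if tool in BASE_TOOLS:
        # Assumed role name for a support user.
        return "support" in roles
    if tool == "issue_refund":
        return (
            "refund_approver" in roles
            and amount is not None
            and amount < approval_limit
        )
    # Anything not explicitly covered is denied.
    return False
```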
Scenario 4: Data-driven tool access
The edit_project_plan tool is enabled only when the caller's teamID matches the project's owningTeamID. Cross-team access through agents is prevented at decision time.
Scenario 5: Dynamic MCP tool discovery
MCP tool discovery returns a reduced catalog for low-privilege users and the full catalog for high-privilege users. Destructive tools such as drop_table or delete_tenant appear only for on-call engineers in a break-glass group. Agents see only the tools the principal is authorized to invoke. "Permission denied" failures at runtime are reduced.
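One way to picture discovery-time filtering is as a function from the caller's attributes to the visible catalog. The sketch below is illustrative only: the sensitivity tiers, the `privilege` values, and the `on_call` flag are assumptions, and in a real deployment each catalog entry would be checked against the PDP rather than a hard-coded table.

```python
from typing import List, Set

# Hypothetical catalog: tool name -> sensitivity tier.
CATALOG = {
    "lookup_order": "low",
    "create_ticket": "low",
    "edit_project_plan": "high",
    "drop_table": "destructive",
    "delete_tenant": "destructive",
}


def visible_tools(privilege: str, groups: Set[str], on_call: bool = False) -> List[str]:
    """Return only the tools this principal is authorized to invoke."""
    allowed_tiers = {"low"}
    if privilege == "high":
        allowed_tiers.add("high")
    # Destructive tools need both on-call status and break-glass membership.
    if on_call and "break-glass" in groups:
        allowed_tiers.add("destructive")
    return sorted(name for name, tier in CATALOG.items() if tier in allowed_tiers)
```

Because the agent never sees tools it cannot call, it does not plan around them and then fail at invocation time.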
Scenario 6: Attenuated agent-to-agent delegation
When a planner agent delegates to an execution agent, Cerbos verifies that the delegated scope is a strict subset of the planner's own authority and the delegating user's permissions, and that sub-delegation was explicitly granted in the original call. Sub-agents cannot re-expand their authority. The chain of grants stays provably inside the permissions the human originally authorized.
What AI gateway authorization can control
With Cerbos in front of the gateway, you control:
- Which LLMs and providers each principal may call. Controls include per-role allowlists, cost-tier gating, environment separation, and data-residency routing.
- Which tool calls and MCP methods are allowed, conditioned on the caller's role, the resource, and the business context of the request.
- How agents delegate to other agents over agent-to-agent (A2A) traffic, with each delegated scope provably attenuated at every hop in the chain.
- How the gateway itself behaves per caller - token budgets, prompt and content restrictions, and mid-session re-authorization for streaming or long-running agent sessions.
Delegated authorization for AI agents and A2A workflows
Agents rarely act in isolation. A user invokes a primary agent, which calls a tool, which delegates to a sub-agent, which calls another tool. That delegation typically travels over agent-to-agent (A2A) protocols - the wire format through which one agent asks another to perform a task on its behalf.
A2A is the transport. It specifies how agents discover each other and exchange tasks, but it does not decide whether the calling agent - and the human behind it - should be allowed to invoke the target agent's capabilities in the first place, or on what scope. That decision belongs in policy, at the gateway, in front of every A2A call.
Cerbos evaluates each A2A request the same way it evaluates model calls and tool invocations, with the delegation chain as first-class context:
- Delegation, not impersonation. Every AI request carries two identities: the agent that is acting, and the human or service on whose behalf it is acting. Policies evaluate both. Audit logs record both. The agent is a distinct principal, not a hidden stand-in for the user.
- User permissions are the upper bound. An agent cannot exceed the authority of the user who delegated to it, regardless of what tools it has access to or what scopes are attached to its token. Routing a request through an agent does not escalate privilege.
- Provable attenuation across A2A hops. When an agent delegates a task to another agent over A2A, the delegated scope must be a subset of the delegator's own authority and of the originating user's permissions. Cerbos verifies the subset relationship at each hop, so a sub-agent cannot re-expand its permissions or pick up capabilities the parent did not have.
- Explicit sub-delegation. A delegate cannot further delegate unless the original grant said it could. Cerbos denies onward A2A calls by default and allows them only when policy, or the incoming delegation, explicitly authorizes continued delegation.
- Bounded chain depth. Policies express the maximum delegation depth for a given action, so a chain of A2A calls cannot quietly grow into a multi-hop workflow the original grant never contemplated.
- Inter-agent routing constraints. Cerbos policies restrict which source agents may invoke which target agents, and which task types can be delegated - based on the user's identity, the agent's role, and the context of the request.
- Revocation propagates. When a delegation is revoked - the user logs out, the mission ends, the session is terminated - Cerbos denies subsequent decisions across every downstream agent on the next A2A call, without waiting for tokens to expire.
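Three of the rules above (subset attenuation, explicit sub-delegation, and bounded chain depth) are easy to state precisely in code. The following Python sketch shows the checks for a single A2A hop. `Grant`, `check_delegation`, and the reason strings are hypothetical names, depth is counted as the number of grants in the chain, and the chain is assumed to contain at least the original user-to-agent grant.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class Grant:
    holder: str                    # agent receiving the delegation
    scopes: FrozenSet[str]         # what the holder may do
    may_subdelegate: bool = False  # onward delegation is denied by default


def check_delegation(
    user_permissions: FrozenSet[str],
    chain: List[Grant],
    requested: Grant,
    max_depth: int,
) -> Tuple[bool, str]:
    """Validate one new hop appended to an existing, non-empty chain."""
    parent = chain[-1]
    if len(chain) + 1 > max_depth:
        return False, "max_depth_exceeded"
    if not parent.may_subdelegate:
        return False, "subdelegation_not_granted"
    # The new scope must sit inside the delegator's own authority...
    if not requested.scopes <= parent.scopes:
        return False, "exceeds_delegator"
    # ...and inside the originating user's permissions.
    if not requested.scopes <= user_permissions:
        return False, "exceeds_user"
    return True, "ok"
```

Running the check at every hop, rather than once at the start of the chain, is what prevents a sub-agent from re-expanding its authority later.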
Delegation is one half of the picture. The other half is making sure those decisions hold all the way down the chain, not just at the first call.
Authorization enforcement at every hop, not just at the AI gateway
AI gateways are the natural first enforcement point for AI traffic. They are not the only one.
As tokens propagate through an agentic chain, decisions taken only at the first hop create confused-deputy and privilege-escalation failures downstream. The same Cerbos policies that govern traffic at the AI gateway also govern traffic at the MCP server, the downstream API gateway, and the application layer. Authorization is evaluated at every hop, down to the last mile, not only at the outer boundary.
Richer AI gateway decisions with context enrichment
Fine-grained policies are only as good as the context behind them.
AI gateways often forward requests with minimal identity context - an API key, a bare JWT claim, or a token mapped to a service account. Fine-grained policies need more than that: the delegating user's profile, the agent's own constraints, the resource's attributes, and the relationships between them.
Cerbos Synapse assembles that context on the PDP's behalf and connects the gateway to Cerbos through the integration layer it already supports.
- Context enrichment before evaluation. Synapse fetches identity, resource, and relationship context from your IdPs, databases, and APIs at request time, so the policy sees the full picture.
- Both the agent and the human in one call. For on-behalf-of AI traffic, Synapse passes both the agent and the delegating user's identity to the PDP, so policies can evaluate what the agent is allowed to do and whether the human behind it is authorized to trigger that action.
- No standing privilege. Every agent request is evaluated against the current policy with fresh context. Prior decisions are not reused as implicit grants.
- Integration layer for every AI gateway. Synapse connects enforcement points through the protocols they already use - native authorization filters, pre-request plugins, ext-auth calls, and SDK hooks - so there is no bespoke glue code per gateway vendor.
- Query planning for AI workflows. When an agent needs to discover what it can do on behalf of a user, Synapse returns the full set of allowed resources and tools with the same enrichment applied.
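Conceptually, enrichment turns a thin gateway request into the full input a policy needs. The Python sketch below illustrates the idea with in-memory lookups. It does not represent Synapse's actual interface: the `IDENTITIES` and `RESOURCES` tables stand in for an IdP and a database, and every field name is an assumption.

```python
from typing import Any, Dict

# Hypothetical stand-ins for an IdP and a resource database.
IDENTITIES: Dict[str, Dict[str, Any]] = {
    "alice": {"kind": "user", "teamID": "team-7", "roles": ["support"]},
    "planner-agent": {"kind": "agent", "allowedTools": ["edit_project_plan"]},
}
RESOURCES: Dict[str, Dict[str, Any]] = {
    "project-42": {"kind": "project", "owningTeamID": "team-7"},
}


def enrich(raw: Dict[str, str]) -> Dict[str, Any]:
    """Expand bare IDs from the gateway into attributes a policy can evaluate."""
    return {
        # The human on whose behalf the request is made.
        "principal": {"id": raw["user_id"], "attr": dict(IDENTITIES[raw["user_id"]])},
        # The agent that is acting, kept as a distinct identity.
        "agent": {"id": raw["agent_id"], "attr": dict(IDENTITIES[raw["agent_id"]])},
        "resource": {"id": raw["resource_id"], "attr": dict(RESOURCES[raw["resource_id"]])},
        "action": raw["action"],
    }
```

With the enriched input, a rule such as "the caller's teamID must match the project's owningTeamID" becomes a direct comparison of two attributes rather than a lookup buried in agent code.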
Cerbos works with every AI gateway
Cerbos is gateway-agnostic. The PDP integrates wherever the gateway exposes a hook for external authorization:
- Proxy-based gateways call Cerbos through a native external authorization filter over gRPC or HTTP.
- Plugin-based gateways call Cerbos from a pre-request plugin or policy flow using the Cerbos SDKs.
- Edge and serverless gateways call the Cerbos PDP over HTTP or gRPC from the edge runtime.
- Self-hosted and open-source gateways integrate through the same PDP API.
Synapse sits alongside the PDP to enrich each of these hooks with the context the policy needs. The authorization model is the same across every deployment: policies in Git, decisions in milliseconds, a full audit trail by default.
Closing the AI authorization gap
AI gateways have become the front door to LLMs, tools, and MCP servers. What they need next is a policy layer that decides what every caller, agent, and delegated request is allowed to do. Cerbos is that layer. It runs in front of every AI gateway, evaluates each request against fresh policy and full context, and produces one audit trail across humans, services, and agents. The result is fine-grained authorization that holds at the gateway, at every A2A hop, and all the way down to the application, without sitting in agent code or drifting out of sync with your identity stack.
If you're standing up an AI gateway, or you already have one and want to bring it under proper governance, book a call to see how Cerbos plugs in and fits your use case.