Patterns of failure in modern authorization

Published by Daniel Maher on July 01, 2025

This blog post was adapted from talks presented by Dan Maher, Senior DevRel Manager at Cerbos, at Snowfroc (OWASP; Denver, USA; March 2025) and AuthCon (API Days; NYC, USA; May 2025).

Let's kick things off with a bold claim: most authorization systems in production today are fundamentally broken. That's because authorization is hard, and not just for you: it's hard for really big, well-known companies, too. So we're going to talk about why authorization is the way it is today, look at some magnificent failures, and go through some patterns to help you avoid becoming a future case study.

What the heck is authorization anyway?

Great question—thanks for asking.

Authorization is fundamentally about answering a three-part question: Is this entity (or "principal") allowed to perform this action on this resource? It's related to, but separate from, authentication. Authentication is fundamentally about identities—basically, who are you? Authorization is about permission—as in, now that your identity is established, what can you do?
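
To make that three-part question concrete, here's a minimal sketch in Python, with made-up principal and resource types and a deliberately toy rule, of the shape an authorization check takes:

```python
from dataclasses import dataclass

@dataclass
class Principal:
    id: str
    roles: set[str]

@dataclass
class Resource:
    kind: str
    owner_id: str

def is_allowed(principal: Principal, action: str, resource: Resource) -> bool:
    """Answer the three-part question: may this principal perform this action on this resource?"""
    # Toy rule: admins can do anything; owners can read/update/delete their own resources.
    if "admin" in principal.roles:
        return True
    return principal.id == resource.owner_id and action in {"read", "update", "delete"}

print(is_allowed(Principal("alice", {"user"}), "read", Resource("document", "alice")))  # True
```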

The POSIX permission model from Unix shaped how we think about authorization today. The user/group/world paradigm, combined with simple read/write/execute primitives, became the standard approach across operating systems and remains so to this day. What's important is that it established the core concept of separating entities from actions and resources. This is literally the foundation of RBAC, which we'll talk about more in a bit.
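
If it's been a while since you've thought about mode bits, here's a tiny Python reminder of what user/group/world plus read/write/execute looks like in practice:

```python
import stat

def describe_mode(mode: int) -> str:
    """Render a POSIX permission mode (e.g. 0o640) as the familiar rwx string."""
    return stat.filemode(mode | stat.S_IFREG)  # display it as if it were a regular file

# 0o640: owner can read/write, group can read, world gets nothing.
print(describe_mode(0o640))  # -rw-r-----
```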

Ultimately POSIX has its limitations, and the concept of Access Control Lists (ACLs) emerged in the 70s and 80s, offering more granular control at the cost of additional administrative and implementation burdens. This led to RBAC in the 90s, notably formalized by the RBAC96 model, which represented a more sophisticated approach: one that could better model the way businesses are actually structured. This abstraction became the dominant enterprise model through the 90s and 2000s, and still underpins many modern systems today.
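
To illustrate that abstraction, here's a minimal RBAC sketch with made-up roles: users are assigned roles, and the roles (not the individual users) carry the permissions.

```python
# Hypothetical role-to-permission mapping: users get roles, roles get permissions.
ROLE_PERMISSIONS = {
    "viewer":  {"report:read"},
    "editor":  {"report:read", "report:write"},
    "manager": {"report:read", "report:write", "report:approve"},
}

USER_ROLES = {
    "alice": {"editor"},
    "bob":   {"viewer"},
}

def rbac_allowed(user: str, permission: str) -> bool:
    """A user is allowed if any of their roles grants the permission."""
    return any(permission in ROLE_PERMISSIONS.get(role, set())
               for role in USER_ROLES.get(user, set()))

print(rbac_allowed("alice", "report:write"))  # True
print(rbac_allowed("bob", "report:write"))    # False
```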

Fast-forward to today. Modern authorization has had to evolve dramatically to handle cloud, microservices, and distributed systems. Token-based approaches like JWT have become the standard for carrying authorization data between systems, while OAuth provides a framework for delegated authorization and OIDC layers federated identity on top of it. We've evolved beyond simple perimeter-based security to a layered approach where identity and context drive authorization decisions. So, rather than assuming trust once someone's inside the network, modern systems verify and authorize continuously across multiple trust boundaries. These challenges apply to organizations of all sizes; the architectural principles remain the same whether you're handling billions of daily decisions or just thousands.

So that's where we're at with authorization today. It's big and kind of scary and easy to get wrong—and in that spirit, let's look at some examples of where things went wrong.

Cautionary tales

Facebook (2018): When UI changes break privacy controls

In May of 2018, Facebook accidentally set 14 million users' posts to public visibility during what should have been a limited test of a new feature. The core issue? A UI change that unexpectedly modified their policy enforcement layer (because they were the same thing). By the time they discovered and fixed the bug—10 days later—millions of supposedly private posts had been publicly accessible, triggering significant privacy concerns and regulatory scrutiny.

This incident revealed three critical authorization design flaws. First, a UI update inadvertently modified core policy enforcement logic, showing why authorization decisions should be isolated from interface-level code. Second, the system lacked proper permission state validation before making visibility changes. Third, there was no safeguard mechanism to detect unexpected permission changes. All of this ultimately highlights why authorization logic should be decoupled from application code.

Okta (2023): Social engineering and session tokens

In October of 2023, Okta disclosed a breach where threat actors used good old-fashioned social engineering to convince a third-party support company—contracted by Okta—to send them HAR (HTTP Archive) files containing active session tokens. This allowed the attackers to effectively bypass authentication checks and gain support-level access to customer environments. Now, I'm not dunking on anybody here, but what makes this incident particularly noteworthy is that it happened to a company whose core business is security and identity management. In other words: if it can happen to them, it can happen to anybody.

There's a lot going on in this incident, but I want to focus on three things. First, their support system—which, again, was a third-party entity—had excessively permissive access to customer environments. This violated the principle of least privilege, whereby users and systems should be granted only the minimum permissions necessary to perform their required functions. Second, there was insufficient isolation between support tiers, which allowed broad lateral movement. Third, their infrastructure relied too heavily on session tokens without additional validation, monitoring, or rotation. This incident shows why systems need multiple authorization boundaries and tiered access controls, especially for privileged systems.

Microsoft (2024): Legacy infrastructure as an attack vector

In January 2024, Microsoft disclosed that a threat actor named 'Midnight Blizzard' (also known as APT29) had compromised their environment. Now pay attention here, because this next sentence gets worse with every word. The breach began with a password spray attack... against a legacy... non-production... test account... that hadn't been configured with multi-factor authentication. Again, not dunking here, but what makes this incident particularly notable is that it involved Microsoft's own cloud environment, highlighting that even the companies building our modern Internet aren't immune to authorization problems.

The Midnight Blizzard attack revealed critical authorization weaknesses in Microsoft's environment: First, excessive privileges in legacy tenant configurations gave accounts more access than necessary. Second, inadequate tenant isolation allowed attackers to pivot from test to production environments. Third, the lack of just-in-time access controls meant that permanent privileges, once compromised, could be exploited indefinitely. This incident demonstrates the importance of reviewing authorization models continuously, especially for legacy systems, and implementing strong tenant isolation with dynamic privilege management.

Give yourself a fighting chance

Alright, let's talk about how not to be my next case study—or, at least, give yourself a fighting chance.

Token security

Let's talk token security, the cornerstone of modern distributed authorization. To properly secure tokens, you'll want to focus on these three practices: First, always validate cryptographic signatures to prevent tampering. Second, enforce strict expiry times to limit the blast radius of stolen tokens. Third, verify that tokens came from trusted issuers. For storage, avoid script-accessible client-side options like localStorage and sessionStorage; HttpOnly, Secure cookies (or keeping short-lived tokens in memory) are safer choices.
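
As a sketch of those three checks, here's roughly what validation might look like with the PyJWT library; the issuer, audience, and key here are placeholders, and your identity provider's specifics will differ:

```python
import jwt  # PyJWT
from jwt import InvalidTokenError

TRUSTED_ISSUER = "https://auth.example.com"   # placeholder issuer
EXPECTED_AUDIENCE = "my-api"                  # placeholder audience

def validate_token(token: str, public_key: str) -> dict | None:
    """Verify signature, expiry, and issuer before trusting any claims."""
    try:
        return jwt.decode(
            token,
            public_key,
            algorithms=["RS256"],             # never let the token pick its own algorithm
            issuer=TRUSTED_ISSUER,            # reject tokens from untrusted issuers
            audience=EXPECTED_AUDIENCE,
            options={"require": ["exp", "iss", "aud"]},  # these claims must be present
        )
    except InvalidTokenError:
        return None  # treat any validation failure as "not authorized"
```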

For reference, the most common vulnerabilities we see are weak signing keys, skipped validation checks, and overly permissive CORS policies. These aren't theoretical risks—recall how token theft was central to the Okta breach we just looked at, for example.

Permission management

Next up is permission management, which is actually one of the biggest challenges in modern authorization. The thing you really have to watch out for here is "role explosion", which is where you end up with hundreds of poorly defined roles and permissions that nobody fully understands. Specific strategies will vary, but the contours of the solution might involve role hierarchies and templates. But honestly, the real path forward is attribute-based access control. (There's a whole separate blog post just about ABAC so please do bookmark that for later.)
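
To show the difference in shape, here's a toy attribute-based check: rather than enumerating ever-more-specific roles, the decision is computed from attributes of the principal, the resource, and the request context. The attributes below are invented purely for illustration.

```python
def abac_allowed(principal: dict, action: str, resource: dict, context: dict) -> bool:
    """Toy attribute-based rule: engineers may edit documents owned by their own
    department, but only during business hours."""
    return (
        action == "edit"
        and principal.get("department") == resource.get("department")
        and "engineer" in principal.get("job_titles", [])
        and 9 <= context.get("hour", 0) < 18
    )

print(abac_allowed(
    {"department": "payments", "job_titles": ["engineer"]},
    "edit",
    {"department": "payments"},
    {"hour": 14},
))  # True
```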

Ultimately, you need to implement real-time or just-in-time access wherever possible—grant elevated privileges only when needed and revoke them automatically afterward. Finally, make the principle of least privilege a reality—the goal is a system where permissions are dynamic and contextual, not static and permanent. This speaks directly to the over-permissioning issue we saw in the Microsoft case.
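
Here's a minimal sketch of the just-in-time idea, with hypothetical names: elevated privileges are granted with an expiry attached, and every check re-evaluates whether the grant is still live.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical store of temporary grants: (user, privilege) -> expiry time.
ACTIVE_GRANTS: dict[tuple[str, str], datetime] = {}

def grant_temporary(user: str, privilege: str, minutes: int = 30) -> None:
    """Grant an elevated privilege that expires automatically."""
    ACTIVE_GRANTS[(user, privilege)] = datetime.now(timezone.utc) + timedelta(minutes=minutes)

def has_privilege(user: str, privilege: str) -> bool:
    """The privilege only counts if the grant exists and hasn't expired."""
    expiry = ACTIVE_GRANTS.get((user, privilege))
    return expiry is not None and datetime.now(timezone.utc) < expiry

grant_temporary("alice", "prod:deploy", minutes=15)
print(has_privilege("alice", "prod:deploy"))  # True (for the next 15 minutes)
```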

Externalize your authz

Next, seriously, get that authorization logic out of your application code. When authorization logic is embedded in your codebase, it becomes nearly impossible to maintain consistently—especially as systems grow in complexity. By externalizing authorization, you create a single source of truth for policy decisions across your entire system. This separation delivers a tonne of benefits, including unified enforcement regardless of which service is making the request; the ability to update policies without code deployments; and clearer audit trails for compliance and security reviews (that last one is going to make your CISO happy).

Again, this is not theoretical: this approach directly addresses the authorization consistency issues we saw in the Facebook case, where UI changes unexpectedly modified access control behavior. There are a number of options here. OPA, for example, is a CNCF project that uses the arcane Rego language to define rules. There's also Cerbos (which, again, is what I work on), which is open source and lets you write policies in YAML. Anyway, the goal is the same: move authorization decisions out of your app code.
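
The exact API depends on which tool you adopt, but the shape of the integration is similar everywhere: the application asks a policy decision point (PDP) for a verdict instead of embedding the rule itself. Here's a hedged sketch, with a hypothetical endpoint and response format:

```python
import json
from urllib import request

PDP_URL = "https://pdp.internal.example/check"  # hypothetical policy decision point endpoint

def check_permission(principal_id: str, action: str, resource_id: str) -> bool:
    """Ask the external PDP instead of embedding the rule in application code."""
    payload = json.dumps({
        "principal": principal_id,
        "action": action,
        "resource": resource_id,
    }).encode()
    req = request.Request(PDP_URL, data=payload, headers={"Content-Type": "application/json"})
    with request.urlopen(req, timeout=2) as resp:
        decision = json.load(resp)
    return decision.get("allowed", False)  # deny unless the PDP explicitly allows
```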

Test-driven everything

Let's talk about testing authorization (an area where hope is not a strategy). First, implement policy unit tests that verify authorization decisions in various scenarios, including edge cases. Make these tests comprehensive—they need to cover both the allows and the denies. Second, integrate authorization testing into your CI/CD pipeline, just like your code deployments, so that policy changes are vetted before they go to prod. Third, just like in config management, you need tests and tooling to detect drift—the gap between intended and actual permissions that emerges over time. This continuous validation is clutch because modern authorization is dynamic, and even with externalized solutions, unexpected behaviors can emerge in complex systems.
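
For a sense of what policy unit tests can look like, here's a pytest-style sketch; `check_permission` and the `authz_client` module are hypothetical stand-ins for whatever decision client you expose:

```python
# test_policies.py -- run with pytest.
import pytest
from authz_client import check_permission  # hypothetical module wrapping your PDP

@pytest.mark.parametrize("principal, action, resource, expected", [
    ("alice", "read",   "report:123", True),   # owners can read their reports
    ("alice", "delete", "report:123", True),   # ...and delete them
    ("bob",   "read",   "report:123", False),  # other users are denied
    ("",      "read",   "report:123", False),  # edge case: anonymous principal
])
def test_report_policy(principal, action, resource, expected):
    # Cover the denies as carefully as the allows.
    assert check_permission(principal, action, resource) == expected
```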

Highly-available authorization

Authorization systems present a unique challenge: they're on the critical path for almost every operation, yet must remain highly available while enforcing strict security policies. Achieving high-availability authorization requires careful architecture: local caching of authorization decisions, fallback policies for when central services are unavailable, and redundancy across multiple failure domains. Equally important are graceful degradation strategies that maintain security even when components fail. This might include conservative default deny policies, tiered fallback rules based on risk (you see this a lot in Fintech, for example), and circuit breakers that prevent cascading failures. The goal is to ensure that authorization never becomes a single point of failure while still providing strong security guarantees.
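
Here's a rough sketch of two of those ideas together, under assumptions: a short-lived local cache in front of the PDP call from the earlier sketch, and a conservative default-deny when the central service can't be reached.

```python
import time

CACHE_TTL_SECONDS = 30
_decision_cache: dict[tuple[str, str, str], tuple[bool, float]] = {}

def cached_check(principal: str, action: str, resource: str) -> bool:
    """Serve recent decisions from a local cache; fail closed if the PDP is down."""
    key = (principal, action, resource)
    cached = _decision_cache.get(key)
    if cached and time.monotonic() - cached[1] < CACHE_TTL_SECONDS:
        return cached[0]
    try:
        allowed = check_permission(principal, action, resource)  # remote PDP call from the earlier sketch
    except OSError:
        # Central authorization service unavailable: degrade gracefully,
        # but conservatively, by denying rather than guessing.
        return False
    _decision_cache[key] = (allowed, time.monotonic())
    return allowed
```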

The way forward

We've covered a lot of ground in this article, from the early days of POSIX permissions through to modern distributed authorization systems. Through it all, we've seen that authorization is a challenge, and it's actually getting more complicated over time—not less.

Luckily, the path forward is simple to explain (if not to put into practice): externalize your authorization decisions to create consistency; implement rigorous token validation to prevent bypasses; use just-in-time access and least privilege to reduce your attack surface; and test continuously with regular access reviews. These principles address the exact failures we saw in our case studies, from Facebook's UI-authorization coupling to Okta's token issues to Microsoft's over-privileged accounts.

I'll conclude with this last bit of advice: accept that you can’t eliminate complexity—but you can contain it. Authorization, done right, will help your team build faster, safer, and with fewer "oh no" moments in prod.

If you want to dive deeper into implementing and managing authorization, join one of our engineering demos or check out our in-depth documentation.

Book a free Policy Workshop to discuss your requirements and get your first policy written by the Cerbos team