Securing cloud architectures in the age of non‑human identities and ephemeral services

Published by Alex Olivier on August 13, 2025

Non-human identities (NHIs) - like service accounts, API tokens, and CI/CD jobs - are exploding across cloud-native architectures. With that comes a fresh set of challenges around authorization, especially in systems built around ephemeral workloads and zero trust principles.

In a recent episode of the Software Plaza podcast, Cerbos co-founder and CPO Alex Olivier sat down with host Twain Taylor to unpack this shift. They explored the rising complexity of managing NHIs, why runtime contextual access control matters more than ever, and Cerbos’ perspective on tackling these problems in a scalable, stateless, and standards-based way.

You don’t need to have listened to the podcast to follow along - but if you’re curious, you can check out the recording above. Let’s dive in.

The rise of non-human identities, and why it matters

It wasn’t long ago that “identity management” mainly meant handling human users: employees, customers, admins logging in with usernames and passwords. Today, workload and service identities have taken center stage. Every microservice, container, CI pipeline, serverless function, and IoT device likely needs its own credentials to authenticate and communicate. Kubernetes has had service accounts from day one, and developers have been issuing API keys to services or client credentials to CI jobs for ages - those are all examples of non-human identities.

What’s changed is the sheer scale and dynamism of these identities across a cloud-native architecture.

Think of a modern SaaS application: it might consist of dozens of microservices, each with its own identity; batch jobs and serverless functions spun up on demand; third-party API integrations; not to mention emerging AI “agents” acting on a user’s behalf. We’re now dealing with ephemeral workloads that spin up for just a few seconds or minutes, act on behalf of users or trigger other processes, and then disappear. Each needs an identity while it runs.

This leads to an explosion of service credentials across the environment. In fact, industry reports from Gartner and others peg the ratio of machine-to-human identities at anywhere from 45:1 to 80:1. Every IoT sensor, every piece of software calling an API, even your pet’s microchip or your car in a connected garage - anything that communicates on a network now requires an identity.

This rise of NHIs matters because each identity is a potential access point. If not properly managed and secured, every API key or service account is a new vector an attacker could exploit. It’s no exaggeration to say NHIs are now one of the fastest-growing attack surfaces in cloud infrastructure. A compromised machine identity can be as damaging as a compromised user account - sometimes even more so, since these accounts often have high privileges or lack oversight. Security teams and architects must account for this growth by implementing controls that cover not just users, but all the non-human actors in their systems.

The lifecycle nightmare of workload credentials

Managing human identities (employees, etc.) usually involves a well-defined lifecycle: onboarding / provisioning an account, managing permissions, and offboarding / revoking access when the person leaves or changes roles.

Non-human identities should follow a similar “joiner-mover-leaver” process, but in practice they often don’t. The challenge is that this lifecycle isn’t being managed for most service identities. Developers might manually create a cloud API key or service account for a new microservice or CI job, stick the credentials in a config or CI secret store, and then forget about it. These credentials tend to be long-lived and persistent, often outliving the service or task they were created for.

For example, imagine a CI/CD pipeline where you create an AWS access key for a GitHub Actions job to upload build artifacts to S3. You store that key as a secret in your CI pipeline. Now, if someone later deletes the pipeline or the job is no longer needed, will the AWS key be deleted too? Often not - the access key lives on in AWS indefinitely. Over time, you accumulate “orphaned” credentials that still have valid access to your infrastructure. These machine credentials can stick around longer than the engineer who created them, or even longer than the application itself, unless you have strict governance to clean them up. It’s a nightmare for security because every leftover credential is essentially a standing privilege that an attacker could discover and misuse.

This situation flat-out breaks the principles of least privilege and zero standing access that underpin modern zero trust models. In a zero trust approach, you want to avoid any permanent, broad access entitlements. Instead, credentials should be short-lived and tightly scoped to what’s needed in the moment. But the reality today is that many organizations have thousands of static secrets and accounts scattered across cloud consoles and config files, with no central visibility. It’s easy to see how over-privileged NHIs become an invisible risk.

What are the anti-patterns to avoid?

[Figure: NHI anti-patterns to avoid]

First, managing service accounts or keys by hand - clicking around a cloud console to create keys, copying them into code or CI - does not scale and is prone to human error. If you find yourself manually creating and pasting credentials, that’s a sign something is wrong - it opens the door to secrets sprawl and forgotten keys. Automation is your friend here. Infrastructure-as-code tools (Terraform, Pulumi, etc.) or identity orchestration can ensure that whenever a workload identity is created, it’s tracked and can be torn down when no longer needed. Likewise, central identity governance or privileged access management systems can help manage non-human accounts across various SaaS platforms so you’re not hunting through each system’s user interface to revoke a token.
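
To make that concrete, here's a minimal sketch of the infrastructure-as-code approach using Pulumi's Python SDK. The bucket name, resource names, and policy scope are illustrative, not a prescription - the point is that the credential is declared next to the job's other resources, so tearing the stack down revokes the key too.

```python
import json

import pulumi
import pulumi_aws as aws

# A dedicated, narrowly scoped IAM user for the CI artifact-upload job.
ci_user = aws.iam.User("ci-artifact-uploader", path="/ci/")

# Least privilege: the job can only write to one artifact prefix.
aws.iam.UserPolicy(
    "ci-artifact-upload-policy",
    user=ci_user.name,
    policy=json.dumps({
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": ["s3:PutObject"],
            "Resource": "arn:aws:s3:::build-artifacts-example/ci/*",
        }],
    }),
)

# The key is tracked infrastructure: `pulumi destroy` revokes it along
# with the job, so no orphaned credential is left behind.
ci_key = aws.iam.AccessKey("ci-artifact-key", user=ci_user.name)

pulumi.export("access_key_id", ci_key.id)
pulumi.export("secret_access_key", ci_key.secret)  # marked secret by Pulumi
```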

To learn more about how to secure every workload, microservice, AI agent, and API client in your architecture with policy-driven authorization, see the Cerbos guide to securing NHIs, linked at the end of this post.

Another dangerous anti-pattern is blindly trusting any request with valid credentials. In other words, treating authentication of a service as a free pass for it to do anything. This might have worked (barely) in a simpler era of monoliths and static networks, but not in today’s dynamic, polyglot environments. If a service presents a valid token or API key, that should authenticate who it is - but it says nothing about what it should be allowed to do. For robust security, authentication must be followed by contextual authorization checks every time. We’ll discuss this more in the next section, but the key idea is: don’t grant blanket, long-term privileges to a non-human identity and assume it will only be used as intended. Instead, limit its scope and lifetime, and continuously verify its actions.

Finally, consider the ephemeral nature of modern cloud workloads. With serverless functions, short-lived containers, and on-demand jobs, services might exist only for a few seconds or minutes. The idea of giving such an ephemeral service a permanent credential is fundamentally at odds with a cloud-native mindset. If a job lives for 30 seconds, ideally its credentials should expire in 30 seconds too. Managing that manually is impossible - it calls for automated token issuance and revocation. This is where concepts like time-bound credentials and even per-transaction identities come in (more on that shortly). The bottom line is that our tooling and processes must catch up to this new reality: credentials need to be as ephemeral as the workloads using them.
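
As a sketch of what credentials "as ephemeral as the workloads" can look like - using PyJWT with an assumed shared signing key, where in a real system your platform (SPIRE, your cloud's IAM, a token service) would do the minting - a 30-second job gets a 30-second token:

```python
import datetime

import jwt  # PyJWT

def mint_ephemeral_credential(workload: str, ttl_seconds: int, key: str) -> str:
    """Credential lifetime == workload lifetime: a 30s job gets a 30s token."""
    now = datetime.datetime.now(datetime.timezone.utc)
    return jwt.encode(
        {
            "sub": workload,
            "iat": now,
            "exp": now + datetime.timedelta(seconds=ttl_seconds),
        },
        key,
        algorithm="HS256",
    )

# Issued at job start; once the job's ~30 seconds are up, the token is
# self-revoking - any verifier raises jwt.ExpiredSignatureError.
token = mint_ephemeral_credential("batch-job-7f3a", 30, "illustrative-key")
```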

Ephemeral services, short-lived tokens, and contextual authorization

Because services now appear and disappear rapidly, often on behalf of users, runtime context becomes critical in authorization decisions. It’s no longer enough to check “has this service presented a valid credential?” and then let it through by default. You also need to ask “Should this particular service (or function, or container) be doing this, on behalf of this user, at this time, from this environment?” Answering those questions requires a richer set of attributes and a policy engine that can evaluate them in real time.

In the past, authorization was often handled at what Alex calls “admin time” - for example, when you created the service account, you might assign it a role that statically determines what it can access. The problem is that admin-time permissions are typically broad and long-lived. The next layer is “authentication time”: when the service authenticates - say, via an OAuth2 client credentials flow to get a token - the identity provider can embed scopes or claims in the token to restrict it. This is better, but still coarse. A token might last for hours and allow the service to call multiple APIs during that time, and it isn’t continuously re-evaluated against changing conditions. As Alex notes, just giving a service a token with blanket read/write access for its lifetime is not granular or reactive enough to guarantee security.
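
For reference, the authentication-time layer often looks like the standard OAuth2 client credentials exchange below (the endpoint, client ID, and scope are placeholders). Note that the scope is fixed once, at issuance, and nothing re-checks each individual call afterwards:

```python
import requests

# Illustrative endpoint and credentials - substitute your identity provider's.
TOKEN_URL = "https://idp.example.com/oauth2/token"

resp = requests.post(
    TOKEN_URL,
    data={
        "grant_type": "client_credentials",
        "client_id": "reporting-service",
        "client_secret": "from-a-secret-store-not-code",
        "scope": "reports:read",  # coarse grant, fixed for hours
    },
    timeout=5,
)
resp.raise_for_status()
access_token = resp.json()["access_token"]
# The scope above lasts for the token's whole lifetime - nothing here
# re-evaluates whether each individual call should still be allowed.
```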

The real game-changer is runtime authorization - performing policy checks at the moment of each request, using all available context. Instead of implicitly trusting a service because it authenticated successfully, every action it attempts can be evaluated against current conditions. This is how you enforce principles like least privilege and zero trust on the ground: “never trust, always verify” each operation. By doing so, you move away from trusting the network or a static identity and move toward trusting only the specific call being made, in the specific context it’s made in.

What kind of context are we talking about? It can include: who the end-user is (if the service is acting on a user’s behalf), what the service or workload is (its own identity, e.g. a microservice name or SPIFFE ID), where it’s running (cluster, namespace, environment, IP range), when (time of day, within a session window, etc.), and potentially even a risk score or anomaly detection signal. With a rich context, you can write policies like: “Service X can only perform action Y on resource Z if it’s running in production namespace, the request comes from a user with role admin, during business hours, and the user’s session is not flagged high-risk.” This may sound complex, but with the right authorization tooling it becomes quite manageable as a policy - and it dramatically reduces the chance of unauthorized activity.
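
With an external policy engine, that whole check collapses into a single call. Here's a minimal sketch using the Cerbos Python SDK, assuming a PDP running on localhost:3592; the action and attribute names are illustrative and would be defined by your own policies:

```python
from cerbos.sdk.client import CerbosClient
from cerbos.sdk.model import Principal, Resource

with CerbosClient("http://localhost:3592") as client:
    principal = Principal(
        id="alice",
        roles={"admin"},
        attr={"session_risk": "low"},  # e.g. fed in from an anomaly signal
    )
    resource = Resource(
        id="report-42",
        kind="report",
        attr={
            "calling_service": "document-service",  # workload identity
            "namespace": "production",              # where it's running
            "request_hour": 14,                     # "when" context
        },
    )
    # The PDP evaluates every attribute against centrally managed
    # policies in one round trip and returns allow/deny.
    if client.is_allowed("export", principal, resource):
        ...  # proceed with the operation
```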

A concrete example helps illustrate the value of this approach. Imagine a user, Alice, has a valid authentication token (a JWT) for your API. Now suppose a malicious actor steals Alice’s token - by sniffing traffic or through some other breach. In a traditional setup, the attacker could take that token and call various services, and as long as it hasn’t expired, the services would treat the requests as coming from Alice. Contextual authorization can shut that down. If your policies expect Alice’s token to be used from the document-service in the us-west cluster, then a request presenting Alice’s token from, say, the analytics-service in Europe should be denied.

In fact, this is exactly the kind of fine-grained check that prevents token theft from turning into system-wide compromise. As Alex described, if a token is stolen and used by the wrong service, the authorization layer can catch it and block the request - because the call context doesn’t match what’s expected. In essence, a valid identity presented in the wrong context will be rejected. This thwarts lateral movement and misuse of credentials in a way that simple identity verification (authN) cannot.
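
Stripped of everything else, the condition such a policy expresses is just a comparison between the presented context and the expected one. A hedged sketch - the subject, service, and region values are invented for illustration:

```python
# The kind of condition a policy encodes: a syntactically valid token
# presented from an unexpected workload or region is rejected outright.
EXPECTED_CONTEXT = {
    "alice": {"service": "document-service", "region": "us-west"},
}

def context_matches(subject: str, calling_service: str, region: str) -> bool:
    expected = EXPECTED_CONTEXT.get(subject)
    if expected is None:
        return False  # unknown subject: fail closed
    return (calling_service == expected["service"]
            and region == expected["region"])

# The stolen-token scenario from the text is denied:
assert not context_matches("alice", "analytics-service", "europe")
```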

[Figure: The role of contextual authorization in securing NHIs]

The figure above highlights a key design: authorization happens at multiple layers - at the gateway, between services, and within services - but it doesn’t have to mean writing disparate ad-hoc rules in each place. A Policy Decision Point (PDP) can be deployed wherever decisions are needed (at the API gateway, in the mesh sidecar, or alongside the service) while all policies are managed centrally. This way, every request carries user identity and service identity info through the stack, and is checked against the relevant policies at each hop. If anything looks out of bounds (wrong service, wrong scope, abnormal timing, etc.), that request is stopped in its tracks. Such layered, contextual authorization is essentially the practical realization of zero trust. Instead of relying on a secure perimeter or assuming a service should be trusted after login, we treat every operation as untrusted until policy says it’s OK - continuously, in real time.

Crucially, modern tooling makes this feasible with minimal latency. JSON Web Tokens (JWTs) carry info that can be inspected quickly. In-memory policy engines can evaluate dozens of conditions in milliseconds or less. The overhead of checking permissions at each step is far outweighed by the security gained - and users won’t notice a difference if it’s implemented efficiently. In fact, as Alex pointed out, we’ve reached a point where zero trust isn’t just a whitepaper concept but an attainable architecture with today’s tech stack. Projects and standards like SPIFFE for workload identity, service mesh frameworks, and policy engines like Cerbos are all converging to enable this new normal of “verify everything, trust nothing by default.”

How Cerbos enables contextual, zero trust authorization

Cerbos was designed from the ground up to address these exact challenges. In the podcast, Alex described how some customers were already using Cerbos in creative ways to handle non-human identities and multi-layer authorization, even before Cerbos explicitly marketed that use case. That was possible because of a few fundamental design choices in Cerbos’ architecture:

1. Stateless, flexible PDP architecture

Cerbos’s policy engine (PDP) does not store any user or session state internally - it is completely stateless with respect to identities and context. It holds only the policies (the rules for who can do what), which you write and version control externally. All the information - which user, which service, which resource, what context - is passed in on each request to the Cerbos PDP, which then evaluates the policies against that input and returns allow/deny, plus an audit log entry. This is sometimes called a “contextual authorization” or policy-as-code model. It contrasts with stateful authorization systems that maintain a central database of roles/permissions or an ACL graph in memory. Those require syncing identity data into the auth service, which can be slow and complex - especially when identities are ephemeral.
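
The statelessness is visible in the wire format itself. Below is a sketch of a call to the Cerbos HTTP check API (POST /api/check/resources) with an illustrative principal and resource - every fact the PDP needs arrives with the request:

```python
import requests

# Every fact the PDP needs rides along with the request - the engine
# itself stores nothing about this principal or resource beforehand.
check_request = {
    "requestId": "req-001",
    "principal": {
        "id": "spiffe://prod.example.com/ns/default/sa/report-job",
        "roles": ["report_runner"],
        "attr": {"namespace": "default"},
    },
    "resources": [{
        "actions": ["read"],
        "resource": {
            "kind": "report",
            "id": "weekly-summary",
            "attr": {"classification": "internal"},
        },
    }],
}

resp = requests.post(
    "http://localhost:3592/api/check/resources",  # Cerbos HTTP check API
    json=check_request,
    timeout=2,
)
print(resp.json())  # per-action EFFECT_ALLOW / EFFECT_DENY results
```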

Cerbos’s stateless model pays off big time in scalability and agility. Because the PDP is just crunching data in memory, it’s extremely fast and easy to scale horizontally. More importantly, it’s ideal for ephemeral scenarios. If you spin up a job-specific identity or a one-time token, you don’t have to pre-load it into Cerbos at all. As soon as that identity makes a request, Cerbos will see the token or ID in the request and can apply policy to it on the fly. There’s no replication or cache propagation delay.

Alex gave the example that if you had a stateful permission system, issuing a short-lived identity for a one-off transaction would be impractical - you’d have to insert that identity into a database, distribute that state to all the servers, use it, then clean it up immediately after. By the time all that happens, the ephemeral workload is likely finished! In Cerbos, none of that overhead exists: the moment a credential is issued, it can be used in a policy decision without any setup, and once it’s expired, it simply won’t be accepted (and there’s nothing lingering to revoke). This stateless approach aligns perfectly with ephemeral compute and dynamic workloads.

Another benefit of statelessness is that you can deploy the PDP anywhere - as a sidecar next to your service, as a library embedded in-process, or as a centralized service - without worrying about session stickiness or heavy data stores. Some Cerbos users run a PDP instance alongside each microservice for ultra-low latency local calls, while others might have a few PDP pods per cluster. Either way, scaling is simple: if your app scales up, you just start more PDP instances, or they auto-scale, and they all load the same policies and operate independently. There’s no centralized bottleneck to worry about. This distributed decision-making with centrally managed policy gives you the best of both worlds: local, fast enforcement and global consistency.

2. Embracing open standards (no lock-in)

In a complex ecosystem with many vendors and tools - identity providers, gateways, meshes, etc. - it’s crucial that your authorization layer plays nicely with standard protocols. Cerbos follows this philosophy closely. It doesn’t implement proprietary identity systems; instead, it accepts whatever identity your environment uses - JWTs, OAuth2 access tokens, mTLS client certificates, SPIFFE IDs, etc. In fact, Cerbos has native support for SPIFFE, which is an open standard for issuing identities to workloads in a uniform way. If you’re using SPIRE or another SPIFFE implementation to assign IDs to your services, Cerbos can parse those and let you directly write policies about, say, “allow calls from workloads in trust domain prod.cluster.local” or “deny if the service account is ci-job outside of the CI namespace.” These checks become one-liners using Cerbos’s built-in SPIFFE-aware policy primitives, rather than hacky string matching or custom code.
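
Purely to illustrate the logic those primitives encapsulate - this is plain Python, not Cerbos policy syntax, and the IDs are invented - trust-domain and workload-path checks on a SPIFFE ID amount to the following:

```python
from urllib.parse import urlparse

def spiffe_trust_domain(spiffe_id: str) -> str:
    """A SPIFFE ID is a URI: spiffe://<trust-domain>/<workload-path>."""
    parsed = urlparse(spiffe_id)
    if parsed.scheme != "spiffe":
        raise ValueError(f"not a SPIFFE ID: {spiffe_id}")
    return parsed.netloc

# "Allow calls from workloads in trust domain prod.cluster.local"
caller = "spiffe://prod.cluster.local/ns/payments/sa/billing"
allowed = spiffe_trust_domain(caller) == "prod.cluster.local"

# "Deny if the service account is ci-job outside of the CI namespace"
workload_path = urlparse(caller).path        # e.g. /ns/payments/sa/billing
is_ci_job = workload_path.endswith("/sa/ci-job")
in_ci_namespace = workload_path.startswith("/ns/ci/")
denied = is_ci_job and not in_ci_namespace
```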

Cerbos is also aligned with emerging standards for authorization: it is a core member of the OpenID Foundation’s AuthZEN Working Group, which aims to standardize how applications communicate with external PDPs. By adhering to standards, and helping shape them, Cerbos ensures you can integrate it into your stack with minimal friction. For instance, authentication is typically done via OAuth2/OIDC and yields a JWT - Cerbos readily consumes that token format. Workload identity might come from Kubernetes certificates or SPIFFE - Cerbos can ingest those. This means you don’t have to change your identity provider or issue new kinds of tokens just to use Cerbos. Adhering to open standards also future-proofs your architecture: if you change identity providers or add new components, the authorization layer can remain the same, evaluating the standard claims and attributes that everyone agrees on. In short, build on standards so that your tools interoperate cleanly - that’s a key piece of advice for anyone designing a secure system.

3. Granular policies that include time and transaction context

One of the most interesting implications of the NHI trend is the idea of temporary, delegated authority - essentially, giving a workload just enough privilege to do a specific task and no more, often for a limited time. We touched on this with ephemeral tokens. Cerbos policies are powerful enough to express conditions like time windows - e.g. only allow access if the current time is within the job’s schedule. In fact, Cerbos has supported time-based conditions for years, which is useful for things like expiring a permission at midnight or allowing an action only during business hours.

Going further, consider the concept of a transaction token: a credential that is minted to cover a single workflow or chain of actions, and that expires as soon as that workflow completes. This concept is gaining traction (projects like Tokenetes are exploring it) as a way to implement just-in-time, context-specific privileges.

Imagine an AI agent that, when triggered by a user prompt, needs to fetch some data, call an API, and write a report. Instead of letting that agent use the user’s full authority or a broad service account, you could mint a one-time token that only permits the exact sequence of actions needed - and is only valid for, say, 30 seconds while the agent does its work. Cerbos can take such a token and include it in its policy evaluation. For example, a policy could say: “If a transaction token is present, ensure it’s not expired and that it’s authorized for the operations being attempted; if it’s absent or invalid, deny.” By checking the age of the token and the scope of the transaction within the policy, you effectively put a leash on any autonomous workflow. Once the token’s time is up or it tries to do something outside its scope, Cerbos will start denying actions, containing the “blast radius” of that workflow. This is hugely important for AI-driven processes or complex multi-service operations, where a single user action can fan out into dozens of automated calls. With per-transaction identities and strong policy enforcement, you can ring-fence each automated workflow so that nothing runs amok or persists longer than it should.
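
A hedged sketch of that pattern using PyJWT - the ops claim and signing key are invented for illustration, not a Tokenetes or Cerbos API. A token is minted for the exact operations with a 30-second expiry, and the per-step check fails closed once either constraint is violated:

```python
import datetime

import jwt  # PyJWT

SIGNING_KEY = "illustrative-only"

def mint_transaction_token(workflow_id: str, allowed_ops: list[str]) -> str:
    """One token per workflow: scoped to exact operations, 30-second TTL."""
    now = datetime.datetime.now(datetime.timezone.utc)
    return jwt.encode(
        {
            "sub": f"workflow:{workflow_id}",
            "ops": allowed_ops,  # e.g. ["fetch_data", "call_api", "write_report"]
            "iat": now,
            "exp": now + datetime.timedelta(seconds=30),
        },
        SIGNING_KEY,
        algorithm="HS256",
    )

def authorize_step(token: str, op: str) -> bool:
    """The check a policy performs: token valid, unexpired, op in scope."""
    try:
        claims = jwt.decode(token, SIGNING_KEY, algorithms=["HS256"])
    except jwt.ExpiredSignatureError:
        return False  # the 30 seconds are up: every further step is denied
    except jwt.InvalidTokenError:
        return False
    return op in claims.get("ops", [])
```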

The exciting thing is that Cerbos enabled these kinds of patterns without requiring any changes to the core product - it was built in from the start as a general-purpose policy engine. When customers started using Cerbos to handle service-to-service authorization with SPIFFE IDs and transient tokens, it validated this flexible design. Alex highlighted that they didn’t need to add special “NHI features” to make it work; the engine already treated any principal, human or machine, the same way, evaluating whatever attributes you provide.

Simply put - by externalizing authorization logic to a purpose-built service like Cerbos, you don’t have to refactor your whole application to adopt these best practices. You can introduce runtime authorization checks gradually - for instance, start at the API gateway or for a particularly sensitive internal API - using Cerbos alongside your existing code. Over time, you can expand to cover more services and more context, all without baking complex permission logic into each application. Your apps remain focused on business logic, while Cerbos handles the policy decisions. And because Cerbos is stateless and embeddable, it won’t add significant overhead or operational complexity. As one user put it, running the PDP next to the app with as many instances as needed means it won’t become a single point of failure or performance drag.

Conclusion

The rise of non-human identities and ephemeral workloads is transforming how we approach application security. Identity is now the new perimeter - but identity alone isn’t a silver bullet unless it’s coupled with fine-grained authorization. By managing the full lifecycle of service credentials, embracing short-lived tokens, and enforcing context-aware policies at every stage, organizations can significantly reduce the risk posed by over-privileged machines and automated processes. The concepts we’ve discussed - least privilege, zero standing permissions, continuous verification - all align with the broader push toward zero trust architectures in the cloud. The good news is that these aren’t just theoretical ideals; with modern tools like Cerbos, SPIFFE, and others, they are practical and achievable today.

Cerbos’s approach of stateless, externalized authorization offers a robust way to implement these controls without having to rebuild your entire platform. You can drop a policy engine into your stack and start authorizing both user and service actions consistently according to policy. It lets you adapt quickly - whether that means plugging into new identity standards, scaling up to millions of tokens, or adding new rules as threats evolve - because policies are just code, and the engine takes care of the heavy lifting at runtime.

As cloud infrastructure continues to grow more complex, taking a proactive stance on machine identity and authorization is no longer optional - it’s becoming essential for security and compliance. The conversation between Alex Olivier and Twain Taylor really highlighted that forward-looking companies are already moving on this, baking workload IAM and granular authz into their architectures. It’s not only about preventing breaches, but also about enabling innovation confidently (e.g. adopting AI agents or multi-cloud microservices) knowing that strong guardrails are in place.

If you’re ready to explore this further, a great next step is to check out the Cerbos guide to securing NHIs. Feel free to schedule an engineering demo, or join the growing community in the Cerbos Slack channel - it’s an excellent place to ask questions, learn from real-world use cases, and get help as you implement authorization in your own projects.

Book a free Policy Workshop to discuss your requirements and get your first policy written by the Cerbos team.