Stateful Apps in Kubernetes are a big deal


The last time I wrote a blog post, it was to explain why I had joined a cloud networking company. It turns out that, 8 months later, I would join another cloud-focused company, but at the other end of the spectrum. I’ve recently joined Ondat, which zeroes in on enabling Stateful Applications to run at scale on Kubernetes. Having spent the last 5 years working on cloud-native projects and initiatives, this feels like a natural move to me. I’ve also invested countless hours in developing content around DevOps, educating the DevNet community while at Cisco, studying for CKA, CKAD, and CKS (the 3 Kubernetes certs), and building products around the Container Network Interface (CNI). So it comes as no surprise that I’m getting back to a role that is very dear to my heart: Developer Advocate.

You’re probably familiar with this role, which primarily focuses on interacting with the user community. It is prevalent at many companies developing products with a rich API, or where the API is the principal way to interact with the product. It’s a fancy name that entails wearing multiple hats. I see this role as sitting at the intersection of evangelism, nurturing the community of users, product marketing, and establishing a closed loop with product management. But what makes this particular combination very attractive to me is the current momentum around Kubernetes adoption. I’m not talking about the adoption of Kubernetes by the early majority for running modern apps, which happened a long time ago, but rather the adoption by the “late majority” or even the “laggards” (Geoffrey A. Moore, 2014, Crossing the Chasm, Harper Collins, New York).

These companies usually lean on “digital transformation” to migrate workloads to the cloud and modernize their applications. And Kubernetes is, of course, part of that picture since it has become THE cloud Operating System. Of course, traditional public Cloud Service Providers will try to hook you into their native services to build or refactor your applications and entice you to manage your SDLC in their environment. But in the end, nothing beats kubectl apply when it comes to deploying a service. Any service, from Serverless to ML, kubectl rules! As long as the Pods backing these services are stateless, right?…or so I thought!

Some of you may argue that Cloud services are easier to consume and integrate through public APIs and eventually easier to scale up and down. Some may also assert that the Kubernetes learning curve is too steep: by the time you build a team of platform engineers who can operate these DIY services, you could have saved time and money just building on top of existing public cloud services. True, but what happens if you need to rinse and repeat for another cloud? (And trust me, this will happen, and not for the reasons you think.) You’d have to start from scratch again, with a different API and different service intricacies; and no, Terraform won’t save you for day 2 operations and instrumentation. Finally, another reason to build developer-focused services on top of Kubernetes is that in the long term, it will be cheaper, by far. Cloud costs only ever go one way, in case you didn’t notice!

But I digress… Stateless, this is it: “Kubernetes only runs stateless pieces of code.” This misconception probably comes from people who used Docker a lot, where, other than named volumes or bind mounts, there’s not much to help with stateful apps.

But what is actually a stateful app?

It’s not only about making sure session data persists to disk or that data written by the application survives a container restart. It’s also about guaranteeing that each Pod’s identity is stable and that at most a single instance of that unit of compute is running at any point in time. The fact is that a lot of people are already running databases or other stateful applications in Kubernetes (check this report), and the shift to stateful continues with services such as Google Migrate for Anthos.
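
To make the identity guarantee concrete, here is a minimal sketch of a StatefulSet, the Kubernetes primitive that gives each Pod a stable name and its own volume. Everything specific in it (the db name, the container image, the port, the 5Gi size) is a placeholder I picked for illustration, not something taken from a real deployment.

```yaml
# Headless Service: gives each Pod a stable DNS name (db-0.db, db-1.db, ...).
apiVersion: v1
kind: Service
metadata:
  name: db                      # hypothetical name
spec:
  clusterIP: None               # headless: DNS resolves directly to the Pods
  selector:
    app: db
  ports:
    - name: data
      port: 5432                # placeholder port
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: db
spec:
  serviceName: db               # must reference the headless Service above
  replicas: 3                   # Pods are created in order: db-0, db-1, db-2
  selector:
    matchLabels:
      app: db
  template:
    metadata:
      labels:
        app: db
    spec:
      containers:
        - name: db
          image: registry.example.com/my-database:1.0   # placeholder image
          volumeMounts:
            - name: data
              mountPath: /var/lib/data
  # One PersistentVolumeClaim per Pod (data-db-0, data-db-1, ...).
  # A replacement Pod with the same ordinal re-attaches to the same claim.
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 5Gi
```

If db-1 is rescheduled, it comes back as db-1 with the data-db-1 claim, which is exactly the “stable identity” part of the definition above.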

But there are additional challenges related to the application layer. For example, how do you bootstrap your stateful app cluster, making sure the primary node is effectively primary? Or how do you provide service discovery while at the same time ensuring the application scales up (or down) gracefully? Luckily there seems to be a proliferation of Kubernetes operators that help with the application layer. They substantially facilitate the lifecycle management of applications like relational or NoSQL databases or other types of stateful apps (check out as an example).

However, there’s a fundamental concern with the data service layer. Kubernetes doesn’t guarantee data availability. As a platform engineer, you have to manage this aspect outside of Kubernetes’ traditional primitives.

But Kubernetes does an excellent job standardizing interfaces: you can consume persistent storage services using a CSI (Container Storage Interface) driver. This is where Ondat plugs in. Think of Ondat as a hyper-converged solution for Kubernetes available through the CSI. Ondat operates between two layers: it sits above your Kubernetes distribution of choice, connected through the CSI driver, and supports the requirements of the Stateful Apps running on top. More importantly, Ondat is Kube-native. Its control plane relies on Kubernetes to act as the single source of truth, enabling a seamless experience for developers. Developers can control the features they need declaratively, taking advantage of increased resiliency, security, and scale using simple labels. Here’s an example of a StorageClass with encryption, replication, and compression enabled:

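kind: StorageClass
metadata:
  labels:
    app: storageos
    # keys of the five remaining labels were lost; their values were:
    # cluster, self-evaluation, storageos-operator, storageos, storageos
  name: Ondat
parameters:
  csi.storage.k8s.io/controller-expand-secret-name: csi-controller-expand-secret
  csi.storage.k8s.io/controller-expand-secret-namespace: kube-system
  csi.storage.k8s.io/controller-publish-secret-name: csi-controller-publish-secret
  csi.storage.k8s.io/controller-publish-secret-namespace: kube-system
  csi.storage.k8s.io/fstype: ext4
  csi.storage.k8s.io/node-publish-secret-name: csi-node-publish-secret
  csi.storage.k8s.io/node-publish-secret-namespace: kube-system
  csi.storage.k8s.io/provisioner-secret-name: csi-provisioner-secret
  csi.storage.k8s.io/provisioner-secret-namespace: kube-system
  # keys of the last three (Ondat feature) parameters were lost; their values were:
  # "false", "true", "3"
reclaimPolicy: Delete
volumeBindingMode: Immediate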

The important lines are the last three parameters (lines 22, 23, and 24 of the full manifest). With these 3 simple instructions, any persistentVolumeClaim asking for this StorageClass will consume a persistentVolume managed by Ondat and deployed with these features configured. I’m just scratching the surface: there are many more advanced capabilities available, such as Availability Zone Anti-Affinity (AZAA) or fast failover fencing mode. All are managed the same way, through simple labels. But what surprised me the most, especially after testing other solutions in the same space, is the ease of deployment. It’s literally a single command installing a Kubernetes operator. This operator is responsible for orchestrating the installation. Not much YAML is involved since most of the default settings don’t need to be changed.
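
For completeness, this is roughly what the consuming side looks like: a minimal sketch of a persistentVolumeClaim requesting the StorageClass. The claim name and size are made up, and I’m assuming the class is registered under a lowercase name such as ondat (Kubernetes object names must be lowercase). The storageos.com/replicas label is my assumption of how a per-volume override would be expressed; check the Ondat documentation for the exact key before relying on it.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data                  # hypothetical claim name
  labels:
    # Assumed Ondat feature label (per-volume replica count); verify the key.
    storageos.com/replicas: "2"
spec:
  storageClassName: ondat         # assumed lowercase name of the StorageClass
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi                # placeholder size
```

Any Pod that mounts app-data then gets a volume provisioned through the CSI driver, with whatever features the StorageClass (and labels) declare.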

One can easily imagine how this paradigm is a perfect fit for automation and GitOps pipelines. It enables a plethora of use cases. Here are a few examples:

  • Create Kustomize overlays to manage the development environment. Different environments (dev, QA, prod…) will have specific feature requirements in terms of security and availability.
  • Run policy-as-code with OPA Gatekeeper, enforcing syntactic rules with Rego.
  • On-demand, self-service middleware provisioning. Developers can now deploy message buses, databases, pub/sub, service meshes, or any other stateful apps with a consistent and scalable distributed data service layer, regardless of the software lifecycle stage.
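
As an illustration of the first bullet, here is a minimal sketch of a Kustomize overlay that raises the replica count for production only. The directory layout, the StorageClass name ondat, and the storageos.com/replicas parameter key are all assumptions on my side; adapt them to what your base manifest actually contains.

```yaml
# overlays/prod/kustomization.yaml (hypothetical layout)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base                    # base holds the shared StorageClass manifest
patches:
  - target:
      kind: StorageClass
      name: ondat                 # assumed name of the class in the base
    patch: |-
      # JSON 6902 patch; "~1" escapes the "/" inside the parameter key.
      # "add" sets the value whether or not the key already exists.
      - op: add
        path: /parameters/storageos.com~1replicas
        value: "3"
```

A dev overlay would carry the same file with a lower value (or no patch at all), so the feature requirements of each environment live in Git next to the rest of the configuration.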

What are the alternatives?

There has been a lot of work in the cloud-native storage space. But I don’t consider Ondat as focused on storage; the focus is on the developer and platform engineer experience. I think a good analogy is that Ondat aims to be the “Snyk” for data. The key objective is to provide a cloud-agnostic solution for users to define their data service requirements seamlessly, whether developing locally on their laptops or deploying their code in production through CI/CD pipelines.

In terms of the alternatives, I think 3 distinct trends define this landscape:

  • Open Source solutions, such as OpenEBS, Rook, etc. From what I’ve seen and tested so far, they still have gaps and gotchas in terms of performance and architecture. Personally, and this is just my opinion, I wouldn’t consider that kind of solution at scale in production. Of course, this may change in the future!
  • Legacy storage vendors who want to get some hype. Via acquisition or innovation, they try to catch the cloud-native wave, but they face the same problem as Cisco in the networking industry. Their challenge is a sales force that doesn’t necessarily have the skills to understand the tech at stake or provide meaningful feedback to engineering organizations. It screams “lock-in” and “buy my storage boxes” to my ears. Not so cloud-native in the end.
  • Solutions focused on limited use cases. These are typically immature, possibly with good potential to expand, but you can quickly spot the limitations.

Hopefully, I managed to shed some light on the challenges of running stateful apps in Kubernetes and provide a good introduction to the cloud-native technology I’ve chosen to focus on. I’m very excited about it because it opens up many new use cases for running powerful apps in Kubernetes! Just another excuse to keep learning, really!

written by:
Nic Vermandé
Nic is an experienced hands-on technologist, evangelist and product owner who has been working in the fields of Cloud-Native technologies, Open Source Software, Virtualization and Datacenter networking for the past 17 years. Passionate about enabling users and building cool tech solving real-life problems, you’ll often see him speaking at global tech conferences and online events, spreading the word and walking the walk with customers.
