Stateful Applications In Kubernetes

Table Of Contents
About This Paper
Introduction
State in Kubernetes - A Lightning History
Disastrous Recovery - The Persistent Risk
DBaaS - The Persistent Costs
  • Hidden Storage Lock-in
  • DBaaS Simplicity With You In Control
Shifting Data Storage and Resilience Left
  • Pipeline-Wide Experience — The New Land Grab
What does Freedom Bring for Stateful Kubernetes Storage?
  • Cloud Storage Options
  • Cost Focus
  • Resilience Focus
  • Performance Focus
  • The In-House Storage Option
Summary

About This Paper

Recent surveys suggest rapid growth in the number of stateful applications being run on Kubernetes. However, the challenges of running production-ready transactional systems remain among the top concerns for Kubernetes users.

Kubernetes and container-based application development can play a core role in facilitating any switch to DevOps, GitOps, and more efficient CI/CD pipelines. But while many organizations are happy to develop and run stateless workloads within this environment, there has been a reluctance to transfer and build more business-critical, stateful applications on Kubernetes. Even where early adopters have leaped to run transactional systems on Kubernetes, the fear of storing data in a system built to run ephemeral containers has led to an over-reliance on costly database and cloud storage services.

This paper explores the challenges and realities of running business-critical stateful applications on Kubernetes — the hidden risks, pitfalls, and costs; alongside the latest solutions and exciting new opportunities. It is now possible to cost-effectively run any stateful applications on Kubernetes safely, securely, at scale, and with high performance. This paper explains how.

Introduction

There is no escaping the tech-hipster status of Kube. The industry is currently experiencing a wave of Kube-chasing or Kube-envy, where project owners, digital transformation leaders, and developers all want to use the same infrastructure as Google, Netflix, or Amazon.

There is no denying that Kubernetes, the perfect platform for DevOps and GitOps, makes for an incredible playground. The speed at which teams can build working prototypes from a Lego-brick construction of ready-bundled applications, frameworks, and open source components is both attractive and exciting. And with the ability to move containers seamlessly from the developer's laptop, through testing, and into production, this innovative play can very quickly turn into revenue-generating production systems.

 

State in Kubernetes - A Lightning History

Essentially a large, common API for the management and orchestration of container-based applications, Kubernetes is now an established cornerstone of the modern technology landscape. However, when Kubernetes first emerged from within Google in 2014, it was a much simpler beast. Many of the features that enterprise users now rely upon were absent from the Kubernetes “Primitives” - the core concepts, or building blocks, of Kube that are defined within the API.

Fundamental infrastructural concepts such as the Operator framework and Service Mesh were added to Kubernetes as the open-source project developed, with varying levels of performance and dependability. Perhaps chief amongst these retrofits are the concepts of state, stateful applications, and persistent storage.

Kubernetes was initially built to handle stateless applications — ephemeral processes and functions that could be spun up, started, moved, and stopped on demand, without the need to transfer the ongoing ‘state’ of the application and its user activity. As late as 2015, the pioneers of container-based development still envisioned a dualistic world of applications — with containers running stateless services, alongside persistent apps and databases running in virtual machines.

The first reliable form of state appeared in Kubernetes with the general availability of StatefulSets (originally PetSets) in 2016. And now, delivering stateful applications on Kubernetes is easy at first glance — just download a Kubernetes Operator for your database of choice, for a key-value store such as Redis, or for a streaming application like Kafka, or just use a few lines of YAML to create persistent volumes from your chosen platform’s default block store. But as more companies look to run business-critical systems in Kubernetes, the reality is a little more nuanced and complex.
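Those “few lines of YAML” really are few. As a minimal sketch (the name and size here are illustrative), a claim like the following provisions a volume from the cluster’s default StorageClass — on a managed cloud platform, typically the provider’s networked block store:

```yaml
# Minimal PersistentVolumeClaim. With no storageClassName set, Kubernetes
# provisions from the cluster's default StorageClass — usually the cloud
# platform's networked block storage.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data
spec:
  accessModes:
    - ReadWriteOnce        # mountable read-write by a single node at a time
  resources:
    requests:
      storage: 10Gi
```

Simplicity like this is precisely why developers reach for the default option — and why the limitations of those defaults matter.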

Read our blog on the Growth of State in Kubernetes

Crippling The Scheduler Through Poor Storage Choices

The Kubernetes Scheduler plays a vital role in the inherent efficiency and resilience of the system. Initially built to handle stateless ephemeral processes, Kube responds to node failures by dynamically rescheduling processes and containers to an available node.

Once state is introduced to applications in Kube, use of the wrong underlying storage platform can effectively cripple the scheduler in this role. DBaaS services, and Persistent Volumes using the default CSI driver to access cloud storage, will be tied to a single node. In the event of node failure, remounting the relevant storage volume can result in application outages of 15 minutes or longer.

For many business critical applications, architecture must be distributed across multiple datacenter availability zones (AZs).

DBaaS and default storage options are typically restricted to a single AZ. Historically, this has severely limited the type of stateful applications that could be run on Kube, with best practice and even regulatory compliance often mandating a more resilient architecture.
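This single-AZ pinning is visible in the storage definitions themselves. The sketch below shows a typical default cloud StorageClass (the provisioner name is the AWS EBS CSI driver; the parameters are illustrative) — every volume it creates exists in exactly one AZ, and any pod using that volume can only ever be scheduled into that AZ:

```yaml
# Typical default cloud StorageClass (illustrative). Each volume the
# provisioner creates lives in a single availability zone.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: default-block
provisioner: ebs.csi.aws.com   # AWS EBS CSI driver
parameters:
  type: gp3
# WaitForFirstConsumer delays volume creation until a pod is scheduled,
# but once the volume exists, the workload is pinned to that one AZ.
volumeBindingMode: WaitForFirstConsumer
```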

Disastrous Recovery - The Persistent Risk

For most organizations, production systems cannot afford a ten-plus minute recovery time after a simple network interruption or a server failure. Data for production applications needs to be mirrored across multiple data center availability zones (AZs). Customer information and other sensitive data must be encrypted in transit and stored securely. Erratic application response times and poor performance are equally unacceptable.

And yet, if you build stateful Kube applications on your platform’s default, networked block storage, these are precisely the types of issues you encounter. And in most organizations, if you give your developers an exciting new playground like Kube, they will use the default option to create stateful storage.

All questions of data resilience, application performance, security, and, in many cases, essential regulatory compliance will be placed as an afterthought, squarely on the shoulders of the Platform Engineer. It is, after all, their responsibility. But it doesn’t have to be their burden. With an effective shift-left strategy, the entire problem can be removed and organizations gain a swathe of additional cost benefits and freedoms.

 

DBaaS - The Persistent Costs

Before we explore shifting data storage and resilience left, we have to discuss the $24bn elephants in our persistent data room.

There is, after all, a simple solution to the issue of data persistence in the cloud — Database-as-a-Service (DBaaS): relational database services, most commonly known to AWS users as RDS, or NoSQL services such as DynamoDB.

For organizations running Kube in the cloud (EKS/GKE/AKS), DBaaS services are readily available. For developers, a new DBaaS instance is typically two or three clicks away. For Platform Engineers, DBaaS also takes care of many of their concerns and responsibilities. The worries only really start when someone looks at the monthly invoice from the cloud provider. DBaaS services are typically around three times the cost of your standard platform block storage.

Ondat’s ultra-resilient, software data control plane allows the most demanding and business-critical stateful applications to be run safely and securely on any Kubernetes platform. Persistent data is replicated across all nodes within a Kube cluster. In the event of a node failure, the scheduler can reassign stateful workloads to a new node and persistent data volumes are instantly available.

In addition, Ondat allows persistent data to be replicated across datacenter availability zones, providing an additional layer of resilience and ensuring that business-critical stateful applications can be run on Kubernetes in line with industry best practices.
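Assuming Ondat’s feature-label convention (the exact label names and StorageClass name are an assumption here and should be confirmed against the current Ondat documentation), requesting cross-node replication can be sketched as a couple of labels on an ordinary PersistentVolumeClaim:

```yaml
# Illustrative sketch: an ordinary PVC requesting an Ondat volume with
# two synchronous replicas held on other nodes in the cluster.
# Label and class names follow Ondat's documented conventions but may
# differ between versions — treat them as assumptions.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: orders-db-data
  labels:
    storageos.com/replicas: "2"   # keep two replicas on other nodes
spec:
  storageClassName: storageos     # Ondat's StorageClass (install-dependent)
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 50Gi
```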

Hidden Storage Lock-in

But there is more to consider. If Kube applications are built on a proprietary DBaaS solution from a single cloud provider, they are generally going to stay there. Your application is pretty much locked in. The API may look a lot like standard Postgres, MySQL, MongoDB, or other open source solutions, but it is not quite the same. Migrating an application away from RDS or similar DBaaS services can be a significant headache.

Moreover, most DBaaS services still fail to deliver many of the resilience and security features required by business-critical systems.

DBaaS Simplicity With You In Control

However, organizations can provide their own DBaaS offerings, delivering their own choice of storage directly to any node in Kube. Platform engineers and operations teams can ensure even greater levels of performance, resilience, and security, and the whole thing can be automated and delivered as a developer self-service solution with all the simplicity of your cloud provider’s on-tap DBaaS offerings.

Applications can be switched between public and private clouds without the need for porting, and multi-cloud/hybrid-cloud becomes a reality. From a minikube installation on a developer’s laptop, through testing, to production, there is simplicity, consistency, freedom, and control.

The Kubernetes vision of write-once, deploy-anywhere becomes a reality that is no longer marred by attempts to lock you in, control you, or overcharge you for storing your data.


Shifting Data Storage and Resilience Left

Strategy, Organizational Efficiency, and Developer Self-Service

In many ways, DevOps has already shifted data left. As we have discussed, it is quick and easy for Kube-Native developers to self-serve a persistent data store — whether this is through DBaaS, running an off-the-shelf operator to install their favored database, or using some pretty simple lines of YAML code. However, if this results in unstable, unreliable applications, or unacceptable operational costs, it is not an effective or productive shift left and it is a poor strategy.

Out-of-control developer self-service simply rearranges the problems for the platform engineer. Instead of the burden of constantly provisioning suitable storage at the start of the development process, the platform engineer now has to stall development as the application moves to production. This creates friction with the development team and typically forces an additional development cycle to port applications onto the correct data platform/APIs. Looking at our original definition of shifting left (see below), this is its diametric opposite.

“The architecture provides complete freedom to connect the user's choice of storage into any Kubernetes platform.”

The only real solution is to provide developers with a simple, managed self-service solution for creating databases and stateful apps, one that the platform engineer and/or the operations teams can control. Effectively, this is a database-as-a-service for the developers, just one that is managed and controlled in-house. The platform team can control performance, reliability, and costs, and traditional storage engineers can even hook in their hardware, but this is all invisible to developers. Developers simply get a click-on menu offering Persistent Volumes, popular stateful databases, and/or messaging and middleware components.
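In Kubernetes terms, that click-on menu can be as simple as a short, curated list of StorageClasses published by the platform team. The sketch below is hypothetical — the provisioner and parameter names stand in for whichever in-house storage layer is chosen — but it shows the shape of the idea: developers pick a class by name and never see the backend.

```yaml
# Hypothetical platform-curated StorageClass: the "menu item" a developer
# selects. Provisioner and parameters are placeholders for the in-house
# storage layer; the platform team owns everything below the name.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: db-replicated                       # what developers see and request
provisioner: csi.storage.example.internal   # hypothetical in-house driver
parameters:
  replicas: "2"
  encryption: "true"
reclaimPolicy: Retain
allowVolumeExpansion: true
```

A developer’s PVC then only needs `storageClassName: db-replicated`; performance, replication, encryption, and cost controls all live behind that one name.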

The absolute key in all of this is that using the managed in-house solution for persistent storage must be as simple and painless for the development team as using services from a cloud provider or their platform’s default networked storage — the developer should be blissfully unaware of quite how well they are doing things.

Shifting Left describes the technique of removing tasks from the (right) end of the development cycle and using tooling and automation to ingrain the necessary work as best practice from the start (left) of the development process. Larry Smith first coined the term in a 2001 article published in Dr. Dobb's Journal. His article explained a technique of shortening the software testing feedback loop to ensure that developers would receive feedback early and often.

The term is now used more broadly to describe removing any hurdles from the final move to production within the application development process. This can include testing, security, as well as storage and platform configuration. Shifting Left forms a vital part of modern DevOps efficiency and CI/CD pipelines.

Pipeline-Wide Experience — The New Land Grab

One crucial factor in delivering simplicity to the developer is consistency. Smooth CI/CD pipelines running from development, through testing to production are one of the core benefits of Kube-Native, container-based development. The build-once, run-anywhere paradigm offers organizational efficiency, faster development cycles, as well as market choice and freedom from vendor lock-in once applications are running in production.

To achieve this, Kube DevOps teams need the same consistent experience for Persistent Volumes from the time developers start playing with an idea on their laptop, right through the development pipeline, to production.

However, this important need has not escaped the notice of cloud providers and legacy storage vendors. There is a plethora of new CI/CD-focused tools, APIs, and solutions becoming available from cloud/platform providers and enterprise storage vendors. The issue is that while all of these try to offer some form of developer-focused solution, they are grabbing storage at the development phase to lock in data from the finished application once it hits production.

In choosing a pipeline-wide solution for persistent data in Kube, companies should recognize the strategic importance and future cost implications. What seems like a small, tactical, technical choice can end up being an organization-wide decision that restricts their ability to choose between cloud providers, limits options for private and hybrid cloud, or ties future storage purchases to a single vendor.

The wise choice for delivering persistent data is undoubtedly to opt for an open, independent solution that offers both pipeline-wide simplicity and continuity while defending companies’ ongoing freedom to move applications between clouds and harness cost-effective storage from whatever source they choose.

The Ondat SaaS Platform provides the first real shift left solution for Kubernetes storage management, configuration and control. Developers get easy access to Persistent Volumes, databases and other stateful applications from a one-click self-service marketplace.

Hidden behind this simple solution, the engineering teams responsible for platform, network, and storage delivery maintain complete control. Ondat provides an intuitive management platform for platform engineers and administrators to automate configuration, and to connect, monitor, and move the underlying storage. The architecture provides complete freedom to connect the user’s choice of storage into any Kubernetes platform: freedom from platform and storage lock-in, freedom to move applications at will, and freedom to realize dynamic and hybrid cloud strategies.

What does freedom bring for Stateful Kubernetes Storage?

On the application side, the most obvious benefit in using an independent Kube-Native software solution for delivering persistent storage is platform independence. The ability to deliver the same persistent volumes to any Kubernetes platform allows users to easily move stateful applications between cloud providers, lets them explore private cloud, bare metal, and on-premises options, and simplifies hybrid-cloud implementations.

This choice removes platform lock-in, allows users to explore and shop around different platform options, and perhaps most importantly, gives them meaningful leverage when negotiating with new or existing service providers.

However, the benefits of freedom are equally marked on the storage side. We have already looked at the clear cost, performance, and resilience benefits of using a Kube-Native storage layer when compared to hosted DBaaS services from a cloud provider. But the benefits stretch way beyond this.

Cloud Storage Options

In choosing a cloud storage medium to underpin stateful applications in Kubernetes, there is a necessary three-way trade-off between cost, performance, and resilience. The good news is that with the right storage layer you can simultaneously win on all three, and the only decision you need to take is where you want to focus the bulk of the benefit.

“...the most obvious benefit in using an independent Kube-Native software solution for delivering persistent storage is platform independence.”

Cost Focus

Ondat Example: Local Storage Cost Savings, Leveraging Ondat for Resilience

A quick trip to the AWS cost calculator reveals that 100GB of the highest-performing EBS storage will cost $9,663.70 per month — and even at that price, it remains a relatively poorly performing storage medium.

Looking at local storage by comparison, an M6gd.large instance comes with a comparatively performant 118GB NVMe SSD for $41.61 per month (with the added advantage of being a compute node).

Note: Our example looks at AWS for the obvious reason that they are the market leader, but our point is not to attack Amazon’s costs. The example compares one AWS alternative with another and a similar pricing structure is found in all cloud providers. The real revelation is that the ability to choose more suitable instances from the same provider delivers simultaneous performance gains and cost savings on an eye-watering scale.

Resilience Focus

In the above scenario, it is still possible that a rogue platform engineer may choose to erase an entire Kubernetes cluster. Why anyone would explore such a nuclear option (without first backing up stateful data) is a valid question. Nonetheless, for this reason, many organizations prefer to maintain stateful application storage within their cloud providers’ networked storage services rather than within the Kubernetes cluster itself.

Using networked storage such as AWS EBS/EFS, Google Persistent Disk/Filestore, or Azure managed disks and files comes at a cost. First and most obvious is the financial cost of these services when compared to local storage. However, there are also significant performance bottlenecks created by networked storage. Here again, the use of an intelligent, Kube-Native storage platform layered on top of these services delivers a triple win.

The ability to share and access volumes across multiple nodes delivers more efficient utilization of both storage and compute resources, leading to significant cost savings. On performance, the ability to optimize block sizes and aggregate performance from multiple volumes overcomes one of the biggest drawbacks of networked storage (where raw performance from a single volume is typically limited to around 3000 IOPS).

Even on resilience, the right storage layer will offer extra value-add on top of the block storage itself, and (as we explored in Section 4) its impact on recovery time objective (RTO) can be the decisive factor in determining the viability of Kubernetes as a platform for business-critical applications.

Performance Focus

As companies start to build and migrate more business-critical applications onto Kubernetes, storage performance becomes a much greater issue. For the owners of many business-critical systems, application performance and transaction speeds are non-negotiable.

The real-world detail of application performance tuning is typically case-specific. Factors such as packet sizes, the regular or random nature of I/O, and the proportion of reads-to-writes required by an application will have a massive impact on any performance metrics. Despite this, there are some overarching truths that we can explore.

Firstly, the very idiosyncratic nature of fine-tuning storage performance for stateful applications makes the flexibility offered by a Kube-Native storage layer invaluable in itself. The ability to create bespoke architectures around the needs of a specific database, application, or workload enables engineers to deliver maximum performance at a fraction of the cost. As we have already explored, an effective storage layer gives users the flexibility to choose between local and networked storage options (see boxout), as well as offering overarching performance boosts from volume aggregation and more efficient use of any available bandwidth.

In terms of performance, the second absolute is that a Kube-Native storage layer should dramatically improve all aspects of storage performance, whatever the chosen underlying storage medium. Irrespective of any architectural choices around resilience and cost, an effective storage layer should dramatically improve I/O performance for stateful applications running on Kubernetes.

Finally, not all Kube-Native storage layers are the same. The different options available in the market today provide noticeably different levels of performance enhancement. In comparing these different solutions, users should also be wary of how performance figures are achieved. Revisiting our three-way trade-off between cost, resilience, and performance, it can be easy for vendors to show impressive storage performance by turning down resilience, turning off necessary replication, or using unsafe data caching methods. Well-informed, independent tests are invaluable.

Ondat Example: Performance Boost using Ondat with Local Storage

Revisiting our AWS cost calculator, we have 100GB of the highest performing EBS storage costing $9,663.70 per month.

Where performance is the critical factor, the differences are as marked as when the focus is on cost. At the time of writing, our research showed that AWS i3 instances offer 8x 1900GB NVMe disks that are an order of magnitude faster than the best-performing EBS volume, at a cost of $2,501.71 per month. Three instances would offer a total of 45TB of high-speed disk, 192 vCPUs, and 1.5TB of RAM — for significantly less than the cost of 100GB of EBS storage.



The In-House Storage Option

In many organizations, delving into the world of business-critical applications is likely to result in an encounter with operations teams, network and storage engineers, and people who talk knowledgeably about I/O performance. Environments with dedicated storage experts (and typically, six or seven-figure investments in SAN technologies) are generally not the places to suggest building important transactional systems on networked block storage or even a DBaaS offering from a cloud provider.

However, with an effective Kube-Native storage layer, the ability to use local storage volumes gives users a real choice. Companies can continue to use their in-house storage resources to feed databases and other stateful applications running in Kubernetes. This ability to re-utilize existing investments and in-house expertise can be a vital, practical solution for companies as they migrate towards greater use of Kubernetes for more critical applications.

“...with an effective Kube-Native storage layer, the ability to use local storage volumes gives users a real choice.”

Summary

Stateful applications can now be delivered and run reliably, at scale on Kubernetes. However, leaving developers to use default storage options or Kubernetes operators can result in poor, unpredictable application performance and a variety of resilience issues that fail to meet the requirements for business-critical systems.

Database-as-a-service (DBaaS) solutions solve many of these resilience issues, but they are expensive, lock users into the service provider, offer comparatively poor performance, and can still fail to meet RTO requirements for many applications. Using a Kubernetes-native storage layer to deliver a DBaaS-like service delivers an optimal solution — developer self-service for persistent volumes, combined with freedom and control over how backend storage is provisioned.

Freedom to connect any storage medium to a Kubernetes cluster running on any platform allows users to migrate stateful applications and explore different architectures, and it gives them meaningful leverage when negotiating with suppliers.

A suitable Kube-Native storage layer can simultaneously improve resilience and performance while reducing costs. Moreover, it gives engineers the freedom to optimize combinations of these three factors and create case-specific architectures that can be baked into the container image for specific databases and applications.

The optimal shift left solution for stateful storage in Kubernetes offers DBaaS-like simplicity for developers but, going beyond this, gives engineers the freedom to optimize resilience, performance, and cost.


Schedule a demo to learn more

Learn how Ondat can help you scale persistent workloads on Kubernetes.