To answer the question “why orchestrated storage for containers?” effectively, we first have to answer two predecessor questions: why containers, and why container orchestration?
Why containers?
Containers offer a number of advantages over traditional methods of software deployment.
- Containers are immutable. When we deploy a container, the code within it cannot change. If code changes are required, we build a new container image. This brings an end to anti-patterns such as modifying code in place on production hosts.
- Containers hold not just the code we deploy but a full userland. This lets us state with confidence that a container we build should run on any machine with a compatible kernel, and it vastly reduces the deployment risk posed by library incompatibilities between distributions. Anyone who has suffered a production outage as a result of applying security patches to hosts will sympathize with this.
- Containers are ephemeral. A container comprises an immutable, layered image plus a writable ‘Container Layer’. New or modified data is stored in the writable container layer. When a container is deleted, its writable layer is removed, leaving just the underlying image layers behind. Startup costs of containers are minimal compared with physical or virtual machines, which allows us to leverage this third advantage: ephemeral containers with short life cycles.
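To make the layering concrete, here is a minimal Python sketch of the lookup rules described above. It assumes a simplified model in which each layer is a dictionary mapping file paths to contents. The `Container` class, its methods and the whiteout marker are hypothetical illustrations, not any runtime's API. Real runtimes implement the same idea with a union filesystem such as overlayfs.

```python
from types import MappingProxyType


class Container:
    """Toy container filesystem: read-only image layers plus one writable layer.

    A hypothetical model for illustration; real runtimes use a union filesystem
    such as overlayfs rather than Python dicts.
    """

    _WHITEOUT = object()  # marks a file deleted in the writable layer

    def __init__(self, image_layers):
        # Image layers (bottom first) are shared and immutable, so wrap a copy
        # of each one in a read-only mapping.
        self.image_layers = [MappingProxyType(dict(layer)) for layer in image_layers]
        # The writable container layer starts empty and receives every change.
        self.writable = {}

    def read(self, path):
        # Look in the writable layer first, then the image layers from top to bottom.
        for layer in [self.writable, *reversed(self.image_layers)]:
            if path in layer:
                if layer[path] is self._WHITEOUT:
                    raise FileNotFoundError(path)
                return layer[path]
        raise FileNotFoundError(path)

    def write(self, path, data):
        # Copy-on-write: new or modified files land only in the writable layer.
        self.writable[path] = data

    def delete(self, path):
        # Deletions are recorded as a whiteout in the writable layer; the
        # image layer underneath is never touched.
        self.read(path)  # raises FileNotFoundError if the file is not visible
        self.writable[path] = self._WHITEOUT


image = [{"/etc/os-release": "debian"}, {"/app/main.py": "v1"}]

c1 = Container(image)
c1.write("/app/main.py", "patched in place")
c1.write("/var/lib/db", "important state")
print(c1.read("/app/main.py"))  # patched in place

# Removing the container discards its writable layer; a new container started
# from the same image sees only the original, unmodified layers.
del c1
c2 = Container(image)
print(c2.read("/app/main.py"))  # v1
```

Writing only to the top layer is what lets many containers share one image cheaply. Each container gets its own small writable layer, and throwing a container away costs no more than discarding that layer, along with any data written to it.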
Containers provide us these advantages and more, but now we need to run our containers. Tools such as Docker and systemd-nspawn provide simple command-line interfaces for starting individual containers.
Why container orchestration?
When we want to orchestrate multiple containers across multiple machines, we need something a little more robust. Orchestrators such as Kubernetes, OpenShift, Rancher and Docker Swarm exist to solve this problem. They allow us to define which containers should run, and handle the drudge work of scheduling containers and keeping copies running in the event of failure. Using a simple declarative syntax, we can:
- Ask for n copies of our application to run
- Upgrade painlessly
- Implement phased upgrades that preserve uptime by swapping out running containers a proportion at a time.
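The declarative model can be sketched as a control loop. The loop compares the desired state with what is actually running and closes the gap. The Python below is a toy built on stated assumptions. `Spec`, `Replica`, `reconcile` and the `max_unavailable` setting are hypothetical names. Kubernetes Deployments, for instance, expose comparable `replicas` and `maxUnavailable` fields in their own manifests. Each call to `reconcile` stands in for one pass of the orchestrator's loop.

```python
import itertools
from dataclasses import dataclass


@dataclass
class Replica:
    name: str
    version: str
    healthy: bool = True


@dataclass
class Spec:
    """Desired state, as a declarative manifest would express it (hypothetical)."""
    replicas: int
    version: str
    max_unavailable: int = 1  # replicas a rolling upgrade may take down per pass


_ids = itertools.count()


def start(version):
    # Stand-in for scheduling a new container somewhere in the cluster.
    return Replica(name=f"app-{next(_ids)}", version=version)


def reconcile(spec, running):
    """One pass of a control loop: move the running replicas toward the spec."""
    # Failed replicas are dropped; the scale-up step below replaces them.
    running = [r for r in running if r.healthy]
    # Rolling upgrade: retire at most max_unavailable outdated replicas per
    # pass, so the rest keep serving while they are swapped out.
    outdated = [r for r in running if r.version != spec.version]
    retire = outdated[: spec.max_unavailable]
    running = [r for r in running if r not in retire]
    # Scale down if needed, removing outdated replicas first (False sorts first).
    while len(running) > spec.replicas:
        running.remove(min(running, key=lambda r: r.version == spec.version))
    # Scale up to the requested count with the desired version.
    while len(running) < spec.replicas:
        running.append(start(spec.version))
    return running


spec = Spec(replicas=3, version="v1")
pods = reconcile(spec, [])          # ask for n = 3 copies
pods[0].healthy = False             # simulate a failure
pods = reconcile(spec, pods)        # the failed copy is replaced
spec.version = "v2"                 # upgrade by changing the spec only
while any(p.version != "v2" for p in pods):
    pods = reconcile(spec, pods)    # one replica swapped per pass
print(sorted(p.version for p in pods))  # ['v2', 'v2', 'v2']
```

The upgrade is requested by editing the spec, not by issuing commands to individual containers. The loop works out the intermediate steps on its own.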
From a cost perspective, container orchestrators also make better use of compute resources by packing workloads onto fewer machines. This adds a tangible reduction in hardware costs to the gains in developer efficiency.
Why orchestrated storage for containers?
Many organizations are somewhere along this journey, seduced by the ease of deployment and orchestration that systems such as Kubernetes provide. They migrate stateless workloads, such as microservices and transformation pipelines, into a cluster. At first, the question of persistent storage does not arise: “we have many workloads to move, we’ll move the easy ones, or the ones that most benefit from parallelization and cluster semantics.”
At some point, though, we are left with a cluster running the majority of our workloads. Meanwhile the services holding our actual data (arguably the most important in the enterprise), such as databases and message buses, languish outside the cluster, still managed by traditional methods. If we like declarative syntax, efficient hardware usage and immutable infrastructure, surely we want to move all our workloads into our cluster, to be deployed, monitored and managed using the same tooling?
Here we hit a roadblock: persistent storage and the ephemeral writable layer of a container do not, at first glance, mix well.
- Changes written to a container's writable layer disappear when the container is removed, which, under an orchestrator, happens whenever it is rescheduled or replaced.
- Cluster orchestrators often allow us to present host-based persistent storage to a container at a mount point. However, this ties the data to a single node, so the orchestrator cannot move our application to another node in the event of a failure.
- Some cluster orchestrators allow us to attach cloud storage directly to containers. However, attach/detach operations can be slow and introduce an element of risk at container start time.
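The node-locking problem in the second point is, at heart, a scheduling constraint. The hypothetical Python filter below is not taken from any real scheduler. It shows how a host-local volume shrinks the set of nodes a workload may move to, down to nothing once the node holding the data fails.

```python
def schedulable_nodes(nodes, failed, volume_node=None):
    """Return the nodes a workload may be (re)scheduled onto.

    A simplified, hypothetical scheduler filter. `volume_node` is the node that
    holds a host-local volume, or None if the storage is reachable from any node.
    """
    healthy = [n for n in nodes if n not in failed]
    if volume_node is None:
        return healthy
    # Host-based storage pins the workload to the node that holds its data.
    return [n for n in healthy if n == volume_node]


nodes = ["node-a", "node-b", "node-c"]
# node-a fails while holding the workload's host-local data: nowhere to go.
print(schedulable_nodes(nodes, failed={"node-a"}, volume_node="node-a"))  # []
# The same failure with node-independent storage: two candidates remain.
print(schedulable_nodes(nodes, failed={"node-a"}))  # ['node-b', 'node-c']
```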
There is a clear need for a software-based, orchestrated storage layer for containers, with all the flexibility that implies. Preferably, that storage layer should itself deploy as a container, so that we can use the orchestrator to deploy it alongside the applications that require storage.
Ondat was designed with these ideas in mind. When deployed, our software:
- Aggregates all of the storage on the nodes in a cluster into a pool.
- Allows us to present volumes to our application using the same declarative syntax we use to define the applications themselves.
- Replicates volumes to insulate us from node failure.
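The pooling and replication behaviour listed above can be illustrated with a toy model. This is a minimal sketch built on our own assumptions, not how Ondat actually places data. `Node` and `StoragePool` are hypothetical names. So is the placement rule, which puts copies on the distinct nodes with the most free space. The meaning of `replicas` (copies in addition to the primary) is also an assumption.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    capacity_gb: int
    used_gb: int = 0
    up: bool = True

    @property
    def free_gb(self):
        return self.capacity_gb - self.used_gb


class StoragePool:
    """Toy pool aggregating node-local capacity (hypothetical, not Ondat's code)."""

    def __init__(self, nodes):
        self.nodes = {n.name: n for n in nodes}
        self.volumes = {}  # volume name -> names of the nodes holding a copy

    def total_free_gb(self):
        # The pool's capacity is the sum of free space on all live nodes.
        return sum(n.free_gb for n in self.nodes.values() if n.up)

    def create_volume(self, name, size_gb, replicas=1):
        """Place a primary plus `replicas` copies, each on a distinct node."""
        candidates = sorted(
            (n for n in self.nodes.values() if n.up and n.free_gb >= size_gb),
            key=lambda n: n.free_gb,
            reverse=True,  # prefer the nodes with the most free space
        )
        copies = 1 + replicas
        if len(candidates) < copies:
            raise RuntimeError(f"not enough nodes with {size_gb} GB free for {copies} copies")
        placement = candidates[:copies]
        for n in placement:
            n.used_gb += size_gb
        self.volumes[name] = [n.name for n in placement]
        return self.volumes[name]

    def readable_from(self, name):
        # A volume stays available while any node holding a copy is up.
        return [n for n in self.volumes[name] if self.nodes[n].up]


pool = StoragePool([Node("a", 100), Node("b", 200), Node("c", 150)])
print(pool.total_free_gb())                           # 450
print(pool.create_volume("pg-data", 50, replicas=1))  # ['b', 'c']
pool.nodes["b"].up = False                            # simulate a node failure
print(pool.readable_from("pg-data"))                  # ['c']
```

Because each copy lives on a different node, losing one node still leaves a readable copy. That surviving copy is what allows an orchestrator to reschedule the application elsewhere.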
Containers and orchestrators are powerful concepts, and with Ondat we can finally move all of our workloads into our clusters and reap the benefits.