I wasn’t far into my Docker journey when I ran into the docker container data persistence problem.
The corporate environment I was deploying into was typical: security was paramount, alongside availability and operability. This meant building our own images and hosting them on an internal image registry. As simple as that sounds, at the time there was no easy way to deploy a container-based registry that could handle server failures without external dependencies.
The options were to:
- Use an object-store backend, which we did not yet have access to.
- Use a persistent data volume, and take care of availability ourselves.
We ended up mounting a network file system (NFS) share to each Docker server and sharing it into the container for the data store. To avoid corruption issues, a Consul lock ensured only a single registry container was accessing the data store at a time. This worked well but felt clunky, with too much custom scripting.
Today the options are largely the same. There are more object-store options now, but without one you're still pretty much on your own.
Docker volume drivers offer better options
One thing has changed that would affect how I would achieve the same today: Docker volume drivers. With volume drivers (introduced in Docker 1.8), the Docker engine will manage attaching data volumes to containers. If the volumes are available on the network, they can be attached wherever the container gets scheduled. I no longer need to ensure the data volume is pre-mounted on each server and that only one instance is accessing it at a time because the scheduler or volume driver takes care of this for me.
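As a sketch of how this looks in practice (volume and image names here are illustrative; a networked driver such as a cloud or storage-array plugin would be named in place of the built-in `local` driver):

```shell
# Create a named volume through a volume driver. With a networked
# driver, the volume is reachable from any host in the cluster.
docker volume create --driver local registry-data

# Attach it to a container; the engine asks the driver to mount it
# on whichever host the container lands on.
docker run -d -v registry-data:/var/lib/registry registry

# Inspect where the driver has placed the volume.
docker volume inspect registry-data
```

The key point is that the volume is addressed by name, not by a host path, so the scheduler is free to place the container anywhere the driver can attach the volume.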
This is a huge improvement on my first encounter with container data persistence. I can specify a data volume with my container and let the infrastructure take care of the hard bits. I end up with a simple, single container deployment that can be moved anywhere that has access to its data volume. It also has the advantage of working with any application, be it the registry, a database, or a legacy application.
Why not just use an object store backend?
Many times an object store would indeed be the best solution, especially if you already have the infrastructure in place and it’s well understood. Challenges still remain; they’ve just moved further down the stack. The object store still needs to provide a level of security and resilience, and it needs to be easy to operate. In a corporate environment, providing an object storage infrastructure is often a project in its own right, with multiple stakeholders and long lead times. This is fine for apps with object store support, but what about those without?
Alternatively, traditional block storage is very well understood in corporate environments, and while provisioning may take longer than we'd like, its use is rarely challenged. Remember, all we wanted to do was set up an internal registry.
Can block storage be made better?
With volume drivers we no longer need to use NFS and accept its shortcomings, like performance and locking issues. We can provision a block device from a traditional storage array or from a cloud provider, and the volume driver will ensure it has a filesystem and mount it on the correct Docker host.
This is great, but still not ideal.
The storage provider still needs to take care of security and data resilience, and it needs to be fast and easy to provision and use. Many times, this is where traditional block storage falls short.
Ease of use
Requesting block storage in a corporate environment usually requires opening a ticket with the storage team, a bit of back and forth to justify the request and to provide the WWNs of the host's Fibre Channel cards or the IQNs of the iSCSI initiators, and then a wait.
If the provisioning gods have been kind, after a few days the storage is ready to attach. Since you waited longer than you would have liked, you probably have no intention of ever giving it back, “just in case” you ever need capacity again quickly.
Most developers would prefer the instant gratification of talking to an API, even if it means having a quota to keep them honest.
Once a block device has been provisioned, how do I expand it? Can it be done online without logging into servers, if at all? If it can’t be done easily, by what factor should I over-provision and at what cost?
Can snapshots or clones be used for CI testing or seeding database replicas for quicker recovery? How does this integrate into my automation or orchestration tooling?
Can I use it in the cloud, or on developer laptops?
In a shared environment, how do I ensure my data remains private? I might be able to restrict what servers can mount my data, but how do I stop other containers trying to access it?
Can my data be encrypted and who holds the keys?
It needs to be fast. Database fast. Ideally it would take advantage of local disk (persistent or ephemeral) to keep data close to the application.
These are just a few of the challenges that we believe haven’t been solved yet, and why we started Ondat.
The Registry Revisited
Going back to my internal registry, with Ondat I can now simply run a registry container with a StorageOS volume:
docker run -d -v registry-data:/var/lib/registry --volume-driver storageos registry
Ondat takes care of the hard bits like replication and data placement to keep the primary on the same host as the container accessing it, but still accessible over the network in case of host failure and container re-scheduling.
So if there’s a failure, my orchestration tools can simply bring up a new instance of the container and the existing data gets mounted into it in the background. While there will be a short loss of service while this happens, in many environments this will be acceptable, especially when compared against the complexity of an active-active service architecture.
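One way to get that rescheduling behaviour with Docker's own tooling is a single-replica Swarm service; this is a sketch, assuming the StorageOS volume driver is installed on every node in the cluster:

```shell
# A single-replica service: if the node running the task fails, Swarm
# reschedules it elsewhere, and the volume driver re-attaches the
# registry-data volume on the new node before the container starts.
docker service create \
  --name registry \
  --replicas 1 \
  --mount type=volume,source=registry-data,target=/var/lib/registry,volume-driver=storageos \
  registry
```

The brief outage during rescheduling is exactly the trade-off described above: a short loss of service in exchange for a much simpler single-instance deployment.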
Later, if active-active becomes important, so do features like de-duplication and fast cluster member recovery from cloned data.
State of Persistent Containers
One thing is clear to me: deploying stateful services keeps getting easier.
At Ondat we’re solving the hard storage problems. We’re bringing enterprise functionality to the container and cloud ecosystems, for integration into modern architectures.