How To Use Skaffold With Docker and Kustomize To Build a Pipeline for Stateful Applications Running in Kubernetes

Part 3: Let’s Add Data!

This is the last part of the series “How to build a CI/CD pipeline for Kubernetes stateful applications”. So far, we have deployed and configured:

  • Kustomize
  • MongoDB Community Operator
  • Ondat

In this article, we’re going to deploy the application along with the MongoDB database as a Kubernetes StatefulSet using the Operator and Skaffold to create a continuous development pipeline for our local k3s cluster. Finally, we’ll test the pipeline and update our awesome Marvel app! Let’s get started!

Note: if you want to follow along, you’ll find the resources we’ve used on GitHub.

Create MongoDB Custom Resource and Define Data Services Requirements

As usual, the Custom Resource is passed to Kubernetes as a YAML file. The MongoDB Community Operator repo gives several examples here. You’ll need to tweak specific parameters according to your use case.

At the root of the application manifests repo you have previously cloned, you’ll find the MongoDB custom resource we used. It is the result computed by Kustomize after running kustomize build overlay/dev. The file name is mongodb-config.yaml, which is detailed below:

apiVersion: mongodbcommunity.mongodb.com/v1
kind: MongoDBCommunity
metadata:
  name: mongodb
spec:
  members: 3
  type: ReplicaSet
  version: "5.0.5"
  statefulSet:
    spec:
      selector: {}
      serviceName: mongodb
      volumeClaimTemplates:
      - metadata:
          name: data-volume
        spec:
          accessModes:
          - ReadWriteOnce
          resources:
            requests:
              storage: 1Gi
          storageClassName: ondat-replicated
      - metadata:
          name: logs-volume
        spec:
          accessModes:
          - ReadWriteOnce
          resources:
            requests:
              storage: 1Gi
          storageClassName: ondat-replicated
  users:
  - name: admin
    db: admin
    passwordSecretRef:
      name: admin-password-df8t2cdf9f
    roles:
    - name: clusterAdmin
      db: admin
    - name: userAdminAnyDatabase
      db: admin
    - name: dbAdminAnyDatabase
      db: admin
    - name: readWriteAnyDatabase
      db: admin
    scramCredentialsSecretName: admin

The custom resource defines the number of initial MongoDB nodes in the cluster and the version of MongoDB, configures the MongoDB ReplicaSet and the Kubernetes StorageClass, and specifies the database admin username/password and roles. It encapsulates the information required to create the database and logically represents an abstraction of the desired database configuration. These parameters are computed dynamically by Kustomize according to the destination environment described in the overlay.

The StorageClass determines which CSI the MongoDB StatefulSet uses to manage its data. It is paramount to choose the appropriate provider at every step in the application lifecycle, from development to production. The “Shift Left” premise has already been discussed in a previous blog, so I won’t delve too deep. The idea is that developers need consistent tools from dev to prod, from their laptop to a 100-node production cluster. They should be able to test their code at every stage of the development with the same capabilities. Ondat provides this consistency for Kubernetes persistent volumes and additional enterprise features that are critical when running stateful workloads in production at scale. It includes synchronous replication, performance optimizations, strong encryption in transit and at rest, and a Kube-Native approach to managing these functions.

The ondat-replicated StorageClass is defined in manifests generated by Kustomize. The StorageClass configuration, as ingested by the Kubernetes API after Kustomize computes it, is the following:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ondat-replicated
provisioner: csi.storageos.com
parameters:
  csi.storage.k8s.io/fstype: xfs
  csi.storage.k8s.io/secret-name: storageos-api
  csi.storage.k8s.io/secret-namespace: storageos
  storageos.com/encryption: "true"
  storageos.com/replicas: "1"
allowVolumeExpansion: true
reclaimPolicy: Delete
volumeBindingMode: Immediate

We’ve changed the default filesystem used for the persistent volumes to XFS, as recommended by MongoDB, for better scale and performance. We’ve also enabled encryption and set the number of volume replicas to 1. This means that every persistent volume provisioned by the Ondat CSI and configured by this StorageClass will be formatted with XFS, encrypted at rest with an AES-256 cipher (in-transit encryption is enabled by default and cannot be changed), and have 1 replica available within the Ondat data mesh.

Deploy the Database

The database itself doesn’t need to be deployed manually. Skaffold will handle the application deployment workflow, which includes the MongoDB cluster. The only prerequisite is to have the MongoDB Operator running and the Custom Resource type available in Kubernetes. Check that the operator is running using the following command:

$ kubectl get pods -n mongo-operator
mongodb-kubernetes-operator-6d46dd4b74-ldfcc 1/1 Running ...

Also, check that the Custom Resource type has been defined:

$ kubectl api-resources | grep mongo
mongodbcommunity mdbc ...

Working With the Database Using Pymongo

Before diving into the pipeline configuration with Skaffold, let’s take a look at how to code our Kubernetes Job and our frontend (FE) to access the MongoDB database.

Pymongo provides Python bindings to interact with MongoDB. At the time of writing, the latest version is 4.0.1, and the documentation is available online. It is pretty straightforward to use, and here is an example showing how the Kubernetes Job connects to the database, creates a MongoDB collection, and adds a JSON document to it.
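The original code listing did not survive, so here is a minimal sketch of the Job’s logic. The function and variable names (build_seed_list, add_mongo_document, PASSWORD) are assumptions for illustration; the real implementation lives in the application repo.

```python
import os

# Illustrative sketch: names are assumptions, not the repo's exact code.

def build_seed_list():
    """Read the replica-set members injected through the Kustomize
    configMapGenerator (MONGO_SEED0..MONGO_SEED2)."""
    return [os.environ[f"MONGO_SEED{i}"] for i in range(3)]

def add_mongo_document(document):
    """Connect to the MongoDB replica set, implicitly create the
    'characters' collection, and insert one JSON document."""
    from pymongo import MongoClient  # deferred so the sketch imports cleanly
    client = MongoClient(
        build_seed_list(),
        username="admin",
        password=os.environ["PASSWORD"],  # from the admin-password Secret
        replicaSet="mongodb",
    )
    db = client.marvel                  # database is created lazily
    db.characters.insert_one(document)  # collection is created on first insert
```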

Since we’re using a MongoDB replica set, the driver expects a seed list. It will attempt to discover all members of the set from that seed list, but the operation is non-blocking and returns silently. So if you want to catch connection errors, you should add a check like the following to your code (from the documentation):
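The exact snippet was lost in this copy, so here is a minimal check adapted from the PyMongo high-availability documentation; the function name is illustrative.

```python
def check_connection(client):
    """Fail fast if no replica-set member is reachable.

    Adapted from the PyMongo docs: the 'ping' admin command is cheap and
    forces server selection, surfacing connection errors immediately.
    """
    from pymongo.errors import ConnectionFailure  # deferred import
    try:
        client.admin.command("ping")
    except ConnectionFailure:
        raise SystemExit("MongoDB replica set not reachable")
```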

In the add_mongo_document function, we specify a seed list with three members, which is passed to the application through environment variables. These variables are included in a Kubernetes ConfigMap that is autogenerated by the Kustomize configMapGenerator we previously mentioned. Both the Kubernetes Job and the FE Deployment manifests contain references to that ConfigMap. Again, you can check it by running kustomize build overlay/dev. The interesting part is the following:

envFrom:
- configMapRef:
    name: mongo-config-b29f887ch6

Notice that Kustomize has generated a unique name with a random string suffix, which is referenced by both the Job and Deployment manifests. The variables defined in these manifests are specified as literals in the kustomization.yaml file under the configMapGenerator section:

configMapGenerator:
- name: mongo-config
  literals:
  - MONGO_SEED0=mongodb-0.mongodb.default.svc.cluster.local
  - MONGO_SEED1=mongodb-1.mongodb.default.svc.cluster.local
  - MONGO_SEED2=mongodb-2.mongodb.default.svc.cluster.local
  - OFFSET=600

Similarly, the password required by the function is passed through a Secret where the password value is associated with a key named password.

secretGenerator:
- name: admin-password
  literals:
  - password=mongo

Next, accessing db.characters implicitly defines a new MongoDB collection (the NoSQL equivalent of a relational database table), and db.characters.insert_one(document) creates a new document in that collection. Every JSON response from the Marvel API endpoint is parsed to fit the following structure:

{
  "id": 10093467,
  "name": "Iron Fist (Danny Rand)",
  "thumbnail": "",
  "extension": "jpg",
  "comics": {
    "available": "98",
    "collectionURI": "",
    "items": [
      { "resourceURI": "",
        "name": "A+X (2012) #5" },
      { "resourceURI": "",
        "name": "Absolute Carnage: Lethal Protectors (2019) #2" }
    ],
    "returned": 20
  }
}

Install and Configure Skaffold

Installing Skaffold is pretty straightforward, just follow the steps indicated here. The next step is to configure Skaffold by running skaffold init. It generates a file called skaffold.yaml that provides a standard template to work with. The command asks a few questions, but you can choose any of the answers; the goal is just to start with a YAML file that is not empty. I mean, who enjoys starting with an empty YAML configuration? :-)

This is where we specify our build and deploy options, and the final configuration looks like the following (the skaffold.yaml configuration file is located at the root of the application repository):

apiVersion: skaffold/v2beta26
kind: Config
metadata:
  name: demo-marvel-app
build:
  artifacts:
  - image: vfiftyfive/flask_marvel
    custom:
      buildCommand: sh build.sh
  local:
    push: true
deploy:
  kustomize:
    paths:
    - <path_to_dev_overlay>

The custom build command is the build.sh file I previously mentioned, and the Kustomize path is the path to the local dev overlay directory. We’re also publishing the image on Docker Hub, which requires Skaffold to know your Docker Hub credentials. The easiest way to integrate with Docker Hub is to log in with docker login and then configure Skaffold to use your registry by running skaffold dev --default-repo=<your_registry>, where <your_registry> is your Docker Hub username.

Once you have set up your KUBECONFIG environment variable to connect to your dev Kubernetes cluster, the only thing left is to run skaffold dev:

$ export KUBECONFIG=<path_to_kubeconfig_file>
$ skaffold dev
Listing files to watch...
- vfiftyfive/flask_marvel
Generating tags...
- vfiftyfive/flask_marvel -> vfiftyfive/flask_marvel:6f061f0
Checking cache...
- vfiftyfive/flask_marvel: Not found. Building
Starting build...
Building [vfiftyfive/flask_marvel]...
+ docker buildx build --builder skaffold-builder --tag vfiftyfive/flask_marvel:6f061f0 --platform linux/amd64,linux/arm64 --push /Users/nvermande/Documents/Dev/Ondat/FlaskMarvelApp
#1 [internal] load build definition from Dockerfile
#1 transferring dockerfile: 299B 0.0s done
#1 DONE 0.1s
#2 [internal] load .dockerignore
#2 transferring context: 2B 0.0s done
#2 DONE 0.0s
#3 [linux/arm64 internal] load metadata for
#3 ...
...Tags used in deployment:
- vfiftyfive/flask_marvel -> vfiftyfive/flask_marvel:6f061f0@sha256:629734c5e62206752f051e9f47fdc3bc6d1f61e399b9a89920c8d7d9f87ee0f8
Starting deploy...
- created
- service/marvel-frontend created
- service/mongodb created
- deployment.apps/marvel-frontend created
- statefulset.apps/mongodb created
- job.batch/add-data-to-mongodb created
Waiting for deployments to stabilize...
- deployment/marvel-frontend: creating container flask-marvel
- pod/marvel-frontend-5bdd684d78-zpbvm: creating container flask-marvel
- pod/marvel-frontend-5bdd684d78-2mf4m: creating container flask-marvel
- statefulset/mongodb: creating container mongodb
- pod/mongodb-0: creating container mongodb
- deployment/marvel-frontend is ready. [1/2 deployment(s) still pending]
...Generating tags...
- vfiftyfive/flask_marvel -> vfiftyfive/flask_marvel:6f061f0
Checking cache...
- vfiftyfive/flask_marvel: Found Remotely
Tags used in deployment:
- vfiftyfive/flask_marvel -> vfiftyfive/flask_marvel:6f061f0@sha256:629734c5e62206752f051e9f47fdc3bc6d1f61e399b9a89920c8d7d9f87ee0f8
Starting deploy...
Waiting for deployments to stabilize...
- statefulset/mongodb is ready. [1/2 deployment(s) still pending]
- deployment/marvel-frontend is ready.
Deployments stabilized in 1.49 second
Watching for changes...

As you can see from the output above, Skaffold builds the Docker image and deploys the Kubernetes manifests using Kustomize, configuring the new FE image with the specific image tag and digest resulting from the image build.

Skaffold has another interesting feature: the ability to stream the logs of the Pods it monitors in real time. For example, if you build foo and bar artifacts, Skaffold will display the output of the foo and bar Pods when they are updated. This is available when Skaffold is running in daemon mode with the dev option above. It means that while you’re developing your application, you don’t have to use kubectl to gather container logs from multiple places: Skaffold centralizes them and sends them to the daemon’s standard output. Quite handy, I have to admit.

If we look at what is now running in the Kubernetes cluster, we can see the following (the age field has been truncated):

$ kubectl get pods
add-data-to-mongodb-9brkw 0/1 Completed 0
marvel-frontend-5bdd684d78-2mf4m 1/1 Running 0
marvel-frontend-5bdd684d78-zpbvm 1/1 Running 0
mongodb-0 1/1 Running 0
mongodb-1 1/1 Running 0
mongodb-2 1/1 Running 0

Skaffold has deployed all the components required for our application to work. Before trying to modify the source code, let’s check that the application is working properly. For this, we just use kubectl port-forward and check the result locally in our preferred browser:

$ kubectl port-forward svc/marvel-frontend 8080
Forwarding from -> 80
Forwarding from [::1]:8080 -> 80

If we browse to http://localhost:8080, we can see the app working:


Let’s say we now want to modify some text in this application, replacing “Comics” with “Comic(s)”. For this, just edit the HTML code in the application repository under app > templates > pages.html and replace all occurrences of “Comics” with “Comic(s)”.

Then save the file, and you should see the following output from Skaffold:

Generating tags...
- vfiftyfive/flask_marvel -> vfiftyfive/flask_marvel:186a97d
Checking cache...
- vfiftyfive/flask_marvel: Found. Tagging
Tags used in deployment:
- vfiftyfive/flask_marvel -> vfiftyfive/flask_marvel:186a97d@sha256:33034d0241d6fbd586f550766ae22ed8633f099b53cca9a4544510c856f77811
- vfiftyfive/marvel_init_db -> vfiftyfive/marvel_init_db:186a97d@sha256:ca57d37157384fb83616708b69ee12e60b8023fa05cef2325b9537b13bd934ce
Starting deploy...
- deployment.apps/marvel-frontend configured
Waiting for deployments to stabilize...
- dev:deployment/marvel-frontend is ready.
Deployments stabilized in 4.263 seconds
Watching for changes...

In Kubernetes, you should see new frontend containers being created and the old ones destroyed:

$ kubectl get pods -w
marvel-frontend-7d876b7bff-57vxb 1/1 Running 0 10s
marvel-frontend-7d876b7bff-wr9s9 1/1 Running 0 9s
marvel-frontend-7d876b7bff-6hnx5 1/1 Running 0 7s
marvel-frontend-65d655d644-fksdl 0/1 Terminating 0 7m23s
marvel-frontend-65d655d644-fksdl 0/1 Terminating 0 7m26s
marvel-frontend-65d655d644-fksdl 0/1 Terminating 0 7m26s

You can browse to the same URL and see the updated application:


We hope this deep dive into the development lifecycle of stateful applications in Kubernetes has highlighted their challenges.

First, they require an abstraction level to manage their configuration and deployment, which can be achieved by leveraging specific Operators. Still, there is no standard approach across database Operators, which can lead to confusion and technical challenges when fine-tuning is required.

Second, Kubernetes provides the ability to encapsulate infrastructure requirements as YAML. These can therefore easily be injected into CI/CD pipelines, but a lot of glue is needed to map out the different components. Skaffold is a valuable tool here: it integrates Docker and custom scripts for the build phase and Kustomize for the deployment phase. In dev mode, Skaffold updates the relevant components as soon as you save changes locally in your development environment.

Finally, Kubernetes doesn’t provide premium storage features by default, yet replication, encryption, thin provisioning, and performance optimization for persistent volumes are key requirements when running stateful applications on the de facto cloud OS. Ondat provides the data plane that enables these capabilities while fully integrating with the Kubernetes control plane. Regardless of the Kubernetes cluster location (your local data center, the public cloud, or your laptop), Ondat provides a distributed software-defined storage solution, so you can take advantage of critical data services to ensure the scale, resiliency, and performance consistency of stateful applications.

Things I’ve learned along the way

  • If you have Kubernetes Jobs in Skaffold, you need to pass --force when using skaffold dev. This is because once a Job exists in Kubernetes, its template section, which contains the image property, is immutable. The --force option deletes and recreates the Job instead of updating it, in the same way you would with kubectl replace --force -f my-job.yaml
  • The MongoDB Community Kubernetes Operator enables authentication by default and creates a user. The user name is configurable, but remember that it is managed by the Operator. Any manual modification to this user through mongo imperative commands will be reverted. This is because the Operator relies on a reconciling loop that ensures the Custom Resource configuration prevails. So, for example, if you need to add roles to this user, do it from the Custom Resource section responsible for user configuration:
users:
- name: admin
  db: admin
  passwordSecretRef:
    name: admin-password
  roles:
  - name: clusterAdmin
    db: admin
  - name: userAdminAnyDatabase
    db: admin
  - name: dbAdminAnyDatabase
    db: admin
  - name: readWriteAnyDatabase
    db: admin
  scramCredentialsSecretName: my-scram

We’ve added a couple of new roles, so our admin user has suitable privileges for all databases.

  • The Operator also expects a key named “password” within the user Secret. The password value represents the user password.
  • In Kustomize, every YAML node needs to have a name field in the manifests located in the base directory. If you omit the name in the base folder and add a patch in the Kustomization file, the patch won’t be applied.
  • When using Kustomize to replace specific fields in Custom Resources with an output value from a Kustomize generator, a configuration section is required in the Kustomization file to define the additional mappings. For example, if you need to specify the name of a Secret generated by the Kustomize secretGenerator as a value for the MongoDBCommunity Custom Resource passwordSecretRef field, you need to tell Kustomize that every time it runs into a passwordSecretRef, it has to replace its value with the generated Kubernetes Secret name.
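Such a mapping could look like the sketch below, using Kustomize’s nameReference mechanism. The file name is illustrative; the field path follows the MongoDBCommunity CRD structure shown earlier.

```yaml
# kustomization.yaml (excerpt)
configurations:
- mongodb-name-reference.yaml

---
# mongodb-name-reference.yaml (illustrative file name):
# teach Kustomize that MongoDBCommunity resources reference Secrets by name
# at this path, so generated Secret names (with their hash suffixes) are
# substituted automatically.
nameReference:
- kind: Secret
  fieldSpecs:
  - kind: MongoDBCommunity
    path: spec/users/passwordSecretRef/name
```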


written by:
Nic Vermandé
Nic is an experienced hands-on technologist, evangelist and product owner who has been working in the fields of Cloud-Native technologies, Open Source Software, Virtualization and Datacenter networking for the past 17 years. Passionate about enabling users and building cool tech solving real-life problems, you’ll often see him speaking at global tech conferences and online events, spreading the word and walking the walk with customers.
