Volumes
Volumes are a way to store data generated by Pods so that the data persists after the Pod is deleted. Let’s have a look at a config:
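A minimal sketch of such a Pod spec, matching the description below; the Pod name, image, and the exact random-number command are assumptions:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: random-number-pod     # hypothetical name
spec:
  containers:
  - name: alpine
    image: alpine
    command: ["/bin/sh", "-c"]
    # write a random number to a file on the mounted volume
    args: ["shuf -i 0-100 -n 1 >> /opt/number.out"]
    volumeMounts:
    - name: data-volume       # mount the volume at /opt inside the container
      mountPath: /opt
  volumes:
  - name: data-volume
    hostPath:                 # directory on the host machine
      path: /opt
      type: Directory
```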
We create a Volume named `data-volume`, configured to use a directory on the host machine located at `/opt`. We mount the volume as the `/opt` directory inside the container. When `args` runs, the random number is written to, and persists on, the volume on the host machine.
Mounting a Volume on the host is not recommended in a multi-node cluster, because the data located at `/opt` will differ depending on which node the Pod is scheduled on. Instead, you can use a distributed storage solution such as GlusterFS, CephFS, Hadoop, Azure Disk, Google Persistent Disk, or AWS Elastic Block Store.
Persistent Volumes and Persistent Volume Claims
A PersistentVolume is a cluster-wide pool of storage volumes configured by an administrator to be used by users deploying applications on the cluster. Let’s look at a config for creating a PersistentVolume object:
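A sketch of what such a PersistentVolume definition looks like, using the `persistent-vol1` name and 1Gi capacity referenced later in this section; the hostPath at `/tmp/data` is assumed from the scrub command mentioned below:

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: persistent-vol1
spec:
  capacity:
    storage: 1Gi
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  hostPath:                 # for single-node testing only
    path: /tmp/data
```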
Not bad, right? Take notice of `persistentVolumeReclaimPolicy`. This field tells Kubernetes how to handle the PersistentVolume once its PersistentVolumeClaim has been deleted. Your options are:
- Retain: the data must be manually deleted by the administrator, and the PersistentVolume cannot be reused by any other Pods.
- Recycle: performs a basic scrub (`rm -rf /tmp/data/*`) on the PersistentVolume, making it available for a new claim.
- Delete: the PersistentVolume is deleted, as well as the associated storage asset if applicable (AWS, Azure, GCP, etc.).
See the K8s docs to learn more about configuring the PersistentVolume; just as with Volume objects, using a non-distributed filesystem is not recommended.
A user must create a PersistentVolumeClaim to use a PersistentVolume; Kubernetes uses the Claim to determine which PersistentVolume it should be bound to, taking access modes, volume modes, storage class, and selectors into account. Let’s create a PersistentVolumeClaim:
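A sketch of such a claim, requesting the 600Mi discussed below; the claim name is illustrative:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-claim            # hypothetical name
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 600Mi
```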
In the example above, we are requesting 600Mi. The only PersistentVolume we’ve defined is `persistent-vol1`, which has 1Gi of storage. Since this is the only Volume available, the whole PersistentVolume is claimed and therefore cannot be shared with any other Pods. This means that 424Mi of the PersistentVolume (1Gi = 1024Mi) will remain unusable.
Stateful Sets
StatefulSets are similar to Deployments: they are scalable, and can perform rolling updates and rollbacks. StatefulSets allow us to deploy Pods one at a time (e.g. master, worker1, worker2, etc.), following an enumerated naming convention (e.g. mysql-0, mysql-1, etc.). StatefulSets maintain a “sticky” identity, meaning that the master Pod will always be at index 0, even if it dies and comes back up. We may use a StatefulSet if we require a consistent naming structure, or if we care about the order in which the Pods come up. Let’s look at a configuration for a StatefulSet:
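A sketch of a StatefulSet using a `volumeClaimTemplates` section, as described below; the mysql image, replica count, and storage size are assumptions, and the Service name matches the headless Service discussed later:

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mysql
spec:
  serviceName: mysql-headless         # assumed headless Service name
  replicas: 3
  podManagementPolicy: OrderedReady   # deploy Pods one at a time (the default)
  selector:
    matchLabels:
      app: mysql
  template:
    metadata:
      labels:
        app: mysql
    spec:
      containers:
      - name: mysql
        image: mysql:8
        volumeMounts:
        - name: data
          mountPath: /var/lib/mysql
  volumeClaimTemplates:               # a PVC is created per Pod from this template
  - metadata:
      name: data
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 1Gi
```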
Instead of creating a PersistentVolumeClaim manually and specifying it in the StatefulSet definition file, we move the entire PersistentVolumeClaim definition into the StatefulSet yaml under `volumeClaimTemplates`. If one of the Pods in the StatefulSet fails, the StatefulSet will spin up a new Pod and reattach it to the same PersistentVolumeClaim (and therefore, the same PersistentVolume). PersistentVolumeClaims created through `volumeClaimTemplates` are therefore a very stable way to provide persistent storage to Pods in a StatefulSet.
Looks a lot like a Deployment, doesn’t it? The only field that should look foreign is `podManagementPolicy`, which determines how Pods are deployed: either one after another (`OrderedReady`, the default) or all at once (`Parallel`). To learn more about the alternative podManagementPolicies, see the K8s docs.
Headless Service
Let’s say that we need a Service object that doesn’t load balance requests, but gives us a DNS entry to reach each Pod. This is where a HeadlessService comes in handy. A HeadlessService creates a DNS entry for each Pod, under a subdomain. Let’s create a HeadlessService:
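A sketch, assuming the Service is named `mysql-headless` and selects Pods labeled `app: mysql`:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: mysql-headless
spec:
  clusterIP: None       # this is what makes the Service headless
  selector:
    app: mysql
  ports:
  - port: 3306
```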
Notice that `clusterIP` is set to `None`. This is what defines it as a HeadlessService. Now, let’s incorporate this HeadlessService into a Pod spec:
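A sketch of a standalone Pod wired to that Service; the Pod name and image are illustrative:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: mysql-pod             # hypothetical name
  labels:
    app: mysql                # matched by the Service selector
spec:
  hostname: mysql-pod         # creates a DNS record for this individual Pod
  subdomain: mysql-headless   # must match the name of the Service
  containers:
  - name: mysql
    image: mysql:8
```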
- You must specify a `subdomain` with the same value as the name of the Service. This creates a DNS record for the name of the Service pointing to the Pod.
- Specifying the `hostname` creates a DNS record for each individual Pod.
If we were to use a HeadlessService on a Deployment this way, we would end up with conflicting DNS records across the Pods, since a Deployment creates Pods that are essentially identical. HeadlessServices are extremely useful when interacting with StatefulSets, since each StatefulSet provides that enumerated naming convention. Let’s look at a Yaml that uses the HeadlessService described above with a StatefulSet:
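A sketch, reusing the mysql names from above; the `serviceName` field is what links the StatefulSet’s Pods to the HeadlessService:

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mysql
spec:
  serviceName: mysql-headless   # the HeadlessService defined earlier
  replicas: 3
  selector:
    matchLabels:
      app: mysql
  template:
    metadata:
      labels:
        app: mysql              # matched by the HeadlessService selector
    spec:
      containers:
      - name: mysql
        image: mysql:8
```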
- We need to tell the StatefulSet object, via its `serviceName` field, to have the Pods it creates use the Service named `mysql-headless`. This allows us to access individual Pods like so:
  - mysql-0.mysql-headless.default.svc.cluster.local
  - mysql-1.mysql-headless.default.svc.cluster.local
  - mysql-2.mysql-headless.default.svc.cluster.local