Volumes
Volumes are a way to store data generated by Pods so that the data persists after the Pod is deleted. Let’s have a look at a config:
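A minimal sketch of such a Pod spec, matching the description below; the Pod name, image, and the exact random-number command are assumptions:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: random-number-pod     # hypothetical name
spec:
  containers:
  - name: alpine
    image: alpine
    command: ["/bin/sh", "-c"]
    # write a random number to a file on the mounted volume
    args: ["shuf -i 0-100 -n 1 >> /opt/number.out"]
    volumeMounts:
    - name: data-volume       # mount the volume at /opt inside the container
      mountPath: /opt
  volumes:
  - name: data-volume
    hostPath:                 # directory on the host machine
      path: /opt
      type: Directory
```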
We create a Volume named `data-volume`, configured to use a directory on the host machine located at `/opt`. We mount the volume as the `/opt` directory inside the container. When `args` runs, the random number is written to, and persists on, the volume on the host machine.
Mounting a Volume on the host is not recommended in a multi-node cluster, because the data located at `/opt` will differ depending on which node the Pod is scheduled on. Instead, you can use a distributed storage solution such as GlusterFS, CephFS, Hadoop, Azure Disk, Google Persistent Disk, or AWS Elastic Block Store.
Persistent Volumes and Persistent Volume Claims
A PersistentVolume is a cluster-wide pool of storage volumes configured by an administrator to be used by users deploying applications on the cluster. Let’s look at a config for creating a PersistentVolume object:
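A sketch of what such a PersistentVolume definition looks like, using the `persistent-vol1` name and 1Gi capacity referenced later in this section; the hostPath at `/tmp/data` is assumed from the scrub command mentioned below:

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: persistent-vol1
spec:
  capacity:
    storage: 1Gi
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  hostPath:                 # for single-node testing only
    path: /tmp/data
```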
Not bad, right? Take notice of `persistentVolumeReclaimPolicy`. This field tells Kubernetes how to handle the PersistentVolume once its PersistentVolumeClaim has been deleted. Your options are:
- Retain: the data must be manually deleted by the administrator, and the PersistentVolume cannot be reused by any other Pods.
- Recycle: performs a basic scrub (`rm -rf /tmp/data/*`) on the PersistentVolume, making it available for a new claim.
- Delete: the PersistentVolume is deleted, as well as the associated storage asset if applicable (AWS, Azure, GCP, etc.).
See the K8s docs to learn more about configuring the PersistentVolume; just as with Volume objects, using a non-distributed filesystem is not recommended.
A user must create a PersistentVolumeClaim to use a PersistentVolume; Kubernetes uses the Claim to determine which PersistentVolume it should be bound to, taking access modes, volume modes, storage class, and selectors into account. Let’s create a PersistentVolumeClaim:
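A sketch of such a claim, requesting the 600Mi discussed below; the claim name is illustrative:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-claim            # hypothetical name
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 600Mi
```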
In the example above, we are requesting 600Mi. The only PersistentVolume we’ve defined is `persistent-vol1`, which has 1Gi of storage. Since this is the only Volume available, the whole PersistentVolume is claimed and therefore cannot be shared with any other Pods. This means that 424Mi of the PersistentVolume (1Gi = 1024Mi) will remain unusable.
Stateful Sets
StatefulSets are similar to Deployments: they are scalable, and can perform rolling updates and rollbacks. StatefulSets allow us to deploy Pods one at a time (e.g. master, worker1, worker2, etc.), following an enumerated naming convention (e.g. mysql-0, mysql-1, etc.). StatefulSets maintain a “sticky” identity, meaning that the master Pod will always be at index 0, even if it dies and comes back up. We may use a StatefulSet if we require a consistent naming structure, or if we care about the order in which the Pods come up. Let’s look at a configuration for a StatefulSet:
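A sketch of a StatefulSet using a `volumeClaimTemplates` section, as described below; the mysql image, replica count, and storage size are assumptions, and the Service name matches the headless Service discussed later:

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mysql
spec:
  serviceName: mysql-headless         # assumed headless Service name
  replicas: 3
  podManagementPolicy: OrderedReady   # deploy Pods one at a time (the default)
  selector:
    matchLabels:
      app: mysql
  template:
    metadata:
      labels:
        app: mysql
    spec:
      containers:
      - name: mysql
        image: mysql:8
        volumeMounts:
        - name: data
          mountPath: /var/lib/mysql
  volumeClaimTemplates:               # a PVC is created per Pod from this template
  - metadata:
      name: data
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 1Gi
```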
Instead of creating a PersistentVolumeClaim manually and specifying it in the StatefulSet definition file, we move the entire PersistentVolumeClaim definition into the StatefulSet yaml under `volumeClaimTemplates`. If one of the Pods in the StatefulSet fails, the StatefulSet will spin up a new Pod and reattach it to the same PersistentVolumeClaim (and therefore, the same PersistentVolume). PersistentVolumeClaims created through `volumeClaimTemplates` are therefore a very stable way to provide persistent storage to Pods in a StatefulSet.
Looks a lot like a Deployment, doesn’t it? The only field that should look foreign is `podManagementPolicy`, which determines how Pods are deployed: either one after another (`OrderedReady`, the default) or all at once (`Parallel`). To learn more about the alternative podManagementPolicies, see the K8s docs.
Headless Service
Let’s say that we need a Service object that doesn’t load balance requests, but gives us a DNS entry to reach each Pod. This is where a HeadlessService comes in handy. A HeadlessService creates a DNS entry for each Pod, under a subdomain. Let’s create a HeadlessService:
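A sketch, assuming the Service is named `mysql-headless` and selects Pods labeled `app: mysql`:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: mysql-headless
spec:
  clusterIP: None       # this is what makes the Service headless
  selector:
    app: mysql
  ports:
  - port: 3306
```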
Notice that `clusterIP` is set to `None`. This is what defines it as a HeadlessService. Now, let’s incorporate this HeadlessService into a Pod spec:
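A sketch of a standalone Pod wired to that Service; the Pod name and image are illustrative:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: mysql-pod             # hypothetical name
  labels:
    app: mysql                # matched by the Service selector
spec:
  hostname: mysql-pod         # creates a DNS record for this individual Pod
  subdomain: mysql-headless   # must match the name of the Service
  containers:
  - name: mysql
    image: mysql:8
```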
- You must specify a `subdomain` with the same value as the name of the Service. This creates a DNS record for the name of the Service pointing to the Pod.
- Specifying the `hostname` creates a DNS record for each individual Pod.
If we were to use a HeadlessService on a Deployment this way, we would end up with conflicting DNS records across the Pods, since a Deployment creates Pods that are essentially identical. HeadlessServices are extremely useful when interacting with StatefulSets, since each StatefulSet provides that enumerated naming convention. Let’s look at a Yaml that uses the HeadlessService described above with a StatefulSet:
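A sketch, reusing the mysql names from above; the `serviceName` field is what links the StatefulSet’s Pods to the HeadlessService:

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mysql
spec:
  serviceName: mysql-headless   # the HeadlessService defined earlier
  replicas: 3
  selector:
    matchLabels:
      app: mysql
  template:
    metadata:
      labels:
        app: mysql              # matched by the HeadlessService selector
    spec:
      containers:
      - name: mysql
        image: mysql:8
```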
- We need to tell the StatefulSet object, via its `serviceName` field, to have the Pods it creates use the Service named `mysql-headless`. This allows us to access individual Pods like so:
  - mysql-0.mysql-headless.default.svc.cluster.local
  - mysql-1.mysql-headless.default.svc.cluster.local
  - mysql-2.mysql-headless.default.svc.cluster.local