Category Archives: Notes to self

Automatically expand statically provisioned disks for StatefulSet in AKS

When working with StatefulSets in Kubernetes, you can use volumeClaimTemplates to have K8s dynamically provision Persistent Volumes for you. In the case of Azure Kubernetes Service, these end up in the MC_resource-group-name_aks-name_region resource group, together with the other automatically provisioned resources such as VMs and Load Balancers.

Statically provisioned disks

For persistent data (in the broader sense, not the K8s term), you may not want to tie your data storage lifecycle to your Kubernetes cluster. Instead you may wish to create your storage – such as Azure Managed Disks – outside of K8s (say, with Terraform) and then map it to your pods.
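
For example, assuming a resource group named my-resource-group (the names here are illustrative), such a disk could be created up front with the Azure CLI, or with the equivalent azurerm_managed_disk resource in Terraform:

az disk create \
  --resource-group my-resource-group \
  --name my-statically-provisioned-disk \
  --size-gb 1 \
  --sku Premium_LRS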

The Azure Kubernetes Service documentation contains a simple example of how this can be achieved for a single pod. Andy Zhang, working with Kubernetes at Microsoft, has a GitHub repository with more detailed examples of static provisioning.

In short, you create your Persistent Volume:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: my-statically-provisioned-disk
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  azureDisk:
    kind: Managed
    diskName: {diskName}
    diskURI: /subscriptions/{subscriptionId}/resourcegroups/{resourceGroupName}/providers/Microsoft.Compute/disks/{diskName}
    fsType: ext4
    readOnly: false
    cachingMode: ReadOnly

and then you can reference it either directly from your pod, or via an intermediate PersistentVolumeClaim.
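
As a sketch of the claim-based route (the claim, pod and image names here are illustrative), a PersistentVolumeClaim can be pinned to the volume above via volumeName, and the pod then mounts the claim:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-static-claim
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: "" # Do not trigger dynamic provisioning
  volumeName: my-statically-provisioned-disk
  resources:
    requests:
      storage: 1Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: my-pod
spec:
  containers:
    - name: app
      image: nginx
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: my-static-claim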

StatefulSet

When you want to use statically provisioned disks as the persistent volumes of a StatefulSet, you could create PersistentVolumeClaims with the same names that the StatefulSet would otherwise create for dynamic provisioning (claimname-podname, where the pod name carries the ordinal), before you create the StatefulSet.
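
For example, for a StatefulSet named influxdb with a volumeClaimTemplate named influxdata (a hypothetical sketch; adjust the names to your setup), the claim for the first pod would have to be named influxdata-influxdb-0:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: influxdata-influxdb-0 # <claim template name>-<pod name>
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: "" # Do not trigger dynamic provisioning
  volumeName: my-statically-provisioned-disk
  resources:
    requests:
      storage: 1Gi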

A more elegant solution, however, is to use match labels to identify the PVs to use for your StatefulSet pods, as per this Stack Overflow answer.

apiVersion: v1
kind: PersistentVolume
metadata:
  name: my-statically-provisioned-disk
  labels:
    app: influxdb # Used by volumeClaimTemplates
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  azureDisk:
    kind: Managed
    diskName: {diskName}
    diskURI: /subscriptions/{subscriptionId}/resourcegroups/{resourceGroupName}/providers/Microsoft.Compute/disks/{diskName}
    fsType: ext4
    readOnly: false
    cachingMode: ReadOnly
---
apiVersion: apps/v1
kind: StatefulSet
...
spec:
  ...
  template:
    ...
    spec:
      ...      
  volumeClaimTemplates:
  - metadata:
      name: influxdata
    spec:
      selector:
        matchLabels:
          app: influxdb # The labels that the PersistentVolumes must have to be used to fulfil this claim
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 1Gi

Automatic volume expansion

Kubernetes supports automatic expansion of dynamically provisioned disks, and we can actually make this work for statically provisioned disks as well.

First, regardless of dynamic vs. static provisioning, you must have a StorageClass with allowVolumeExpansion: true. As per the Azure documentation, this is not the case for the built-in storage classes. Instead you will need to create your own StorageClass, for example:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: expandable-managed-premium
provisioner: kubernetes.io/azure-disk
reclaimPolicy: Delete
allowVolumeExpansion: true # Change compared to the built-in managed-premium
parameters:
  storageaccounttype: Premium_LRS
  kind: Managed

Now, make sure to reference this storage class from your volumeClaimTemplates and, in the case of static provisioning, your PersistentVolume.

In the case of statically provisioned disks, I also suggest you add the pv.kubernetes.io/provisioned-by: kubernetes.io/azure-disk annotation. (I have yet to verify whether this is required or not.)

So we end up with something like this:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: my-statically-provisioned-disk
  labels:
    app: influxdb # Used by volumeClaimTemplates
  annotations:
    pv.kubernetes.io/provisioned-by: kubernetes.io/azure-disk
  finalizers:
  - kubernetes.io/pv-protection
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  azureDisk:
    kind: Managed
    diskName: {diskName}
    diskURI: /subscriptions/{subscriptionId}/resourcegroups/{resourceGroupName}/providers/Microsoft.Compute/disks/{diskName}
    fsType: ext4
    readOnly: false
    cachingMode: ReadOnly
  storageClassName: expandable-managed-premium # Support volume expansion
  volumeMode: Filesystem
---
apiVersion: apps/v1
kind: StatefulSet
...
spec:
  ...
  template:
    ...
    spec:
      ...      
  volumeClaimTemplates:
  - metadata:
      name: influxdata
    spec:
      selector:
        matchLabels:
          app: influxdb # The labels that the PersistentVolumes must have to be used to fulfil this claim
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 1Gi
      storageClassName: expandable-managed-premium # Support volume expansion

Automatic volume expansion for StatefulSet

At the time of writing, changing resources.requests.storage in the volumeClaimTemplates of a StatefulSet is not supported by Kubernetes. There are issues on GitHub both for the main project and for the Enhancement Tracking and Backlog project. (The latter also has a PR.)

Here is my recommended approach to work around this until it is properly supported; it works for dynamically as well as statically provisioned disks (a consolidated command-line sketch follows the list):

  1. Prepare your Kubernetes (or Helm) file with the new storage size.
  2. For each pod (P) in your StatefulSet, repeat
    1. Delete the StatefulSet without cascading
      kubectl delete sts --cascade=false your-stateful-set
      
    2. Delete the pod
      kubectl delete pod/my-pod-P
      
    3. Edit the PersistentVolumeClaim of the pod
      kubectl edit pvc my-volume-my-pod-P
      

      Set the new storage size under resources.requests.storage.

    4. Re-create the StatefulSet using kubectl apply or Helm as applicable
    5. Wait for the volume to expand. Use kubectl describe pvc to see the progress. Sometimes another pod restart is required for changes to be applied.
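
Put together, one iteration of the loop could look roughly like this (the StatefulSet, pod and claim names are illustrative and borrowed from the example above; the new size is assumed to be 2Gi):

# 1. Delete the StatefulSet without deleting its pods
kubectl delete sts influxdb --cascade=false

# 2. Delete the pod whose volume should grow
kubectl delete pod influxdb-0

# 3. Request the new size on the pod's PersistentVolumeClaim
kubectl patch pvc influxdata-influxdb-0 \
  -p '{"spec":{"resources":{"requests":{"storage":"2Gi"}}}}'

# 4. Re-create the StatefulSet
kubectl apply -f statefulset.yaml

# 5. Watch the resize progress
kubectl describe pvc influxdata-influxdb-0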

References

  • This GitHub issue contains some tips for migrating from dynamically to statically provisioned disks. You can also use disk snapshots.
  • Andy Zhang also has some tips on growing disks on GitHub.

Non-backwards compatible SQL database migration

Sometimes you need/want to make more radical changes to your SQL database schema, such as renaming a column or moving data from one table to another.

Tools like Flyway and Liquibase have simplified making backwards-compatible database migrations, such as adding columns or tables. However, making non-backwards-compatible changes “online” (i.e. while the application is up and running) in a clustered environment (multiple application instances accessing the same database) requires a little more thought.

Basically you need to make this change in 5 steps, spread out across (at least) 4 releases (assuming a release means updating the database – either separately or during application boot using a migration tool – and then updating the application, in that order). I’ll use renaming a database column as an example; a SQL sketch of the database steps follows the list.

  1. Database: Add the new column to the database (i.e. do not rename or drop the old one). You may also copy existing data to the new column.
    Application: Start writing data to both the old and the new column.
  2. Database: Copy existing data to the new column. Even if you did this in the previous step, you need to do it again, since the application may have written to the old column only, between the time the database migration was executed and the time the application was updated. You could opt to copy only the data that was written during that window (i.e. where the old column is non-null but the new column is null). This does not have to be a separate release; it could be the migration that ships with the next application release.
  3. Application: Start reading data from the new column instead of the old column. Note that you must not stop writing data to the old column yet, since, as you update one application instance (i.e. one cluster node) at a time, non-updated nodes may still read the old column for data written by updated nodes.
  4. Application: Stop writing to the old column.
    Database: Note that we cannot drop the old column yet, since the non-updated nodes will still be writing to it.
  5. Database: Drop the old column.
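
As a rough SQL sketch of the database side of these steps, assuming a hypothetical table customers where old_name is being renamed to new_name (the application changes in steps 1, 3 and 4 are not shown):

-- Step 1: add the new column alongside the old one
ALTER TABLE customers ADD COLUMN new_name VARCHAR(255);
-- Optionally pre-copy existing data
UPDATE customers SET new_name = old_name WHERE new_name IS NULL;

-- Step 2 (a later release): copy rows that were written to the old column only
-- while the previous release was rolling out
UPDATE customers SET new_name = old_name
WHERE old_name IS NOT NULL AND new_name IS NULL;

-- Step 5 (final release): drop the old column once no node writes to it anymore
ALTER TABLE customers DROP COLUMN old_name;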