Migrating Longhorn from ZFS to Multi Disk Deployment
Current setup
The current Longhorn setup consists of four nodes, each with the following storage layout: four 8 TB HDDs and a 1 TB NVMe SSD, arranged as a RAIDZ array with the SSD acting as a cache device. With this design each node has an array exposing 24 TB of available storage; however, after creating a zvol and formatting it with ext4 the maximum usable size is 14 TB. Furthermore, Longhorn reserves 20% by default to prevent DiskPressure from causing provisioning failures. After all is said and done, we go from 32 TB of raw storage to 10.6 TB per node, a loss of roughly 67% of my raw storage capacity.
As much as I love ZFS, this is not worth the loss of storage. It could be partially remedied by reducing the Storage Minimal Available Percentage value, however that would only free up (at most) around 4.5 TB per node.
The plan
- Evict 2 of 4 nodes
- Destroy zpools
- Reformat drives with ext4
- Re-add disks to Longhorn
- Label disks
- Deploy/edit existing StorageClasses
- Relabel all volumes
- Evict other half (and repeat)
Evicting nodes
To begin, I selected the two nodes holding the least data and began the eviction process. This is simple: log in to the web UI, find the node you would like to evict, select Edit, and finally set ‘Node Scheduling’ to ‘Disable’ and ‘Eviction Requested’ to ‘True’. This starts evicting the node's data, which may take some time depending on how much data is on the node and the speed of the other nodes.
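If you prefer not to click through the UI, the same two flags can be flipped by patching the Longhorn Node custom resources directly. This is only a sketch, assuming worker-01 and worker-02 are the two nodes being drained:
# Disable scheduling and request eviction on the two nodes being drained
for node in worker-01 worker-02; do
  kubectl -n longhorn-system patch nodes.longhorn.io $node --type=merge \
    -p '{"spec":{"allowScheduling":false,"evictionRequested":true}}'
done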
Destroying old ZFS pools
Before destroying your ZFS pools, ensure ALL replicas have been removed from the node.
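A quick way to double-check from the CLI is to list the replica custom resources and confirm none are still scheduled on the node being drained (worker-01 is used as the example here):
# Should return nothing once eviction has finished
kubectl -n longhorn-system get replicas.longhorn.io | grep worker-01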
To start, any zvol mounts will need to be removed from /etc/fstab. The node's status will stay disabled in Longhorn across a reboot, so once the zvol entry has been removed from fstab, simply reboot the node.
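As a rough sketch, assuming the fstab entry references the zvol by its /dev/zvol/rke2/... path (yours may use a UUID instead, so check before deleting):
# Back up fstab, drop the zvol entry, then reboot
sudo cp /etc/fstab /etc/fstab.bak
sudo sed -i '\|zvol/rke2|d' /etc/fstab
sudo reboot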
Post reboot you may now delete your zpool; in my case it is named rke2 and can be destroyed like so:
❯ sudo zpool destroy rke2
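It does no harm to confirm the pool is actually gone and the member disks are free before reformatting them:
❯ sudo zpool list
❯ lsblk -o NAME,SIZE,FSTYPE,MOUNTPOINT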
Redeploying disks
The pools of disks can now be formatted with ext4. I have a particular way of mounting and referencing these disks, but it is by no means a requirement. First I reformat all of the disks with ext4; the existing filesystems can simply be overwritten, for example:
Warning: the snippet below is destructive; do not run it without understanding exactly what it does.
for disk in sda sdb sdc sdd nvme1n1; do echo "y" | mkfs.ext4 /dev/$disk; done
Now that the disks have all been formatted, we can get their UUIDs:
for disk in sda sdb sdc sdd nvme1n1; do blkid -s UUID -o value /dev/$disk; done
and begin editing /etc/fstab. Out of personal preference I like to mount each disk under /mnt/ in a folder named after its UUID. Below is a quick example of mounting the disks and adding them to fstab:
for disk in sda sdb sdc sdd nvme1n1; do
  uuid=$(sudo blkid -s UUID -o value /dev/$disk)   # filesystem UUID of the freshly formatted disk
  sudo mkdir -p /mnt/$uuid                         # mount point named after the UUID
  sudo mount /dev/$disk /mnt/$uuid
  echo "UUID=$uuid /mnt/$uuid ext4 defaults 1 2" | sudo tee -a /etc/fstab
done
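Before trusting the new entries across a reboot, I like to sanity-check them:
# mount -a should be a quiet no-op if the entries are correct; findmnt shows what ended up where
sudo mount -a
findmnt -t ext4 | grep /mnt/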
Now we can return to the Longhorn UI for this node, add the new disks, and remove the old disk for /var/lib/rancher. When adding the disks, ensure you add any tags; in my case I will be adding the ssd and hdd tags to my disks respectively.
After saving you should see the node update its total available storage after a few seconds.
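The same information is visible from the CLI via the Longhorn Node custom resource, where the disks and their tags appear under spec.disks (worker-01 used as the example node):
❯ kubectl -n longhorn-system get nodes.longhorn.io worker-01 -o yaml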
This now needs to be done on the second node.
New StorageClasses
Now it is time to create two new StorageClasses and edit the current default StorageClass.
ConfigMap
The default StorageClass ships as part of a ConfigMap. The only change is the addition of diskSelector: "default" to the StorageClass definition inside that ConfigMap.
---
# Source: longhorn/templates/storageclass.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: longhorn-storageclass
  namespace: longhorn-system
  labels:
    app.kubernetes.io/name: longhorn
    app.kubernetes.io/instance: longhorn
    app.kubernetes.io/version: v1.4.1
data:
  storageclass.yaml: |
    kind: StorageClass
    apiVersion: storage.k8s.io/v1
    metadata:
      name: longhorn
      annotations:
        storageclass.kubernetes.io/is-default-class: "true"
    provisioner: driver.longhorn.io
    allowVolumeExpansion: true
    reclaimPolicy: "Delete"
    volumeBindingMode: Immediate
    parameters:
      diskSelector: "default"   # <- the added line
      numberOfReplicas: "2"
      staleReplicaTimeout: "30"
      fromBackup: ""
      fsType: "ext4"
      dataLocality: "disabled"
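Assuming the manifest above is saved locally (I will call the file longhorn-storageclass.yaml here), it can be applied directly. Note that if Longhorn was installed via Helm the same change should really go through the chart values, otherwise a future upgrade will revert it:
❯ kubectl -n longhorn-system apply -f longhorn-storageclass.yaml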
StorageClasses
I will now be adding two more StorageClasses, one for the HDDs and one for the SSDs.
---
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: longhorn-slow
  annotations:
    storageclass.kubernetes.io/is-default-class: "false"
provisioner: driver.longhorn.io
allowVolumeExpansion: true
reclaimPolicy: "Delete"
volumeBindingMode: Immediate
parameters:
  diskSelector: "hdd"
  numberOfReplicas: "2"
  staleReplicaTimeout: "30"
  fromBackup: ""
  fsType: "ext4"
  dataLocality: "disabled"
---
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: longhorn-fast
  annotations:
    storageclass.kubernetes.io/is-default-class: "false"
provisioner: driver.longhorn.io
allowVolumeExpansion: true
reclaimPolicy: "Delete"
volumeBindingMode: Immediate
parameters:
  diskSelector: "ssd"
  numberOfReplicas: "2"
  staleReplicaTimeout: "30"
  fromBackup: ""
  fsType: "ext4"
  dataLocality: "disabled"
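These can be saved and applied in the usual way (the file names are just my choice) and then confirmed alongside the default class:
❯ kubectl apply -f longhorn-slow.yaml -f longhorn-fast.yaml
❯ kubectl get storageclass longhorn longhorn-slow longhorn-fast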
Edit existing volumes
There is currently no way in the Longhorn UI to adjust volume disk tags, and I am far too lazy to edit them all by hand, so it will have to be done with kubectl.
❯ kubectl get volumes -n longhorn-system
NAME STATE ROBUSTNESS SCHEDULED SIZE NODE AGE
pvc-0078be9d-1f6b-445a-ad22-acaba5cc95c9 attached healthy 10737418240 worker-03 28d
pvc-09014301-c264-4e6c-90c9-dcd54e8d37f3 attached healthy 21474836480 worker-02 27d
pvc-093951ee-0600-4fce-8f8a-bb3e75022c8c attached healthy 53687091200 worker-02 27d
pvc-280d7773-111f-4a3b-8b0d-2b0337599575 attached healthy 53687091200 worker-01 14d
pvc-3d51540a-53a9-432f-837d-1bb790efcdc5 attached healthy 10737418240 worker-03 27d
pvc-3f1df940-6f4e-4a34-9a19-a9b1303b5fe1 attached healthy 10737418240 worker-03 28d
pvc-48f2680f-6b32-4d3a-b863-cd78e136d600 attached healthy 10737418240 worker-02 27d
pvc-511b50d9-7fc7-48e8-b614-942591506229 attached healthy 10737418240 worker-04 27d
pvc-53b3aa84-77e7-4cef-9b24-d93bafd97de8 attached healthy 8589934592 worker-01 28d
pvc-566eff31-cd13-4ed6-a80f-def3aa88f539 attached healthy 21474836480 worker-04 14d
pvc-62dae2d3-e71a-4135-9fef-62a6e8d264d6 attached healthy 10737418240 worker-02 27d
pvc-649cdec0-cc19-4c5e-87d0-a5609cfe723e attached healthy 21474836480 worker-04 28d
pvc-6f2f83e9-45ee-4abe-bb61-dbda0b1730cd attached healthy 1099511627776 worker-01 25d
pvc-71a77ff4-2882-4ae7-979a-111ed6d45ec7 detached unknown 5368709120 28d
pvc-81dc925e-f94e-4446-919e-4511026aa2f2 attached healthy 21474836480 worker-04 27d
pvc-83c51034-de8f-432c-b854-af8e1f5c882e attached healthy 53687091200 worker-02 27d
pvc-8888dc52-c01a-4df5-ba0b-7fe82c47f375 attached healthy 10737418240 worker-03 27d
pvc-969dc30e-26e5-4c1c-9d5f-3084af4d5601 attached healthy 53687091200 worker-02 17d
pvc-989a81af-1405-4cbe-b3dc-c30f21e91aa4 attached healthy 5368709120 worker-03 28d
pvc-99d8ed79-7f43-449f-9375-9ec8be03c526 attached healthy 53687091200 worker-03 4d3h
pvc-ad3c25d2-5c02-4d9f-9cbd-3ffc77a8a5ab attached healthy 1099511627776 worker-01 18d
pvc-b757ec25-5da5-418b-ae04-9263247a6f18 attached healthy 536870912000 worker-01 27d
pvc-bbbafdaa-fd06-48bf-8ad2-930e1746b30b attached healthy 1073741824 worker-02 28d
pvc-c8d97dd3-1b08-407f-99b4-5a1140c1dfba attached healthy 53687091200 worker-01 27d
pvc-cfb0a484-56f8-4006-be2c-cc478a88d588 attached healthy 21474836480 worker-03 28d
pvc-d3795c65-23cc-4b73-90ba-38d0510a4312 attached healthy 10737418240 worker-03 27d
pvc-d9754a64-427b-4a75-9083-6b0cbb6bea59 attached healthy 10737418240 worker-02 27d
pvc-db1c36e0-0ebc-468f-9e9d-aac68fe0b790 attached healthy 53687091200 worker-02 24d
pvc-e02fd7e8-be31-4e0c-8f06-d9fc0b2dd67d attached healthy 53687091200 worker-03 28d
pvc-e072d250-b09a-421e-b30b-c1044754609c attached healthy 21474836480 worker-04 27d
pvc-e2db04cf-637d-472b-ad6a-0fe4eb719e9c attached healthy 10737418240 worker-02 28d
pvc-ea85bb6f-f769-4e5e-96b0-7324463332d1 attached healthy 21474836480 worker-02 27d
pvc-f1ef7d13-ec2c-4faa-a10e-ecbe081463b8 attached healthy 10737418240 worker-03 4d3h
pvc-fcad45cd-38d5-4004-9521-c133ffe14a68 attached healthy 10737418240 worker-02 27d
pvc-fe65be2c-664c-4908-b966-382d7bcf4c68 attached healthy 10737418240 worker-01 27d
By default I want all volumes to move to the HDDs, as they make up the bulk of the available storage. First, every volume needs its diskSelector set to hdd:
for vol in $(kubectl get volumes -n longhorn-system | tail -n +2 | awk '{ print $1}'); do
kubectl -n longhorn-system patch volumes.longhorn.io $vol --type=merge -p '{"spec":{"diskSelector":["hdd"]}}'
done
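A quick spot-check on any one of the volumes from the list above confirms the selector actually landed:
# Should print the hdd selector for the chosen volume
kubectl -n longhorn-system get volumes.longhorn.io pvc-0078be9d-1f6b-445a-ad22-acaba5cc95c9 -o jsonpath='{.spec.diskSelector}'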
With every volume patched, I now need to begin degrading the volumes by reducing their replica counts to one:
for vol in $(kubectl get volumes -n longhorn-system | tail -n +2 | awk '{ print $1}'); do
kubectl -n longhorn-system patch volumes.longhorn.io $vol --type=merge -p '{"spec":{"numberOfReplicas":1}}'
done
After updating the replica count, go back to the UI, select all volumes, choose Update Replicas Auto Balance from the dropdown, and set it to “Best-Effort”. This forces the now-surplus replicas to be destroyed.
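The same setting can be applied from the CLI by patching each Volume custom resource; the field name below is my best reading of the CRD, so verify it against your Longhorn version before relying on it:
# Set replica auto-balance to best-effort on every volume
for vol in $(kubectl get volumes -n longhorn-system | tail -n +2 | awk '{ print $1}'); do
  kubectl -n longhorn-system patch volumes.longhorn.io $vol --type=merge -p '{"spec":{"replicaAutoBalance":"best-effort"}}'
done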
Warning: at this stage each volume has only a single replica. Your data is at risk until the replica count is restored.
Ensure your two reworked nodes are schedulable again, then from the CLI set the replica count back to 2:
for vol in $(kubectl get volumes -n longhorn-system | tail -n +2 | awk '{ print $1}'); do
kubectl -n longhorn-system patch volumes.longhorn.io $vol --type=merge -p '{"spec":{"numberOfReplicas":2}}'
done
This forces new replicas to be created on the correctly tagged storage. After the data has been rebuilt, we should be back to two healthy replicas for every volume, and the whole process can simply be repeated for the remaining two nodes.
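Rebuild progress can be followed from the CLI; once nothing reports as degraded it is safe to start on the next pair of nodes:
# Anything still rebuilding shows up as degraded; only the header (and the odd detached volume) should remain
kubectl -n longhorn-system get volumes.longhorn.io | grep -v healthy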
Conclusion
The process was fairly straightforward; however, there is one loose end I am unsure how to tidy up. The pre-existing PV/PVC manifests still reference the default StorageClass, and this cannot be changed without creating a new PV/PVC and copying the data from the old volume to the new one. This can be done, but it would be exceptionally tedious.