Current setup

The current Longhorn setup consists of four nodes, each with four 8 TB HDDs and one 1 TB NVMe SSD in a RAIDZ array, with the SSD acting as a cache. This design gives each node an array with 24 TB of available storage; however, after creating a zvol formatted with ext4, the maximum usable size is 14 TB. On top of that, Longhorn reserves 20% by default to prevent DiskPressure from causing provisioning failures. When all is said and done, each node goes from 32 TB of raw storage to 10.6 TB usable, a loss of 67% of my raw storage capacity.
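As a quick sanity check on that 67% figure, here is a small calculation using the per-node numbers above (the stage-by-stage values are taken directly from this post, not measured independently):

```python
# Per-node capacity figures from the post, in TB
raw_tb = 4 * 8        # four 8 TB HDDs of raw storage
raidz_tb = 24         # usable after RAIDZ parity
zvol_ext4_tb = 14     # max size after zvol + ext4
usable_tb = 10.6      # what remains after Longhorn's 20% reserve (approx.)

loss_pct = (1 - usable_tb / raw_tb) * 100
print(f"{loss_pct:.1f}% of raw capacity lost")  # prints "66.9% of raw capacity lost"
```

Rounding 66.9% up gives the 67% loss quoted above.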

TL;DR

If you see the following error repeating in a Longhorn CSI plugin pod, causing a CrashLoopBackOff, try disabling SELinux and restarting the pod. If the pod can then connect to csi.sock, you've found your problem.

Still connecting to unix:///csi/csi.sock
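In concrete terms, the check looks something like the following. This is a sketch assuming a default Longhorn install in the longhorn-system namespace with the stock app=longhorn-csi-plugin label; adjust the namespace and selector to match your deployment.

```shell
# Check whether SELinux is currently enforcing on the node
getenforce

# Temporarily switch SELinux to permissive mode (reverts on reboot;
# make it permanent via /etc/selinux/config if this fixes the issue)
sudo setenforce 0

# Restart the CSI plugin pods so they retry the socket connection
# (namespace and label selector assume a default Longhorn install)
kubectl -n longhorn-system delete pod -l app=longhorn-csi-plugin

# Watch the logs to confirm the "Still connecting" message is gone
kubectl -n longhorn-system logs -l app=longhorn-csi-plugin -f
```

If the error stops after this, SELinux was blocking access to the csi.sock socket.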

The Problem

I recently deployed Longhorn to my Kubernetes lab cluster. This was a departure from my previous build, which used Ceph, so there was bound to be an issue or two. Overall the deployment went well; however, I have been experiencing issues with the longhorn-csi-plugin pods entering a CrashLoopBackOff state. Viewing the logs shows the following: