As a result of migrating from a single cluster with external access to multiple internal clusters, cert-manager (on some clusters) necessarily lost access to Let’s Encrypt. This has led to self-signed certificates being used on all (intra-cluster) HTTPS endpoints internally. Realistically this is fine; however, an OPNsense firewall is available, which opens up the option of using its Trust function. In this example an intermediate certificate authority will be created for one of the three clusters (each cluster will get its own, but that does not need to be shown here).
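As a rough sketch of where this leads, the intermediate CA exported from OPNsense can be handed to cert-manager as a CA ClusterIssuer. The file names, secret name, and issuer name below are placeholders, and cert-manager is assumed to live in the cert-manager namespace.

```bash
# Sketch: load the intermediate CA (exported from OPNsense) into cert-manager
# and expose it as a CA ClusterIssuer. Names and file paths are placeholders.
kubectl -n cert-manager create secret tls cluster-intermediate-ca \
  --cert=intermediate-ca.crt \
  --key=intermediate-ca.key

kubectl apply -f - <<EOF
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: opnsense-intermediate-ca
spec:
  ca:
    secretName: cluster-intermediate-ca
EOF
```

Certificates on that cluster can then reference this issuer instead of the old Let’s Encrypt one.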
Don’t do it: if your certificate does not have the root CA certificate attached, git will not treat the certificate as valid, but the GitLab runner will. Your runner will authenticate, yet git pulls will continue to fail. If your runner uses an internal non-TLS endpoint, this does not affect you.
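To avoid that trap, bundle the full chain up to the root before handing the certificate to the runner. A minimal sketch, with placeholder file names for the leaf, intermediate, and root certificates:

```bash
# Sketch: bundle the server certificate with its intermediate and root CA so
# that git inside CI jobs can validate the full chain. File names are placeholders.
cat gitlab.example.internal.crt intermediate-ca.crt root-ca.crt > full-chain.crt

# Sanity check that the leaf verifies back to the root:
openssl verify -CAfile root-ca.crt -untrusted intermediate-ca.crt \
  gitlab.example.internal.crt
```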
Summary
If you have deployed GitLab via Kustomize and only have access to a self-signed certificate, you will need to pass the self-signed certificate into the GitLab runner so that it can authenticate with GitLab.
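A minimal sketch of what that can look like, assuming the runner lives in a gitlab namespace and GitLab is served at gitlab.example.internal (both placeholders), reusing the full-chain bundle from the warning above:

```bash
# Sketch: store the full-chain certificate in a Secret the runner can mount.
# Namespace, hostname, and file names are placeholders.
kubectl -n gitlab create secret generic gitlab-runner-certs \
  --from-file=gitlab.example.internal.crt=full-chain.crt

# Mount the Secret into the runner pod (e.g. with a Kustomize patch) at the
# directory GitLab Runner checks for custom CAs, typically
# /etc/gitlab-runner/certs/<gitlab-hostname>.crt, or point the runner at the
# file with --tls-ca-file during registration.
```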
If your Nextcloud instance is returning “invalid requester” after SAML has been working for some time, there is a chance the certificate has expired. Many tutorials online for setting up Nextcloud with SAML and Keycloak have the user click the “Regenerate” button to create the key/cert pair. This is perhaps more complicated than it needs to be, and the resulting certificate is valid for under three months, so the process has to be repeated fairly often. Below is a set of simple steps to update those certs and keys.
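As a sketch of the alternative, the key/cert pair can be generated by hand with a longer validity instead of relying on the Regenerate button. The subject name and ten-year lifetime below are assumptions; adjust them to your own policy.

```bash
# Sketch: generate a long-lived self-signed key/certificate pair for the SAML
# service provider. The CN and the 10-year validity are assumptions.
openssl req -x509 -newkey rsa:2048 -nodes -days 3650 \
  -subj "/CN=nextcloud.example.internal" \
  -keyout saml-sp.key \
  -out saml-sp.crt

# Paste the contents of saml-sp.key and saml-sp.crt into Nextcloud's
# "SSO & SAML authentication" settings, and update the certificate on the
# matching Keycloak client so both sides agree.
```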
The current Longhorn setup consists of four nodes, each with four 8 TB HDDs in a RAIDZ array and a 1 TB NVMe SSD acting as a cache. Due to this design each node has an array with 24 TB of available storage; however, after creating a zvol partitioned with ext4 the maximum size available is 14 TB. Furthermore, Longhorn takes 20% by default as reserved space to prevent DiskPressure from causing provisioning failures. After all is said and done we go from 32 TB of storage to 10.6 TB per node. This is a loss of roughly 67% of my raw storage capacity.
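For the record, the headline loss works out as follows (a trivial calculation using only the figures above):

```bash
# Per-node capacity cascade, using the figures above (TB):
#   4 x 8 TB HDD               = 32   raw
#   RAIDZ + zvol/ext4 overhead -> 14  usable
#   Longhorn reserved space    -> 10.6 schedulable
echo "scale=1; (32 - 10.6) * 100 / 32" | bc   # ~66.8% of the raw capacity is lost
```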
If you see the following error repeating in a Longhorn CSI plugin pod and causing a CrashLoopBackOff, try disabling SELinux and restarting the pod. If the pod is then able to connect to csi.sock, you have found your problem.
Still connecting to unix:///csi/csi.sock
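A quick sketch of that workaround on the affected node; the namespace and pod label below assume a default Longhorn install.

```bash
# Sketch: put SELinux into permissive mode on the node and restart the crashing
# plugin pod. Namespace and label assume a default Longhorn installation.
sudo setenforce 0      # permissive until the next reboot
sudo sed -i 's/^SELINUX=enforcing/SELINUX=permissive/' /etc/selinux/config   # optional: persist across reboots
kubectl -n longhorn-system delete pod -l app=longhorn-csi-plugin
```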
The Problem
I recently deployed Longhorn to my Kubernetes lab cluster. This was a departure from my previous build, which used Ceph, and as expected there was bound to be an issue or two. Overall the deployment went well; however, I have been experiencing issues with the longhorn-csi-plugin pods entering a CrashLoopBackOff state. Viewing the logs shows the following: