I have come to the realization that online documentation around configuring Nextcloud to use SAML is lacking. I am not an expert by ANY means but I know enough to get things working with some trial and error. The following post is more or less a TL;DR of what to set to enable SAML auth in Nextcloud via Keycloak.
Deployment
Versions
Nextcloud: 14.5.0
SSO & SAML authentication: 6.0.1
Keycloak: 21.1.2
Set Up Nextcloud SAML
Below are the settings needed for each section of the SAML settings page. Note that most settings are hidden by default, so you will need to expand each section to see them.
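For reference, the same values can also be inspected from the command line, which makes it easier to compare a working instance against a broken one. A minimal sketch, assuming occ is available and the web server runs as www-data (both are assumptions about your install):

# Dump every setting the SSO & SAML app (app id user_saml) currently has stored.
sudo -u www-data php occ config:list user_saml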
Add externalTrafficPolicy: Local to a Kubernetes service to capture source IPs.
The Issue
By default Kubernetes does not preserve the client source IP for traffic reaching a LoadBalancer service. Usually this is not a problem, but it has become one as I re-combine all of my rke2 clusters. Previously it was simple to place internal-only applications on one of the two internal-only clusters; as I move back to a single cluster, Traefik needs to be made aware of source IPs so IP whitelisting can be used.
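A minimal sketch of the change, assuming Traefik is exposed through a LoadBalancer Service named traefik in a traefik namespace (substitute whatever name and namespace your deployment actually uses):

# Switch from the default externalTrafficPolicy: Cluster to Local so the
# client source IP is preserved instead of being SNATed by kube-proxy.
kubectl -n traefik patch svc traefik -p '{"spec":{"externalTrafficPolicy":"Local"}}'

# Confirm the change took effect.
kubectl -n traefik get svc traefik -o jsonpath='{.spec.externalTrafficPolicy}'

The trade-off is that with Local only nodes actually running a Traefik pod will accept traffic for the service, so pod placement and health checks matter more than they did before.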
There currently appears to be a lack of options for home cameras that meet my needs/expectations. As a result, I set out to build my own cameras to fill this gap. The intent is to build basic home security cameras that meet the following requirements:
Ethernet
PoE/PoE+
A “moderate” or better camera quality (think 1080p or higher resolution)
No microphone
No PTZ functions
No “cloud” functions
Frigate compatible
Justifications
The majority of these requirements are driven by privacy and security concerns.
As a result of migrating from a single cluster with external access to multiple internal clusters, cert-manager (on some clusters) necessarily lost access to Let’s Encrypt. This has led to self-signed certificates being used on all intra-cluster HTTPS endpoints. Realistically this is fine; however, an OPNsense firewall is available, which gives the option to use its Trust function. In this example an intermediate certificate authority will be created for one of the three clusters (all three will get their own, but that does not need to be shown here).
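To give an idea of where this is headed, below is a sketch of loading such an intermediate CA into cert-manager as a CA issuer. The file names, secret name, and issuer name are placeholders; the actual certificate and key would be exported from OPNsense’s Trust section:

# Store the exported intermediate CA certificate and key as a TLS secret.
kubectl -n cert-manager create secret tls cluster-intermediate-ca \
  --cert=intermediate-ca.crt --key=intermediate-ca.key

# Create a ClusterIssuer backed by that CA so internal certificates chain up to OPNsense.
kubectl apply -f - <<'EOF'
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: opnsense-intermediate-ca
spec:
  ca:
    secretName: cluster-intermediate-ca
EOF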
Don’t do it: if your certificate does not have the root CA certificate attached, git will not treat the certificate as valid, but the GitLab runner will. Your runner will authenticate, but git pulls will continue to fail. If your runner talks to GitLab over an internal non-TLS endpoint, this does not impact you.
Summary
If you have deployed GitLab via Kustomize and only have access to a self-signed certificate, you will need to pass the self-signed certificate into the GitLab runner so it can authenticate with GitLab.
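A minimal sketch of that hand-off, assuming the runner was installed with the official GitLab Runner Helm chart and GitLab answers at gitlab.example.com (the hostname, namespace, and secret name are all placeholders): the certificate, with its root CA included, goes into a secret whose key matches the GitLab hostname, and the chart is pointed at that secret.

# The key name must be <gitlab-hostname>.crt and the file should contain the full chain.
kubectl -n gitlab create secret generic gitlab-runner-certs \
  --from-file=gitlab.example.com.crt=./gitlab-full-chain.crt

# Then reference the secret in the runner chart's values:
#   certsSecretName: gitlab-runner-certs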
If your Nextcloud instance is returning “invalid requester” after SAML has been working for some time, there is a chance the certificate has expired. Many online tutorials for setting up Nextcloud with SAML + Keycloak have the user press the “Regenerate” button to create the key/cert pair. That approach is arguably more complicated than it needs to be, and the resulting certificate is only valid for under three months, so the renewal has to be repeated fairly often. Below is a set of simple steps to update those certs and keys.
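One way around the short-lived pair is to generate your own long-lived self-signed key and certificate with openssl and paste them into the SAML settings, updating the matching client in Keycloak afterwards. This is a sketch only; the subject name and the ten-year validity are arbitrary choices, not requirements:

# Check when the current certificate actually expires.
openssl x509 -in current-sp.crt -noout -enddate

# Generate a new 4096-bit key and a self-signed certificate valid for ~10 years.
openssl req -x509 -nodes -newkey rsa:4096 \
  -keyout saml-sp.key -out saml-sp.crt \
  -days 3650 -subj "/CN=nextcloud.example.com"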
The current Longhorn setup consists of four nodes, each with the following storage: four 8 TB HDDs and a 1 TB NVMe SSD in a RAIDZ array, with the 1 TB SSD acting as a cache. With this design each node has an array with 24 TB of available storage; however, after creating a zvol partitioned with ext4, the maximum size available is 14 TB. Furthermore, Longhorn takes 20% by default as reserved space to prevent DiskPressure from causing provisioning failures. After all is said and done we go from 32 TB of raw storage to 10.6 TB usable per node, a loss of roughly 67% of my raw storage capacity.
If you see the following error repeating in a longhorn-csi-plugin pod, causing a CrashLoopBackOff, try disabling SELinux and restarting the pod. If the pod is then able to connect to csi.sock, you have found your problem.
Still connecting to unix:///csi/csi.sock
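A quick sketch of that test; the setenforce commands run on the affected node, and the pod label and namespace below assume a default Longhorn install (they may differ by version):

# On the node: temporarily put SELinux into permissive mode (reverts on reboot).
sudo setenforce 0
getenforce

# Restart the crashing pod and watch whether it gets past the csi.sock wait.
kubectl -n longhorn-system delete pod -l app=longhorn-csi-plugin
kubectl -n longhorn-system logs -l app=longhorn-csi-plugin -f

If the plugin now connects, the longer-term fix is either deliberately leaving SELinux permissive or writing a policy that allows the access, rather than relying on the temporary setenforce.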
The Problem
I have recently deployed Longhorn to my Kubernetes lab cluster. This is a departure from my previous build, which used Ceph, so as expected there was bound to be an issue or two. Overall the deployment went well; however, I have been experiencing issues with the longhorn-csi-plugin pods entering a CrashLoopBackOff state. Viewing the logs shows the following:
Still connecting to unix:///csi/csi.sock