Summary
It has been about four months since I last re-deployed my homelab, and I have grown bored of my setup once again. In the previous series, “Deploying Harvester,” I walked through deploying Harvester (the right way). My initial impressions of Harvester were not good, but after taking a step back and redeploying with a more appropriate configuration (and buying backup batteries), everything became much more stable. So why the switch?
CSI
The Harvester CSI ran into issues with VolumeAttachmentDefinitions failing to be deleted, so I had to clear them manually a couple of times. I simply wanted to revert to Longhorn by itself, without this extra layer of obfuscation or abstraction on top. By deploying to a bare metal cluster I also regained the ability to choose the storage backing for Longhorn.
Backups
My production cluster was handicapped by one very significant issue. At the time of this writing, Harvester’s CSI has no snapshot class. This means I had two options for backing up my cluster:
- Backup all VMs with the PVCs attached
  - It is my understanding that this does work: you can supposedly back up your entire child cluster by simply backing up the VMs. It does not work for me, however. I have multiple PVCs that are 2+ TB each, and I simply had no place to offload that much data for backups.
- Run Velero FS backups
  - This was the other option available to me, and it is honestly almost as bad as option one. Velero is a great tool, but that is irrelevant when your PVCs don’t have a snapshot class available to them. A restic-based filesystem backup was the only alternative left, and my experience with that approach and databases is terrible: restic-based DB backups have about a 70% chance of success, and you won’t know whether they succeeded until you try to recover. That is simply not an option. (A minimal example of the missing snapshot class follows this list.)
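For context, the missing piece is a snapshot class: the object a CSI driver has to expose before snapshot-based backups become possible. Below is a minimal sketch of what one looks like for plain Longhorn (the driver name is Longhorn’s real CSI driver; the class name is just an example), which is exactly the kind of thing Harvester’s CSI did not provide at the time:

apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: longhorn-snapshot
driver: driver.longhorn.io
deletionPolicy: Delete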
CNI
The original cluster was also running a very basic Cilium deployment. This could have been tweaked in place, but the rebuild gave me a clean slate to start from.
Conclusion
When it comes down to it, the backups were by far my biggest problem. Harvester 1.4.0 is set to come with an update to the CSI allowing for snapshots, but I got bored of waiting.
Goals
For this deployment my goals are simple(?): deploy a reasonably secured, Rocky 9 based bare metal cluster that is up to date (at least RKE2 1.29), enable backups, swap Traefik for Cilium Ingress, and possibly swap MetalLB for Cilium as well.
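As a rough sketch of what the Cilium side of those goals could look like on RKE2, the bundled rke2-cilium chart can be tuned with a HelmChartConfig; the values below are only an illustration, and the exact keys depend on the Cilium version shipped with RKE2.

apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: rke2-cilium
  namespace: kube-system
spec:
  valuesContent: |-
    # Illustrative values only; check the chart for the Cilium version in use.
    kubeProxyReplacement: true
    ingressController:
      enabled: true
      default: true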
Why Cilium Ingress over Traefik IngressRoutes?
Traefik overall works rather well; swapping it out is not because it is a bad program by any means. In my experience, however, slimming your cluster down to fewer moving parts is generally the way to go. There are fewer places to look when you begin experiencing issues.
There is also the issue of IngressRoutes. I have come to learn that minimizing your reliance on CRDs is a good best practice. It has happened more than once that I updated an application only to find out newer CRDs were needed; because I do not use Helm in my GitOps tooling, CRDs are not updated unless I extract them from a Helm chart first. This is not to say I avoid CRDs. They are obviously a powerful and integral part of any cluster, but I have become more cautious when using them. I now expect a clear explanation of what a CRD offers and how it benefits me before I am willing to implement it.
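To make that concrete, the end state I am after is a plain networking.k8s.io/v1 Ingress served by Cilium, with no extra CRD in the picture. The host, namespace, and service names below are placeholders:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: example-app
  namespace: example
spec:
  ingressClassName: cilium
  rules:
    - host: app.lab.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: example-app
                port:
                  number: 80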
Why Cilium LB over MetalLB?
Much like I stated above, I have come to the conclusion that it is normally a good idea to cut cruft from your clusters: if I don’t need another tool, I shouldn’t add it. MetalLB, like Traefik, is not a bad program and has actually served me fairly well (better than Traefik), so it is somewhat safe from being cut from the cluster. I have not yet fully evaluated whether I am interested in removing it.
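For reference, if MetalLB does end up on the chopping block, Cilium’s LB-IPAM would take over with a pool object along these lines; the pool name and CIDR are placeholders, and the exact schema differs between Cilium releases.

apiVersion: cilium.io/v2alpha1
kind: CiliumLoadBalancerIPPool
metadata:
  name: lab-pool
spec:
  blocks:
    # Placeholder range; older Cilium releases use `cidrs` instead of `blocks`.
    - cidr: 192.168.100.0/28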
Plan
- Clean up Ansible role/playbook repositories
- Manually save relevant databases/PVCs
- Deploy RKE2 to hosts
- Deploy applications (restoring PVCs/DBs where applicable)
Expected Workflow
Pre-Deploy
Cleaning up Ansible Roles/Playbooks
Previously I used Ansible to deploy RKE2 to my clusters. For this deployment I am starting with a fresh, greatly simplified Ansible repository. This repository will be the main Ansible repo, holding any files or plays I need. The structure ended up being fairly simple:
./plays
├── ansible.cfg
├── files
│   ├── audit-policy.yaml
│   ├── manifests
│   │   └── cilium.yaml
│   └── pod-security-admission-config.yaml
├── inventory
│   └── lab
│       ├── group_vars
│       │   ├── all.yml
│       │   ├── rke2_agents.yml
│       │   └── rke2_servers.yml
│       └── hosts.yml
├── LICENSE
├── plays
│   ├── install_rke2.yml
│   ├── setup_hosts.yml
│   ├── setup.yml
│   └── stig_hosts.yml
├── README.md
├── renovate.json
└── requirements.yml
Repo Breakdown
requirements.yml
I used to mirror rke2-ansible with some tweaks of my own. The rke2-ansible repository has made some pretty big updates, and the rewrite branch is in a much better place, so I no longer feel the need to maintain my own version. I also learned a trick I did not know existed previously: you can pull a collection from a git repo (not just Ansible Galaxy) via a requirements.yml file, like so:
collections:
  - name: rancherfederal.rke2-ansible
    source: git@github.com:rancherfederal/rke2-ansible.git
    type: git
    version: rewrite
I no longer need to maintain or mirror the upstream work, or use any nasty symlinks. One of the other goals of this deployment was to create a “reasonably” secure deployment; as such, I am also going to include RedHat’s STIG role.
roles:
  - RedHatOfficial.rhel9-stig
Finally, I do have some small tweaks that need to be made to the base OS post-install, so I will still maintain a small role of my own. However, collections are really the way forward in Ansible now, which is great news for me: if I ever want to add a new role I don’t need a whole new repo. So I made a new collection, and my requirements.yml now looks like this:
roles:
  - RedHatOfficial.rhel9-stig

collections:
  - name: rancherfederal.rke2-ansible
    source: git@github.com:rancherfederal/rke2-ansible.git
    type: git
    version: rewrite
  - name: daemonslayer2048.servers
    source: ssh://git@git.init6.host/lab/ansible/collections.git
    type: git
    version: master
This is the final requirements.yml, and it means I no longer have to worry about maintaining my own set of roles; now I only really worry about one role.
“plays” directory
The plays directory makes use of the roles/collections included in the above requirements.yml file by actually calling them. This is fairly self-explanatory; however, during my initial testing I found that some tweaks needed to be made to RedHat’s STIG role. Individual STIGs can be disabled by simply setting a boolean. Below is the stig_hosts.yml play I used:
---
- name: RHEL 9 STIG play
  hosts: all
  any_errors_fatal: True
  roles:
    - role: RedHatOfficial.rhel9-stig
      vars:
        # Disable redhat-release GPG key enforcement
        DISA_STIG_RHEL_09_214010: False
        # Don't disable old accounts
        DISA_STIG_RHEL_09_411050: False
        # Disable a host of user password age limits/restrictions
        DISA_STIG_RHEL_09_411010: False
        DISA_STIG_RHEL_09_611075: False
        DISA_STIG_RHEL_09_411015: False
        DISA_STIG_RHEL_09_611080: False
        # Promiscuous mode on CNI eth devices may be needed
        DISA_STIG_RHEL_09_251040: False
WARNING:
Much of this is probably far too bespoke to go into detail on, but my intent was to hit the high-level points.
To actually deploy the cluster, setup.yml is called with the -k -K flags. As the STIG playbook runs, certain SSH keys get invalidated or changed. To be clear, I am not certain why this happens, but I assume weaker key algorithms get removed and some keys are regenerated. It’s hard to say; the STIG playbook takes nearly 30 minutes to run and, honestly, does not appear to be well written.
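The contents of setup.yml are not shown in this post; a plausible sketch is simply a wrapper that imports the other plays in order, something like the following (the ordering here is my guess):

---
# Hypothetical top-level wrapper; the real setup.yml is not shown here.
- import_playbook: stig_hosts.yml
- import_playbook: setup_hosts.yml
- import_playbook: install_rke2.yml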
Custom collection
Finally, setup_hosts.yml calls my custom collection. The collection currently contains a single role, and the repo is broken down like so:
./collections
├── ansible.cfg
├── galaxy.yml
├── LICENSE
├── README.md
├── requirements.txt
├── requirements.yaml
├── roles
│   └── generic
│       ├── defaults
│       │   └── main.yaml
│       ├── files
│       │   └── Lab-CA.crt
│       ├── meta
│       │   └── main.yml
│       ├── molecule
│       │   └── default
│       │       ├── converge.yml
│       │       └── molecule.yml
│       ├── tasks
│       │   ├── lab_ca.yaml
│       │   ├── longhorn.yml
│       │   ├── main.yml
│       │   └── neuvector.yml
│       └── vars
└── shell.nix
This is a typical collection; the only thing worth noting here is the requirement for a galaxy.yml file. Do note, it MUST be .yml, NOT .yaml. I learned that the hard way. The galaxy.yml file is needed whether or not you intend to upload your collection to Ansible Galaxy; if you intend to consume your collection via a requirements.yml, it is necessary.
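For anyone building their own, a minimal galaxy.yml only needs a handful of keys. The namespace and name below match the collection referenced in my requirements.yml, while the version, author, and license values are placeholders:

namespace: daemonslayer2048
name: servers
version: 1.0.0           # placeholder
readme: README.md
authors:
  - daemonslayer2048     # placeholder
license:
  - MIT                  # placeholder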
Backup Applications
As the recurring theme of this post goes, I didn’t have backups, so before moving any further I needed to start collecting my data and getting ready for the redeployment. This section will be short, as the process was fairly repetitive. In order to recover after the redeployment, I started by identifying which applications needed to be backed up. For each of them I would tar any volume mounts, dump any databases, and use kubectl cp to pull the files off the cluster.
NOTE:
Through doing this I learned of a Rancher MCM limitation. If you are attempting to pull very large files off your cluster via kubectl cp, you are better off hitting the k8s API directly rather than going through the Rancher MCM proxy. Rancher would close the connection after a period of time and the download would fail. This only became an issue once files reached roughly 5 gigabytes or more.
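The difference comes down to which API server your kubeconfig context points at. The fragments below are illustrative only, with placeholder hostnames and cluster IDs:

clusters:
  # Via the Rancher MCM proxy: convenient, but long transfers can get cut off.
  - name: lab-via-rancher
    cluster:
      server: https://rancher.example.com/k8s/clusters/c-m-xxxxxxxx
  # Direct to the downstream RKE2 API: better for large kubectl cp transfers.
  - name: lab-direct
    cluster:
      server: https://rke2-api.lab.example.com:6443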
Provision-Cluster
At this point my Ansible playbook repo and collection were both ready, my applications were manually backed up, and it was time to begin re-imaging my servers with Rocky 9. I do not have any fancy or fun process for this, so I spent some time in the closet reinstalling each server; perhaps I will look into kOps or Metal3 in the future. Once the servers were reimaged, it was up to Ansible to STIG the hosts, install RKE2, and finish some custom tasks.
Deploy-Cluster
Once the cluster had been deployed I proceeded to redeploy all the base applications: cert-manager, Longhorn, and Rancher MCM. Finally it was time to recover the applications. Each one followed a similar process:
- Spin up the database (if applicable), and reingest the pgsql dump
- Deploy a temporary pod to untar the backups
- Destroy the pod
- Redeploy workload
The example below is a sample temporary pod I used to set up Synapse:
---
apiVersion: v1
kind: Pod
metadata:
  name: setup
  namespace: synapse
spec:
  securityContext:
    fsGroup: 991
  containers:
    - command:
        - sleep
        - "infinity"
      image: debian
      securityContext:
        capabilities:
          drop:
            - ALL
        readOnlyRootFilesystem: false
        runAsGroup: 991
        runAsNonRoot: true
        runAsUser: 991
        allowPrivilegeEscalation: false
        seccompProfile:
          type: "RuntimeDefault"
      name: setup
      volumeMounts:
        - mountPath: /synapse/data
          name: media
  volumes:
    - name: media
      persistentVolumeClaim:
        claimName: media
The cluster is finally up; however, I still need to set up backups and migrate from IngressRoutes to Ingress.