Summary

It has been about four months since I last re-deployed my homelab, and I have grown bored of my setup once again. In the previous series, “Deploying Harvester,” I went over the process of deploying Harvester (the right way). My initial impressions of Harvester were not good, but after taking a step back and redeploying with a more appropriate configuration (and buying backup batteries), everything became much more stable. So why the switch?

CSI

The CSI ran into issues with VolumeAttachmentDefinitions failing to be deleted, so I needed to clear them manually a couple of times. I simply wanted to revert to Longhorn by itself, without this extra layer of obfuscation or abstraction on top. By deploying to a bare metal cluster I also regained the ability to choose the storage backing for Longhorn.
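
To illustrate what that buys me, a Longhorn StorageClass can pin the replica count, data locality, and which tagged disks replicas land on. The class below is an example, not my actual configuration:

---
# Example Longhorn StorageClass - the values are illustrative, not my real settings.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-nvme
provisioner: driver.longhorn.io
allowVolumeExpansion: true
parameters:
  numberOfReplicas: "3"
  staleReplicaTimeout: "30"
  diskSelector: "nvme"          # only place replicas on disks tagged "nvme"
  dataLocality: "best-effort"   # keep a replica on the node running the workload when possible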

Backups

My production cluster was handicapped by one very significant issue. At the time of this writing, Harvester’s CSI has no snapshot class. This means I had two options for backing up my cluster:

  1. Back up all VMs with the PVCs attached
    • It is my understanding that this does work: you can supposedly back up your entire child cluster by simply backing up the VMs. This does not work for me, however; I have multiple PVCs that are 2+ TB each, and I simply had no place to offload that much data for backups.
  2. Run Velero FS backups
    • This was the other option available to me, and it is honestly almost as bad as option one. Velero is a great tool, but that is irrelevant when your PVCs don’t have a snapshot class available to them. A restic-based backup would be the only alternative left, and when it comes to databases my experience with that is terrible. Restic-based DB backups have roughly a 70% chance of success, and you won’t know whether a backup succeeded until you try to recover from it; that is simply not an option. (A sketch of what this approach would have looked like follows this list.)
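
For context, the Velero route would have meant forcing file-system backups for every volume, along the lines of the Schedule below (a sketch; the namespace selection and timings are made up):

---
# Hypothetical Velero Schedule forcing file-system (restic/kopia) backups
# because no CSI snapshot class is available. Values are illustrative.
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: nightly-fs-backup
  namespace: velero
spec:
  schedule: "0 3 * * *"
  template:
    includedNamespaces:
      - "*"
    defaultVolumesToFsBackup: true   # copy PVC data at the filesystem level
    ttl: 168h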

CNI

The original cluster was also deployed with a very basic Cilium configuration. It could have been tweaked in place, but redeploying gave me a clean slate to start from.

Conclusion

When it comes down to it, the backups were by far my biggest problem. Harvester 1.4.0 is set to ship an update to the CSI that allows snapshots, but I got bored of waiting.


Goals

For this deployment my goals are simple(?): deploy a reasonably secured, Rocky 9-based bare metal cluster that is up to date (at least RKE2 1.29), enable backups, swap Traefik for Cilium Ingress, and possibly swap MetalLB for Cilium as well.

Why Cilium Ingress over Traefik IngressRoutes?

Traefik overall works rather well; I am not swapping it out because it is a bad program by any means. I have, however, found that in my experience slimming your cluster down to fewer moving parts is generally the way to go; there are fewer places to look when you begin experiencing issues.
There is also the issue of IngressRoutes. I have come to learn that minimizing your reliance on CRDs is a good best practice. It has happened more than once that I updated an application only to find out newer CRDs were needed; because I do not use Helm in my GitOps tooling, CRDs are not typically updated unless I extract them from a Helm chart first. This is not to say I avoid CRDs entirely (they are obviously a powerful and integral part of any cluster), but I have become more cautious about using them. I now expect a clear explanation of what a CRD offers and how it benefits me before I am willing to implement it.
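
To make the trade-off concrete: with Cilium acting as the ingress controller, a plain in-tree Ingress object is all that is needed, no extra CRD required. The hostname and backend service below are placeholders:

---
# Standard Ingress served by Cilium's ingress controller.
# Hostname and backend service are placeholders.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: example-app
  namespace: example
spec:
  ingressClassName: cilium
  rules:
    - host: app.example.local
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: example-app
                port:
                  number: 80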

Why Cilium LB over MetalLB?

Much as stated above, I have come to the conclusion that it is normally a good idea to cut cruft from your clusters: if I don’t need another tool, I shouldn’t add it. MetalLB, like Traefik, is not a bad program and has actually served me fairly well (better than Traefik), so it is somewhat safe from being cut from the cluster; I have not yet fully evaluated whether I want to remove it.
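
If I do end up cutting it later, the Cilium-native equivalent is a CiliumLoadBalancerIPPool handing out addresses from a CIDR. The sketch below is what I would expect that to look like; the pool name and CIDR are placeholders, and the field layout is taken from recent Cilium releases, so check the version you run:

---
# Hypothetical Cilium LB IP pool replacing MetalLB's address pool.
# The CIDR is a placeholder.
apiVersion: cilium.io/v2alpha1
kind: CiliumLoadBalancerIPPool
metadata:
  name: homelab-pool
spec:
  blocks:
    - cidr: 192.168.50.0/27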


Plan

  1. Clean up Ansible role/playbook repositories
  2. Manually save relevant databases/PVCs
  3. Deploy RKE2 to hosts
  4. Deploy applications (restoring PVCs/DBs where applicable)

Expected Workflow

flowchart LR
  subgraph Pre-Deploy
    direction TB
    IDA[Identify Applications] --> BAA[Backup Application DB]
    BAA --> BAP[Backup Application PVC]
  end
  subgraph Provision-Cluster
    direction TB
    subgraph Ansible
      direction TB
      CallAnsibleSetup[Call Ansible Setup] --> STIG(STIG Hosts)
      STIG --> InstallRKE2(Install RKE2)
      InstallRKE2 --> SetupHosts(Setup Hosts)
    end
    ImageServers[Image Servers] --> Ansible
  end
  subgraph Deploy-Cluster
    direction TB
    subgraph Kustomize
      direction TB
      DeployCert-Manager[Deploy cert-manager] --> DeployLonghorn[Deploy Longhorn]
      DeployLonghorn --> RecoverApplications[Recover Applications]
      RecoverApplications --> MigrateToIngress[Migrate to Cilium Ingress]
    end
  end
  Pre-Deploy --> Provision-Cluster --> Deploy-Cluster

Pre-Deploy

Cleaning up Ansible Roles/Playbooks

Previously I used Ansible to deploy RKE2 to my clusters. For this deployment I am starting with a fresh, greatly simplified Ansible repository. This will be the main Ansible repo, holding any files or plays I need. The structure ended up being fairly simple:

./plays
├── ansible.cfg
├── files
│   ├── audit-policy.yaml
│   ├── manifests
│   │   └── cilium.yaml
│   └── pod-security-admission-config.yaml
├── inventory
│   └── lab
│       ├── group_vars
│       │   ├── all.yml
│       │   ├── rke2_agents.yml
│       │   └── rke2_servers.yml
│       └── hosts.yml
├── LICENSE
├── plays
│   ├── install_rke2.yml
│   ├── setup_hosts.yml
│   ├── setup.yml
│   └── stig_hosts.yml
├── README.md
├── renovate.json
└── requirements.yml
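
One file in that tree worth calling out before breaking the repo down is files/manifests/cilium.yaml. RKE2 picks up anything dropped into its server manifests directory, and the usual way to tune the bundled Cilium chart is a HelmChartConfig. I am not reproducing my actual file here; the sketch below only shows the shape, and the values are assumptions:

---
# Illustrative RKE2 HelmChartConfig for the bundled Cilium chart.
# Dropped into /var/lib/rancher/rke2/server/manifests/ on the server nodes.
# The values shown are examples, not my real settings.
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: rke2-cilium
  namespace: kube-system
spec:
  valuesContent: |-
    kubeProxyReplacement: true
    ingressController:
      enabled: true
      default: true
    hubble:
      enabled: true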

Repo Breakdown

requirements.yml

I used to mirror rke2-ansible with some tweaks of my own. The rke2-ansible repository has since made some pretty big updates, and the rewrite branch is in a much better place, so I no longer feel the need to maintain my own version. I also learned a trick I did not know existed: you can pull a collection from a git repo (not just Ansible Galaxy) via a requirements.yml file, like so:

collections:
  - name: rancherfederal.rke2-ansible
    source: git@github.com:rancherfederal/rke2-ansible.git
    type: git
    version: rewrite

I no longer need to maintain or mirror the upstream work, or use any nasty symlinks. One of the other goals of this deployment was to create a “reasonably” secure deployment; as such, I am also going to include Red Hat’s STIG role.

roles: 
  - RedHatOfficial.rhel9-stig

Finally, I do have some tweaks that need to be made to the base OS post-install. These are small, so I will maintain a small “role”; however, “collections” are really the way forward in Ansible now, which is great news for me: if I ever want to add a new role, I don’t need a whole new repo. So I made a new collection, and my requirements.yml now looks like this:

roles: 
  - RedHatOfficial.rhel9-stig

collections:
  - name: rancherfederal.rke2-ansible
    source: git@github.com:rancherfederal/rke2-ansible.git
    type: git
    version: rewrite
  - name: daemonslayer2048.servers
    source: ssh://git@git.init6.host/lab/ansible/collections.git
    type: git
    version: master

This is the final requirements.yml, and it means I no longer have to worry about maintaining my own set of roles; now I only really worry about one role.

“plays” directory

The plays directory makes use of the roles/collections included in the above requirements.yml file by actually calling them. This is fairly self-explanatory; however, during my initial testing I found that some tweaks needed to be made to Red Hat’s STIG role. Individual STIGs can be disabled by simply setting a boolean; below is the stig_hosts.yml play I used:

---
- name: RHEL 9 STIG play
  hosts: all
  any_errors_fatal: True
  roles:
    - role: RedHatOfficial.rhel9-stig
  vars:
    # Disable redhat-release GPG key enforcement
    DISA_STIG_RHEL_09_214010: False
    # Don't disable old accounts
    DISA_STIG_RHEL_09_411050: False
    # Disable a host of user password age limits/restrictions
    DISA_STIG_RHEL_09_411010: False
    DISA_STIG_RHEL_09_611075: False
    DISA_STIG_RHEL_09_411015: False
    DISA_STIG_RHEL_09_611080: False
    # Promiscuous mode on CNI eth devices may be needed
    DISA_STIG_RHEL_09_251040: False

WARNING
Much of this is probably far too bespoke to go into detail on, but my intent was to hit the high-level points.

To actually deploy the cluster, setup.yml is called with the -k -K flags. As the STIG playbook runs, certain host keys will be invalidated or changed. To be clear, I am not certain why this happens, but I assume weaker key algorithms get removed and some keys are regenerated. It’s hard to say; the STIG playbook takes nearly 30 minutes to run and, honestly, does not appear to be well written.
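
I am not reproducing my setup.yml here, but a wrapper play that simply chains the other plays in order would look roughly like the sketch below (treat the contents as an assumption):

---
# Hypothetical setup.yml: chains the other plays in order.
# Invoked with something like: ansible-playbook -i inventory/lab setup.yml -k -K
- import_playbook: stig_hosts.yml
- import_playbook: install_rke2.yml
- import_playbook: setup_hosts.yml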

Custom collection

Finally, setup_hosts.yml calls my custom collection. This collection is currently a single role; the repo is broken down like so:

./collections
├── ansible.cfg
├── galaxy.yml
├── LICENSE
├── README.md
├── requirements.txt
├── requirements.yaml
├── roles
│   └── generic
│       ├── defaults
│       │   └── main.yaml
│       ├── files
│       │   └── Lab-CA.crt
│       ├── meta
│       │   └── main.yml
│       ├── molecule
│       │   └── default
│       │       ├── converge.yml
│       │       └── molecule.yml
│       ├── tasks
│       │   ├── lab_ca.yaml
│       │   ├── longhorn.yml
│       │   ├── main.yml
│       │   └── neuvector.yml
│       └── vars
└── shell.nix
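
The tasks themselves are small host-prep items. As an example of the kind of thing the generic role handles, longhorn.yml presumably just makes sure Longhorn’s iSCSI prerequisites are in place; the sketch below is an assumption, not the actual contents of that file:

---
# roles/generic/tasks/longhorn.yml (hypothetical sketch).
# Longhorn needs open-iscsi; on Rocky 9 that is the iscsi-initiator-utils package.
- name: Install iSCSI initiator utilities
  ansible.builtin.dnf:
    name: iscsi-initiator-utils
    state: present

- name: Enable and start iscsid
  ansible.builtin.systemd:
    name: iscsid
    enabled: true
    state: started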

Structurally this is a typical collection; the only thing worth noting is the requirement for a galaxy.yml file, and do note it MUST be .yml, NOT .yaml. I learned that the hard way. The galaxy.yml file is needed whether or not you intend to upload your collection to Ansible Galaxy; if you intend to consume your collection via a requirements.yml, it is necessary.
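
For reference, a minimal galaxy.yml only needs a handful of fields. The values below are placeholders shaped like my collection name, not the file’s real contents:

---
# Minimal galaxy.yml sketch - field values are placeholders.
namespace: daemonslayer2048
name: servers
version: 1.0.0
readme: README.md
authors:
  - daemonslayer2048
description: Host setup roles for my homelab
license:
  - MIT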

Backup Applications

As the recurring theme of this post goes, I didn’t have backups, so before moving any further I needed to start collecting my data and getting ready for the redeployment. This section will be short, as the work was fairly repetitive. I started by identifying which applications needed to be backed up; for each one I would tar any volume mounts, dump any databases, and use kubectl cp to pull the files off the cluster.

NOTE:
Through doing this I learned of a Rancher MCM limitation. If you are attempting to pull very large files off your cluster via kubectl cp, you are better off hitting the k8s API directly rather than going through the Rancher MCM proxy. Rancher would close the connection after a period of time and the download would fail. This only presented itself as an issue once files reached roughly 5 GB or more.


Provision-Cluster

At this point my Ansible playbook repo and collection were both ready and my applications were manually backed up, so it was time to begin re-imaging my servers with Rocky 9. I do not have any fancy or fun process for this, so I spent some time in the closet reinstalling each server. Perhaps I will look into kOps or Metal3 in the future. After the servers were reimaged, it was up to Ansible to STIG the hosts, install RKE2, and finish some custom tasks.


Deploy-Cluster

Once the cluster had been deployed I proceeded to redeploy all the base applications: cert-manager, Longhorn, and Rancher MCM. Finally it was time to recover the applications. Each one followed a similar process:

  1. Spin up the database (if applicable), and reingest the pgsql dump
  2. Deploy a temporary pod to untar the backups
  3. Destroy the pod
  4. Redeploy workload

The example below is a sample temporary pod I used to set up Synapse:

---
apiVersion: v1
kind: Pod
metadata:
  name: setup
  namespace: synapse
spec:  
  securityContext:
    fsGroup: 991
  containers:
    - command:
        - sleep
        - "infinity"
      image: debian
      securityContext:
        capabilities:
          drop:
          - ALL
        readOnlyRootFilesystem: false
        runAsGroup: 991
        runAsNonRoot: true
        runAsUser: 991
        allowPrivilegeEscalation: false
        seccompProfile:
          type: "RuntimeDefault"
      name: setup
      volumeMounts:
        - mountPath: /synapse/data
          name: media
  volumes:
    - name: media
      persistentVolumeClaim:
        claimName: media

The cluster is finally up; however, I still need backups, and I still need to migrate from IngressRoutes to Ingress.