Summary
If you have read the previous two posts in this series you will know the migration to Harvester has had its speed bumps. Most of the issues came down to using too many experimental features and deploying the cluster in an unsupported fashion. At this point it seems best to completely redo not just the physical nodes but also to make some tweaks to the VMs. The goals for this redeployment are the following:
- Redeploy Rancher MCM to a separate physical cluster
- Redeploy Harvester to remaining nodes
- Build custom VM templates (without XFS)
Redeploying Rancher Cluster
Device Setup
The Rancher cluster will follow the typical deployment model for Rancher MCM. Three NUCs were set up with Rocky 9.3, and now I have the following L1/2 setup:
Rancher Cluster Setup
After all three nodes were reimaged, RKE2 v1.27.15+rke2r1 was installed via Ansible using the rancherfederal/rke2-ansible playbook. As always, I default to the Cilium CNI with kube-proxy replacement enabled. It is fairly easy to pass config values to RKE2 via the Ansible playbook; all examples below fall under the rke2_config key:
Defining S3 etcd Backups
While I do believe using an S3 backup of etcd is overkill in my case (I mean, what's the chance all three nodes become unrecoverable?), I still see value in doing so: Backblaze is free if you use less than 10 GB, and I may also end up using S3 for CSI backups later.
rke2_config:
  etcd-snapshot-retention: 10
  etcd-snapshot-schedule-cron: '0 */4 * * *'
  etcd-s3: true
  etcd-s3-endpoint: ""
  etcd-s3-access-key: ""
  etcd-s3-secret-key: ""
  etcd-s3-bucket: ""
  etcd-s3-region: ""
  etcd-s3-folder: ""
Basic Security
As I am running Rocky and do not disable SELinux, the flag below is needed; I also set the kubeconfig mode to 600. These should probably be the defaults but hey, at least it's an easy fix. I did not enable any CIS profile.
rke2_config:
  selinux: true
  write-kubeconfig-mode: 600
Set TLS SAN
I opt to sign the TLS certificate with every possible SAN I may end up using, so this includes the hostname and IP of each node, plus the kube-vip hostname and IP.
rke2_config:
  tls-san:
    - rancher-01.infra.lan
    - 10.0.0.10
    - rancher-02.infra.lan
    - 10.0.0.11
    - rancher-03.infra.lan
    - 10.0.0.12
    # Rancher cluster kube-vip
    - rke2.lab.lan
    - 10.0.0.29
Defining Cilium
In order for this to work there will be a period where the playbook breaks. With the settings below the RKE2 server will never show Ready, because the default Cilium Helm chart needs a few settings passed to it. Run the playbook with the values below, and note that Ansible will fail:
rke2_config:
  cni:
    - cilium
  disable-kube-proxy: true
Once Ansible has failed you will need to manually drop a file onto the first server: a HelmChartConfig whose values resolve the issue of the server node not reaching the Ready state. Place the following file in /var/lib/rancher/rke2/server/manifests/:
---
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: rke2-cilium
  namespace: kube-system
spec:
  valuesContent: |-
    kubeProxyReplacement: strict
    k8sServiceHost: 127.0.0.1
    k8sServicePort: 6443
    cni:
      chainingMode: "none"
Now restart the rke2-server service and it should come online. When it does, rerun the Ansible playbook to finish installing RKE2 on the rest of the nodes. The Ansible repository in question has a rather large update coming that should remove the need to manually place this file on the host.
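For reference, the manual dance on the first server looks roughly like this; the playbook entry point and inventory path are assumptions based on how the rancherfederal playbook is typically laid out, so adjust them to your own repo:

systemctl restart rke2-server.service
# wait for the first server to report Ready
/var/lib/rancher/rke2/bin/kubectl --kubeconfig /etc/rancher/rke2/rke2.yaml get nodes
# then, from the workstation, rerun the playbook for the remaining nodes
ansible-playbook site.yml -i inventory/rancher/hosts.yml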
All three nodes are master nodes with the NoSchedule taint removed, and I configured the cluster to run the following:
- kube-vip (for k8s API HA)
- Metal-LB (for ingress HA; a rough example follows this list)
- Rancher MCM
- Traefik (for ingress, this is just habit at this point)
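To illustrate the Metal-LB side, the Layer 2 setup amounts to an address pool plus an advertisement; the pool range below is a hypothetical placeholder, not the range I actually carved out:

---
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: ingress-pool
  namespace: metallb-system
spec:
  addresses:
    # hypothetical range reserved for LoadBalancer services
    - 10.0.0.30-10.0.0.39
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: ingress-l2
  namespace: metallb-system
spec:
  ipAddressPools:
    - ingress-pool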
Note: In the future I do intend to look into more of Cilium’s features so I can cut out the need for an ingress controller like Traefik or NGINX, and for a load balancer like Metal-LB. Cilium has a lot of features I have not yet been able to dive into, but the idea of having my CNI take on the role of the LB and ingress is very exciting.
Redeploying Harvester Cluster
Device Setup
All Harvester nodes were reimaged, this time with Harvester 1.3.1, which includes fixes for some of the issues I hit on the previous version. This time around I also opted to forgo using Harvester configuration manifests. The manifest configs for Harvester are a good idea, but I no longer had anywhere to pull the configs from and I opted not to upload them to GitHub. Currently the configs don’t save you that much time, especially in such a small environment.
Now I have the following L1/2 setup:
It should be noted that Harvester recommends the management interface run on a 10 Gig link, partly because this is the interface Longhorn traffic uses. My “servers” only have a single 10 Gig interface, so that is what my management interface uses; the drawback is that all traffic flowing into a guest cluster goes over the 1 Gig links. Had I ignored the recommendation and put the management interface on the 1 Gig links, replica rebuild times would have increased significantly.
Harvester is also responsible for promoting and demoting nodes in the cluster to and from the master role, so this time I did not pin any node to a specific role like I did last time.
Harvester Cluster Addons
Unlike the previous setup, only a single addon has been enabled: rancher-monitoring. I made some tweaks to the values, increasing the retention size along with the memory limits. This is highly specific to your own setup, but the default values do seem to be very low and in my previous deployment I saw pods OOM-killed. Below are the values I used (a sketch of how they map onto the chart values follows the tables):
Prometheus:
Key | Value |
---|---|
Retention | 30d |
Retention Size | 50GiB |
Requested CPU | 750m |
Requested Memory | 1750Mi |
CPU Limit | 2000m |
Memory Limit | 5000Mi |
Prometheus Node Exporter:
Key | Value |
---|---|
Requested CPU | 100m |
Requested Memory | 30Mi |
CPU Limit | 200m |
Memory Limit | 180Mi |
Grafana:
Key | Value |
---|---|
Requested CPU | 100m |
Requested Memory | 200Mi |
CPU Limit | 400m |
Memory Limit | 1000Mi |
Alertmanager:
Key | Value |
---|---|
Retention | 120h |
Requested CPU | 100m |
Requested Memory | 100Mi |
CPU Limit | 1000m |
Memory Limit | 600Mi |
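For context, here is roughly how the Prometheus values above land in the underlying rancher-monitoring (kube-prometheus-stack) chart if you edit the addon values directly instead of using the UI form; treat this as a structural sketch rather than a copy of my exact addon YAML:

prometheus:
  prometheusSpec:
    retention: 30d
    retentionSize: 50GiB
    resources:
      requests:
        cpu: 750m
        memory: 1750Mi
      limits:
        cpu: 2000m
        memory: 5000Mi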
Harvester Terraform
Harvester has a Terraform provider! I decided to use Harvester’s Terraform provider where possible; it does not yet have 1:1 parity with Harvester’s features, but some IaC is better than none (maybe). Most of this Terraform should be applied before deploying a guest cluster and can be done before importing the Harvester cluster into Rancher. The Harvester provider requires the Harvester kubeconfig file, so before proceeding you will need to download it from the “Support” page in the bottom left of the UI. When setting up the provider, you just need to pass the kubeconfig file location and context.
provider "harvester" {
kubeconfig = "~/.kube/harvester.yaml"
kubecontext = "local"
}
Note: If, like me, you are using a DNS name rather than the IP to access the Harvester UI, you will need to edit the server key in the kubeconfig. When you download the kubeconfig, the server key will contain the hostname you downloaded it from rather than the IP; however, the certificate was generated without that hostname in its TLS SANs, so you will get errors. So again: swap the hostname for the IP in the kubeconfig file.
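In other words, the edit is just this one field; the hostname and VIP below are hypothetical stand-ins for your own values:

clusters:
- cluster:
    # server: https://harvester.lab.lan:6443   <- as downloaded; fails TLS verification
    server: https://10.0.0.20:6443
    certificate-authority-data: <unchanged>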
SSH Keys
Adding an SSH key allows you to select your SSH key from a dropdown and pass it into a VM when creating said VM.
resource "harvester_ssh_key" "jhanafin-key" {
name = "jhanafin-key"
namespace = "harvester-public"
public_key = "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIDzHX5L4uTO37kSJb5u0pUpgFwXbHJJzKA/mxhMzA6ZL jhanafin@workstation.main.lan"
}
Storage Classes
I have two tiers of storage in my cluster, HDD and SSD. As you can imagine, SSD space is more limited than HDD, so in general I only deploy database workloads to SSDs unless there is a specific reason for a workload to live on SSD. These storage classes can be referenced by guest clusters, allowing you to pass them through. If you also have tiered storage, make sure you label your disks; I chose “hdd”, “ssd”, and “nvme”, then used the Terraform code below to create two storage classes that refer to these tags.
resource "harvester_storageclass" "longhorn-fast" {
name = "longhorn-fast"
allow_volume_expansion = true
is_default = false
reclaim_policy = "Delete"
volume_binding_mode = "Immediate"
volume_provisioner = "driver.longhorn.io"
parameters = {
"migratable" = "true"
"numberOfReplicas" = "3"
"staleReplicaTimeout" = "30"
"diskSelector" = "ssd,nvme"
}
}
resource "harvester_storageclass" "longhorn-slow" {
name = "longhorn-slow"
allow_volume_expansion = true
is_default = true
reclaim_policy = "Delete"
volume_binding_mode = "Immediate"
volume_provisioner = "driver.longhorn.io"
parameters = {
"migratable" = "true"
"numberOfReplicas" = "3"
"staleReplicaTimeout" = "30"
"diskSelector" = "hdd"
}
}
You will be stuck with the two default storage classes, longhorn and harvester-longhorn. This isn’t a huge deal, but note that if, like me, you have multiple tiers of storage, the default classes will match either HDDs or SSDs, meaning replicas will be built on either type of disk.
Networking
A network in Harvester is made up of a few components: the cluster network, the VLAN config, and the network itself; each one depends on the last. As you can see from the Terraform snippets below, all four nodes share a physical interface named “enp3s0”, so they all use that interface. You can select all nodes (the default), only specific nodes, or even match nodes based on labels. My setup below is unremarkable overall.
resource "harvester_clusternetwork" "cluster-net" {
name = "cluster-net"
}
resource "harvester_vlanconfig" "cluster-vlans" {
name = "cluster-vlan"
cluster_network_name = harvester_clusternetwork.cluster-net.name
depends_on = [
resource.harvester_clusternetwork.cluster-net
]
uplink {
nics = [
"enp3s0"
]
bond_mode = "active-backup"
bond_miimon = -1
mtu = 1500
}
}
The following two VLANs both have DHCP servers, so hypothetically I should not need to provide a route_cidr or route_gateway; in my experience, however, the DHCP mode did not appear to work. I am not well versed in what this is truly needed for, but when the settings are correct the “Route Connectivity” column in Harvester will accurately show “Active”. It appears to be used only for determining whether a VLAN is set up properly, and that’s about it.
resource "harvester_network" "iot-vlan" {
name = "iot"
namespace = "harvester-public"
cluster_network_name = harvester_clusternetwork.cluster-net.name
depends_on = [
resource.harvester_vlanconfig.cluster-vlans,
]
vlan_id = 5
route_mode = "manual"
route_cidr = "10.0.5.1/25"
route_gateway = "10.0.5.1"
}
resource "harvester_network" "vm-vlan" {
name = "vm"
namespace = "harvester-public"
cluster_network_name = harvester_clusternetwork.cluster-net.name
depends_on = [
resource.harvester_vlanconfig.cluster-vlans,
]
vlan_id = 7
route_mode = "manual"
route_cidr = "10.0.7.1/26"
route_gateway = "10.0.7.1"
}
Harvester VM Images
If you know me personally you will know my hatred for XFS knows no bounds. I understand it has many features and is more performant than ext4, but to me XFS is about as useful as RAID5 on BTRFS. Don’t get me wrong, k8s isn’t a fan of hard power cuts either, but at least it can handle them. As a result I made a very simple Packer build to avoid XFS (a rough sketch follows the package list below). Thankfully KubeVirt is just QEMU, so Packer can create QEMU VMs which can afterwards be exported and uploaded to Harvester. If you build your own VM images it is crucial to install a few packages:
- qemu-guest-agent
- gdisk
- cloud-utils-growpart
- cloud-init
- tar
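For a sense of what that build looks like, here is a minimal sketch of a Packer QEMU source; the ISO URL, checksum, credentials, and kickstart file name are placeholders of my own, not the exact values from my build:

source "qemu" "rocky9" {
  iso_url          = "https://download.rockylinux.org/pub/rocky/9/isos/x86_64/Rocky-9-latest-x86_64-minimal.iso"
  iso_checksum     = "none" # replace with the real checksum
  accelerator      = "kvm"
  format           = "qcow2"
  disk_size        = "20G"
  memory           = 4096
  cpus             = 2
  http_directory   = "http" # serves the kickstart file to the installer
  boot_wait        = "5s"
  boot_command     = ["<up><tab> inst.text inst.ks=http://{{ .HTTPIP }}:{{ .HTTPPort }}/ks.cfg<enter>"]
  ssh_username     = "root"
  ssh_password     = "packer" # must match the kickstart rootpw
  ssh_timeout      = "30m"
  shutdown_command = "shutdown -P now"
  output_directory = "output-rocky9"
}

build {
  sources = ["source.qemu.rocky9"]
}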
The kickstart snippet below is needed for cloud-init and qemu-guest-agent; both services should be enabled at this point but not started. Packer needs to reboot and SSH into the VM, and it uses root by default for SSH, hence the final line in the snippet.
%post --log=/root/post.log
dnf install epel-release -y
dnf update -y
dnf install qemu-guest-agent gdisk cloud-utils-growpart cloud-init tar -y
systemctl enable cloud-init
systemctl enable --now qemu-guest-agent
sed -i '/#PermitRootLogin*/c\PermitRootLogin yes' /etc/ssh/sshd_config
%end
Make sure to enable sshd as well:
# Services
services --enabled=sshd
After the VM is created you can import it into Harvester as an image, and it can then be used with cloud-init.
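The image itself can also live in Terraform via the provider’s harvester_image resource; this sketch assumes you host the exported qcow2 somewhere the cluster can reach (the URL is hypothetical):

resource "harvester_image" "rocky9-custom" {
  name         = "rocky9-custom"
  namespace    = "harvester-public"
  display_name = "rocky9-custom"
  source_type  = "download"
  # hypothetical location of the exported Packer artifact
  url          = "http://fileserver.lab.lan/images/rocky9-custom.qcow2"
}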
Deploying a Guest Cluster
The Rancher and Harvester clusters are now both set up and ready to go. At this point Rancher MCM (and its RKE2 cluster) is up and has very little configured other than the login. Harvester is also up, and is now set up with:
- The rancher-monitoring addon
- SSH keys
- Custom storage classes
- Networking
- Custom VM images
Import Harvester Cluster
Importing a Harvester cluster is very easy but does require an extra step that is easy to miss. To start, log in to Rancher:
- Select “Virtualization Management” (in the far left column, near the bottom)
- Select “Harvester Clusters”
- Select “Import Existing”
- Give the cluster a name
- Submit
- Copy the URL provided
Now log in to Harvester:
- Go to the Advanced / Settings page of the target Harvester’s UI
- Find the “cluster-registration-url” setting and click the -> Edit Setting button
- Paste the registration URL copied from Rancher and click the Save button
Wait a short while as Harvester and Rancher get set up. Once the Harvester cluster is imported you will need to create a cloud credential. From Rancher:
- Select “Cluster Management”
- Select “Cloud Credentials”
- Select “Create”
- Select “Harvester”
- Give the credentials a name
- Select your imported cluster from the dropdown.
At this point Harvester is imported into Rancher and we now have a cloud credential to use when deploying guest clusters; you can test this by deploying one via the Rancher UI.
Deploying Guest Clusters with Terraform “gotchas”
In this final section I am going to cover the “gotchas” of deploying a guest cluster via Terraform. The Rancher Terraform provider lacks one very important feature for deploying guest clusters, so we will need more than just the Rancher provider: it cannot create the secrets in the Harvester cluster that the Harvester CPI on the guest cluster requires. Before moving on: I created a project when I originally started this, so I will be placing my guest clusters into projects based on their purpose. This is mostly an organizational preference and is not strictly needed, but it does impact the Terraform code later. If you made the project via Terraform you can collect the ID and namespace that way; the UI also provides access to this info. From the Rancher UI:
- Select “Virtualization Management”
- Select your Harvester cluster
- Select “Projects/Namespaces”
- Find your project and click the three dots on the far right
- Select “Edit YAML”
- Copy your name and namespace
- name is under metadata.name
- namespace is under metadata.namespace
Note: Rancher does not use metadata.name as the name shown in the UI, which is why these names look like gibberish; the name you see in the UI is spec.displayName.
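If you prefer the CLI, the same information can be read from the Project objects in the Rancher management (local) cluster; this assumes your current kubeconfig context points at that cluster:

# list projects with their cluster namespace, internal name, and display name
kubectl get projects.management.cattle.io -A \
  -o custom-columns=NAMESPACE:.metadata.namespace,NAME:.metadata.name,DISPLAY:.spec.displayName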
Before deploying the guest cluster we need to create the secret the downstream cluster uses to access Harvester features like the load balancer and storage. To create the secret you will need to download the Harvester kubeconfig and use it in the Kubernetes provider config, like so:
provider "kubernetes" {
  config_path    = data.sops_file.kubernetes.data["config_path"]
  config_context = data.sops_file.kubernetes.data["config_context"]
}
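The sops_file data source above comes from the carlpett/sops provider; a minimal sketch, assuming the kubeconfig path and context are stored in an encrypted secrets.yaml (the file name is my own placeholder):

data "sops_file" "kubernetes" {
  # SOPS-encrypted file containing the config_path and config_context keys
  source_file = "secrets.yaml"
}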
Now we can create the secret for the guest cluster, which also involves creating the namespace:
resource "kubernetes_namespace_v1" "namespace" {
metadata {
name = var.namespace
labels = {
"field.cattle.io/projectId" = var.project_id
}
annotations = {
"field.cattle.io/projectId" = "${var.project_namespace}:${var.project_id}"
}
}
lifecycle {
ignore_changes = [
metadata[0].annotations
]
}
}
resource "kubernetes_service_account_v1" "k8s-sa" {
depends_on = [kubernetes_namespace_v1.namespace]
metadata {
name = var.cluster_name
namespace = var.namespace
}
}
resource "kubernetes_cluster_role_binding_v1" "k8s-sa-crb" {
depends_on = [kubernetes_service_account_v1.k8s-sa]
metadata {
name = "${kubernetes_service_account_v1.k8s-sa.metadata.0.namespace}-${kubernetes_service_account_v1.k8s-sa.metadata.0.name}"
}
role_ref {
api_group = "rbac.authorization.k8s.io"
kind = "ClusterRole"
name = "harvesterhci.io:csi-driver"
}
subject {
kind = "ServiceAccount"
name = kubernetes_service_account_v1.k8s-sa.metadata.0.name
namespace = kubernetes_service_account_v1.k8s-sa.metadata.0.namespace
}
}
resource "kubernetes_role_binding_v1" "k8s-sa-rb" {
depends_on = [kubernetes_service_account_v1.k8s-sa]
metadata {
name = kubernetes_service_account_v1.k8s-sa.metadata.0.name
namespace = kubernetes_service_account_v1.k8s-sa.metadata.0.namespace
}
role_ref {
api_group = "rbac.authorization.k8s.io"
kind = "ClusterRole"
name = "harvesterhci.io:cloudprovider"
}
subject {
kind = "ServiceAccount"
name = kubernetes_service_account_v1.k8s-sa.metadata.0.name
namespace = kubernetes_service_account_v1.k8s-sa.metadata.0.namespace
}
}
resource "kubernetes_secret_v1" "k8s-secret" {
depends_on = [kubernetes_cluster_role_binding_v1.k8s-sa-crb]
type = "kubernetes.io/service-account-token"
wait_for_service_account_token = true
metadata {
name = var.cluster_name
namespace = kubernetes_service_account_v1.k8s-sa.metadata.0.namespace
annotations = {
"kubernetes.io/service-account.name" = kubernetes_service_account_v1.k8s-sa.metadata.0.name
}
}
This will create the secret; now we need to create the “machine_selector” config:
resource "local_file" "machine_selector" {
depends_on = [ kubernetes_secret_v1.k8s-secret ]
filename = "${path.module}/kubeconfig.yaml"
content = <<-EOT
cloud-provider-name: "harvester"
cloud-provider-config: |-
apiVersion: v1
kind: Config
clusters:
- name: default
cluster:
server: ${var.harvester_url}
certificate-authority-data: ${base64encode(kubernetes_secret_v1.k8s-secret.data["ca.crt"])}
contexts:
- name: default
context:
cluster: default
namespace: ${kubernetes_service_account_v1.k8s-sa.metadata.0.namespace}
user: default
current-context: default
users:
- name: default
user:
token: ${kubernetes_secret_v1.k8s-secret.data["token"]}
EOT
}
data "local_file" "machine_selector" {
depends_on = [ local_file.machine_selector ]
filename = "${path.module}/kubeconfig.yaml"
}
There is probably a better way to do this than creating the file locally, but it works. This file then needs to be converted into a string and added to the machine_selector_config key in the Rancher provider’s rancher2_cluster_v2 resource; the path where the file will be dropped on the node also needs to be provided. The snippet below is heavily abbreviated:
resource "rancher2_cluster_v2" "cluster" {
depends_on = [ data.local_file.machine_selector ]
machine_selector_config {
config = tostring(data.local_file.machine_selector.content)
}
chart_values = <<-EOT
harvester-cloud-provider:
cloudConfigPath: /var/lib/rancher/rke2/etc/config-files/cloud-provider-config
global:
cattle:
clusterName: "${var.cluster_name}"
EOT
machine_global_config = var.machine_global_config
}
}
Conclusion
After spending more time with Harvester and actually deploying it in a supported model, I have grown to like it a lot. There is added complexity and overhead, no doubt; when I originally assembled the hardware for my cluster I went with CPUs and RAM that made sense for a single bare-metal k8s cluster with a good bit of headroom. This means my production cluster of 3 masters (4 CPU, 16 GB RAM) and 3 workers (4 CPU, 64 GB RAM) consumes roughly half of all the RAM available to me:
Someday I may move back to a bare-metal RKE2 cluster; the simplicity of a small cluster really is very hard to beat (especially for a homelab).