Deploying Harvester (Round 2): The Proxmox Dilemma
Series: Deploying Harvester (Round 2)
The Proxmox Dilemma
As much as I love Proxmox, if it is anything, it is basic. This is a double-edged sword of course: it is rock solid and stable (foreshadowing), but it also lacks many of the features I want, especially as a Kubernetes user.
I had been planning to move from Proxmox to Harvester (again) for a little while, but this was accelerated by a bug [1] I happened across when upgrading from v8 to v9. The Ceph cluster became unstable post upgrade, with the manager services segfaulting in a continuous loop. It turns out this was an issue with the Python interpreter [2], and before the fix became available my Ceph cluster toppled over and became totally unresponsive; I certainly played a role in the final failure (whoops). At that point I decided to simply tear down the Proxmox cluster as a whole and begin setting up Harvester.
The first go-around with Harvester was a less than ideal experience; however, Harvester is a very fast-moving project and has made loads of progress since. So I have decided to jump back in and give it another shot. This time around there are a few critical changes I will make to the deployment to ease my primary issues:
- I will be using Rook/Ceph for guest cluster storage
  - This enables copy-on-write (CoW) snapshot mounting, something that should eventually come to Longhorn v2. Longhorn can take zero-write snapshots, but any attempt to mount such a snapshot, even as ROX (ReadOnlyMany), forces a full copy to be built (see the example after this list).
  - The second issue with Longhorn is that no volume can exceed the size of a single disk. Example: I have a 10TB PVC, but the largest HDD I have in use is 8TB, so if I stuck with Longhorn I would need to find a way to split that PVC into smaller chunks to get my workload running.
- I will not repeat the mistake of using the Rancher vCluster addon
  - I want more control over the Rancher MCM cluster than the vCluster addon can give me.
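To make the snapshot complaint concrete, here is a hypothetical sketch of what "mounting a snapshot" means in Kubernetes terms: take a VolumeSnapshot of an existing PVC and restore it as a ReadOnlyMany volume. The names and snapshot class below are placeholders; on Longhorn v1 this restore triggers a full data copy, while a CoW-capable backend like Ceph RBD can serve it as a cheap clone.

# Hypothetical example; names and snapshot class are placeholders
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: media-snap
  namespace: default
spec:
  volumeSnapshotClassName: longhorn-snapshot-vsc
  source:
    persistentVolumeClaimName: media
---
# Restore the snapshot as a ReadOnlyMany PVC; on Longhorn v1 this is where the full copy happens
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: media-ro
  namespace: default
spec:
  accessModes:
    - ReadOnlyMany
  storageClassName: longhorn
  resources:
    requests:
      storage: 10Ti
  dataSource:
    apiGroup: snapshot.storage.k8s.io
    kind: VolumeSnapshot
    name: media-snap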
With the recent improvements to Harvester most of my gripes (nearly all storage related) should be resolvable.
Before continuing, note that several of the projects referenced in this article have technically been renamed:
- Harvester has been renamed to SUSE Virtualization
- Longhorn has been renamed to SUSE Storage
The GitHub projects are, at the time of writing, still named Harvester and Longhorn, so to avoid confusion those are the names used here.
Hardware Setup
To make my plans work I did need to beef up the NVMe storage available to each node, so I swapped the old 250GB NVMe drives for 4TB ones. Each host is identical:
Host | CPU | RAM | HDDs | SSDs | NICs |
---|---|---|---|---|---|
harvester-01 | i5-10400 | 128GB | 3 x 8TB | 1 x 4TB, 1 x 1TB | 1 x 1Gb, 1 x 10Gb |
harvester-02 | i5-10400 | 128GB | 3 x 8TB | 1 x 4TB, 1 x 1TB | 1 x 1Gb, 1 x 10Gb |
harvester-03 | i5-10400 | 128GB | 3 x 8TB | 1 x 4TB, 1 x 1TB | 1 x 1Gb, 1 x 10Gb |
harvester-04 | i5-10400 | 128GB | 3 x 8TB | 1 x 4TB, 1 x 1TB | 1 x 1Gb, 1 x 10Gb |
This was a bit overkill, but in my experience I always end up sizing my hardware for exactly what I need in the moment. That sounds great until I want to test something new, my needs change, and I end up with no headroom to experiment. In fact, sizing my lab to exactly what I need has probably cost me far more than if I had simply overbuilt. Other than the NVMe SSD storage bump, no other changes were made to the hardware.
Deploying Harvester
Unlike last time, I kept the deployment process simple. I did not attempt to use the Harvester remote configuration file; with only four nodes it is simply overkill. All of the initial configuration (IPs, hostnames, DNS, etc.) was done manually, and no additional settings were changed via the config (like Longhorn reservations). If you don't know what I am referring to, the docs are here.
It is important to note that for networking I used the 10Gb interfaces as the mgmt network; this is essential to the deployment of Ceph in the following sections. Rook/Ceph will be deployed on the mgmt network, so that is where the bulk of the data traffic will live. This leaves the 1Gb interfaces for VM traffic. Realistically, traffic into any VM cluster will never exceed 1Gb, but replication and recovery traffic will, and I also want access to the Ceph cluster from my desktop, so this prevents me from saturating the VM line when doing so. All that to say: the 10Gb link carries Harvester/Ceph/Harbor traffic, and the 1Gb link carries all other VM traffic.
There is another caveat to this deployment. All workloads deployed on Harvester itself will use Longhorn: all VM root disks, the Harvester monitoring addon, and Harbor. This ensures all VMs have the fastest storage available to them for OS operations, and that applications like Harbor are not dependent on Ceph. It also simply makes sense for the monitoring addon to watch Ceph performance from different underlying storage. Ceph in smaller clusters, especially on HDDs, is not particularly fast, so putting OS disks that run etcd on it alongside guest cluster workloads is not a wise idea.
Harvester Services
Some services will be deployed on the Harvester cluster itself, rather than attempting any funny business with device passthrough to VMs. The goal is to keep things simple and light where possible. There will be two core services, as discussed: Rook/Ceph and Harbor.
Deploying Harbor
There is nothing too special about this deployment. I often get rate limited by Docker Hub (mostly due to Renovate), so to help alleviate this I deployed Harbor to the Harvester cluster itself. My power is finicky, so it's not uncommon for the rack to lose power; if Harbor comes back up first, the guest clusters can pull from it and hopefully reduce the chance I get rate limited.
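Nothing special indeed; as a rough idea, here is a minimal sketch of the sort of Helm values involved, assuming the upstream Harbor chart. The hostname and sizes are placeholders, and persistence is pinned to the built-in harvester-longhorn storage class per the caveat above.

# Hedged sketch of Harbor Helm values; hostname and sizes are placeholders
expose:
  type: ingress
  tls:
    enabled: true
  ingress:
    hosts:
      core: harbor.example.lan
externalURL: https://harbor.example.lan
persistence:
  persistentVolumeClaim:
    registry:
      storageClass: harvester-longhorn  # keep Harbor on Longhorn, not Ceph
      size: 500Gi
    database:
      storageClass: harvester-longhorn
    redis:
      storageClass: harvester-longhorn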
Deploying Rook/Ceph
Deploying Rook/Ceph requires a bit more work, but not too much. The Harvester OS is based on SUSE Linux Enterprise Micro [3] (a.k.a. SLE Micro), an immutable operating system, so the host OS (including most of the filesystem) cannot be changed without some configuration first. This matters because Rook drops binaries onto the host filesystem, and multiple kernel modules need to be loaded. To do so you need to edit 90_custom.yaml in /oem [4] and add the following (see the sketch after this list):
- The paths /var/lib/rook, /var/lib/ceph, /etc/ceph, and /etc/webhook need to be added to .stages.rootfs[0].environment.PERSISTENT_STATE_PATHS
- The commands modprobe rbd and modprobe nbd need to be added to .stages.initramfs[0].commands
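For reference, the relevant portion of /oem/90_custom.yaml ends up looking roughly like this. This is a trimmed sketch only; the installer-generated file contains much more, and the exact formatting of PERSISTENT_STATE_PATHS should follow the persistent state paths documentation [5].

# Trimmed sketch of /oem/90_custom.yaml; entries are merged into the existing stages
stages:
  rootfs:
    - commands: []
      environment:
        # appended to the paths already present; formatting per the docs [5]
        PERSISTENT_STATE_PATHS: >-
          /var/lib/rook
          /var/lib/ceph
          /etc/ceph
          /etc/webhook
  initramfs:
    - commands:
        # added to the existing first initramfs entry
        - modprobe rbd
        - modprobe nbd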
After these changes are made the host needs to be rebooted; post reboot the node is ready. This must be completed for every node in the cluster. Both changes can also be included in the installer configuration file that I chose to skip [5].
Before deploying Rook I made sure my 4TB NVMe SSDs were already consumed by Longhorn. This was not strictly needed, as the setting I use for Rook only consumes devices whose names start with "sd", so the NVMe devices would not be touched, and Rook also checks each device to make sure it is not already in use. Still, a little extra insurance never hurt anyone; it's worth having two separate conditions preventing Ceph from eating a disk it should not. I will skip my full config, but the two key values needed for my deployment are cephClusterSpec.network.provider and cephClusterSpec.storage, see below:
cephClusterSpec:
  network:
    provider: host
  storage:
    useAllNodes: true
    useAllDevices: false
    deviceFilter: "^sd."
Setting the provider to host will expose Ceph on the Harvester mgmt network, meaning both the monitor and OSD services will share the same IPs as their respective Harvester hosts. If you have a firewall between guest clusters and the Harvester cluster (and you should), you will need to allow TCP/3300 and TCP/6789 for the monitor service, and TCP/6800-7300 for the OSDs. This will most likely be the bulk of your traffic, so it is worthwhile to place these rules earlier in your rule list if possible, with the OSD rule first.
A Little Terraform Makes The Toil Go Away
At this point you could continue making manual changes, but we can automate the toil away instead:
resource "harvester_clusternetwork" "lab-cluster-net" {
name = "lab"
description = "Default cluster network"
}
resource "harvester_vlanconfig" "lab-vlanconfig" {
depends_on = [
harvester_clusternetwork.lab-cluster-net
]
name = "lab-vlanconfig"
cluster_network_name = harvester_clusternetwork.lab-cluster-net.name
uplink {
bond_mode = "active-backup"
bond_miimon = -1
nics = [
"enp3s0"
]
}
}
With this we can create a cluster network and a VLAN config. I personally find the naming of these resources a bit confusing, but this is not where a VLAN ID gets set; that happens in the harvester_network resource [7]. We can also add a public SSH key, and download a Rocky 9.6 cloud image and ISO:
resource "harvester_ssh_key" "jhanafin-ed25519" {
name = "jhanafin-ed25519"
namespace = "harvester-public"
public_key = "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIDzHX5L4uTO37kSJb5u0pUpgFwXbHJJzKA/mxhMzA6ZL jhanafin@workstation.main.lan"
}
resource "harvester_image" "rocky-96-cloud" {
name = "rocky-9.6-cloud"
namespace = "harvester-public"
storage_class_name = "harvester-longhorn"
display_name = "Rocky 9.6 Cloud"
source_type = "download"
url = "https://dl.rockylinux.org/pub/rocky/9/images/x86_64/Rocky-9-GenericCloud-Base.latest.x86_64.qcow2"
}
resource "harvester_image" "rocky-96-dvd" {
name = "rocky-9.6-dvd"
namespace = "harvester-public"
storage_class_name = "harvester-longhorn"
display_name = "Rocky 9.6 DVD"
source_type = "download"
url = "https://download.rockylinux.org/pub/rocky/9/isos/x86_64/Rocky-9.6-x86_64-dvd.iso"
}
Chances are you will need or want to do more than this to set up your cluster (like setting up a backup target [6], sketched below). At this point Harvester is minimally set up: I now have Rook/Ceph, Harbor, and enough basics to begin setting up my Rancher cluster.
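On the backup target specifically, one declarative option (instead of clicking through the UI) is Harvester's backup-target Setting object. This is a hedged sketch with a placeholder NFS endpoint; check the linked example [6] for the exact value format.

# Hedged sketch; endpoint is a placeholder, value format per the linked example [6]
apiVersion: harvesterhci.io/v1beta1
kind: Setting
metadata:
  name: backup-target
value: '{"type":"nfs","endpoint":"nfs://nas.example.lan:/volume1/harvester-backups"}'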
Deploying Rancher MCM
The Harvester Terraform provider does not have a way to create namespaces, so the Kubernetes provider needs to be used for that. I place all resources related to a guest cluster in its own namespace, which means the Rancher VLAN network also depends on the Kubernetes provider, like so:
resource "kubernetes_namespace" "rancher" {
metadata {
name = "rancher"
annotations = {
name = "rancher"
}
}
lifecycle {
ignore_changes = [
metadata
]
}
}
resource "harvester_network" "rancher-vlan" {
depends_on = [
harvester_vlanconfig.lab-vlanconfig,
kubernetes_namespace.rancher
]
cluster_network_name = harvester_clusternetwork.lab-cluster-net.name
namespace = kubernetes_namespace.rancher.id
name = "rancher"
vlan_id = 11
route_mode = "manual"
route_cidr = "10.10.11.61/26"
route_gateway = "10.10.11.62"
}
Typically I would have automated everything, but I was in a rush, so I installed the OS on the three Rancher VMs manually (gross, I know); Ansible will solve most of the toil post install. Creating the three VMs themselves is as simple as:
resource "harvester_virtualmachine" "rancher" {
depends_on = [
harvester_network.rancher-vlan,
harvester_image.rocky-96-dvd
]
namespace = kubernetes_namespace.rancher.id
name = format("server-%02d", count.index + 1)
count = 3
description = "Rancher MCM cluster"
cpu = 4
memory = "8Gi"
efi = true
secure_boot = false
network_interface {
name = "nic-1"
network_name = harvester_network.rancher-vlan.id
}
disk {
name = "rootdisk"
type = "disk"
size = "150Gi"
bus = "virtio"
boot_order = 1
}
input {
name = "tablet"
type = "tablet"
bus = "usb"
}
}
From this point I installed the VMs manually, set static IPs, and prepared my Ansible deployment. There is nothing too crazy about this install so I won't go too in depth. I am basically a daily user of the rke2-ansible repository, so as always that is how I set up my Rancher cluster, along with a custom role I use to populate VMs with useful tools and my internal CA cert. I won't go too deep into this as the docs in the repository explain most of it. The Rancher cluster itself runs a pretty basic set of applications.
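For completeness, the inventory side of that is about as simple as it gets. This is a sketch assuming the sample group names from the rke2-ansible repository; the hostnames and IPs are placeholders.

# Sketch of an Ansible inventory; group names assumed from the rke2-ansible sample, IPs are placeholders
all:
  children:
    rke2_servers:
      hosts:
        server-01: { ansible_host: 10.10.11.11 }
        server-02: { ansible_host: 10.10.11.12 }
        server-03: { ansible_host: 10.10.11.13 }
    rke2_agents:
      hosts: {}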
Takeaways
After deploying Harvester and Rancher I moved on to deploying my guest clusters via Terraform; however, that is saved for a later post, as this one was long enough.
Sources
1. Ceph Managers Seg Faulting Post Upgrade
2. Ceph Seg Fault Solution
3. Harvester Architecture
4. Cloud-Native Node Configuration
5. Harvester Persistent State Paths
6. Harvester Backup Target Example
7. harvester_network Resource