Jan. 25, 2021
Kyle Kaniecki
This blog series is a living document, so I will be updating specific articles or adding new ones as I continue to fine-tune my home lab cluster.
The code for all of these articles is located on my Gitlab.
Holy vacation, Batman! I've been gone for a while, and haven't been as active on my blog lately, but I promise that it was for a good reason. I've finally started my homelab, and have some really exciting content to share with the void.
As mentioned in my Running bitwarden_rs on Kubernetes blog post, I planned on running my own Kubernetes cluster at home. Back in my junior year of college, I landed on a project with a peer of mine named Jason (Jason, if you're reading this, thank you!). He had an old server chassis lying around that he was willing to part with. It was a Chenbro RM21706 with no drives and no motherboard or compute components; it included only the RAID card, chassis fans, and power supply. Still, it was the beginning of a server, and I was determined to make it my server. I quickly found an E-ATX motherboard on Facebook Marketplace with 2 quad-core Xeon processors, their cooling blocks, and 32GB of DDR2 RAM (yes, you read that right). The whole bundle cost me a whopping $25, so I couldn't pass up the chance to boot this server up and get it running. After buying it from the seller, rushing home, and ripping the top panel off of the chassis, I was on my way to starting this thing up.
However, I quickly found out that the old motherboard I had bought was LOUD, like really loud. When I turned the thing on, even in our storage closet, it sounded like a vacuum cleaner was running in the other room. The old Intel CPU coolers ran at super high RPMs, and I didn't want to mess with the BIOS to get the noise level down when I could invest in something a little more modern. Something newer would also let me keep power costs down, since I was going to be running this in my home 24/7. The Chenbro was fun to play around with, but for a 24/7 server running on home utilities, it was just too loud and power hungry. So instead, I decided to invest around $400 in a new (to me) Dell R620. This little 1U server was not only much more power efficient, it was also much quieter than the old Chenbro, even as a 1U. The specs of the machine are as follows:
I threw the thing in my server box and booted it up. Looking at my Kill-A-Watt, the new R620 would cost me around $12/month to run 24/7. Great, much cheaper than the $50 I was spending on a minimal kubernetes cluster in the cloud.
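(As a sanity check on that number: assuming a typical residential rate of around $0.12/kWh, $12 a month buys roughly 100 kWh, and spread over ~720 hours that works out to an average draw of about 140W, which seems plausible for a lightly loaded R620. The rate is an assumption on my part; the wattage just follows from the arithmetic.)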
First, I had to install Proxmox on the server. I decided to go with Proxmox because it is really just a modified version of Debian under the hood, and for my home environment, I would rather have something I am familiar with instead of something that is "more secure." This led me to stray away from ESXi and other platforms that use custom kernels built for virtualization. After I grabbed the latest Proxmox image, I dd'd the ISO to a spare USB drive I had laying around to create a bootable disk.
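Before writing the image, it's worth checking it against the SHA256 sums Proxmox publishes on the download page (the filename below is just an example; use whichever version you grabbed):

$ sha256sum proxmox-ve_6.3-1.iso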
$ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 232.9G 0 disk
├─sda1 8:1 0 16M 0 part
└─sda2 8:2 0 232.9G 0 part
...
$ sudo dd if=/path/to.iso of=/dev/sdX bs=1M status=progress
After the dd command finished, I was able to pull the USB out of my main desktop and plug it into the front of the R620. From here, I was able to configure the BIOS, IPMI server, and RAID card. The BIOS already came with pretty sane defaults, so I left those alone. The fans were quiet enough that I didn't feel the need to modify the fan speeds in the BIOS, but the option is there if needed.
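(If the fans ever do need taming, the commonly shared approach on these 12th-gen Dells is IPMI raw commands against the iDRAC rather than the BIOS. The address and credentials below are placeholders, and the raw opcodes are community-documented rather than an official Dell interface, so treat this as a use-at-your-own-risk sketch.)

# Disable the automatic fan curve (iDRAC address/credentials are placeholders)
$ ipmitool -I lanplus -H 10.10.100.9 -U root -P <password> raw 0x30 0x30 0x01 0x00
# Pin the fans to roughly 20% (0x14 hex)
$ ipmitool -I lanplus -H 10.10.100.9 -U root -P <password> raw 0x30 0x30 0x02 0xff 0x14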
Once I got to the RAID card configuration, though, I was confronted with two different paths: build one big hardware RAID array out of all the drives, or expose each drive as its own single-disk RAID 0 virtual disk.
Guess which one I picked initially, and guess which one I needed?
Go with the many RAID 0 disks configuration if you're using ZFS storage or Rook/Ceph. They don't play well with a hardware RAID controller.
So after configuring the machine's firmware, I picked the USB drive as my boot device and started up the Proxmox installer. The Proxmox Installation Guide is very good at describing all the steps, and the installer itself is very easy to use, so I won't reiterate that here.
Once I had Proxmox booted on my R620, it was time to create a few VMs to start a Kubernetes cluster. But I was faced with another decision: I wanted to use Proxmox to simulate a larger cluster than I realistically had (10 Kubernetes nodes instead of just the R620), but I also wanted to make sure that I utilized as much of the server's resources as possible for pods, not underlying VM operating system bloat. During my research on the internet, I found a few options that would minimize the operating system bloat while also minimizing the amount of maintenance I would need to do on the underlying operating system. They are:
For each of the operating systems, I'll go over a quick pros/cons list.
TL;DR: I chose Talos
After considering my options, I decided to go with Talos and continue on my homelab journey. Someday, I will probably grab a k3OS ISO and play around with it, but for now Talos is treating me quite well. If a machine goes down, I can simply reboot it and any error state it was in will be wiped clean.
Talos is super easy to get going with its basic configuration, but I found that I needed to modify the default configuration quite a bit in order to get the nodes into the state that I wanted. For example, by default, Talos machines boot into DHCP mode, which is super nice when first setting up a machine, but not great afterwards when you want a node to keep the address you give it. Instead, I wanted to give each node a static IP address, point the machines at a local DNS server on the network which would act as a cache for lower latency, and also give the machines a local time server in order to keep all my Ceph daemons in check (again, this is breaking the fourth wall a bit, but if you're curious about my Ceph adventures, take a look at the Ceph article in this series). This was all very possible in Talos, but admittedly their documentation was a little lacking, and since Talos is so new, online resources were sparse as well. I plan on opening a few PRs on the Talos repo to improve their docs a bit, but for now I will just share my own configuration and hopefully it will help others. My full configuration is below, with some pieces pulled out:
version: v1alpha1 # Indicates the schema used to decode the contents.
debug: false # Enable verbose logging to the console.
persist: true # Indicates whether to pull the machine config upon every boot.
# Provides machine specific configuration options.
machine:
type: init # Defines the role of the machine within the cluster.
token: uyz434.uil3defrvkudkb8y # The `token` is used by a machine to join the PKI of the cluster.
# The root certificate authority of the PKI.
ca:
crt: <cert>
key: <key>
# Used to provide additional options to the kubelet.
kubelet: {}
# # The `image` field is an optional reference to an alternative kubelet image.
# image: ghcr.io/talos-systems/kubelet:v1.20.1
# # The `extraArgs` field is used to provide additional flags to the kubelet.
# extraArgs:
# key: value
# # The `extraMounts` field is used to add additional mounts to the kubelet container.
# extraMounts:
# - destination: /var/lib/example
# type: bind
# source: /var/lib/example
# options:
# - rshared
# - rw
# Provides machine specific network configuration options.
network:
# `interfaces` is used to define the network interface configuration.
interfaces:
- interface: eth0 # The interface name.
cidr: 10.10.100.0/24 # Assigns a static IP address to the interface.
# A list of routes associated with the interface.
routes:
- network: 0.0.0.0/0 # The route's network.
gateway: 10.10.100.1 # The route's gateway.
metric: 1024 # The optional metric for the route.
mtu: 1500 # The interface's MTU.
# # Bond specific options.
# bond:
# # The interfaces that make up the bond.
# interfaces:
# - eth0
# - eth1
# mode: 802.3ad # A bond option.
# lacpRate: fast # A bond option.
# # Indicates if DHCP should be used to configure the interface.
# dhcp: true
# # DHCP specific options.
# dhcpOptions:
# routeMetric: 1024 # The priority of all routes received via DHCP.
# Used to statically set the nameservers for the machine.
nameservers:
- 10.10.100.2
- 10.10.100.1
# # Allows for extra entries to be added to the `/etc/hosts` file
# extraHostEntries:
# - ip: 192.168.1.100 # The IP of the host.
# # The host alias.
# aliases:
# - example
# - example.domain.tld
# Used to provide instructions for installations.
install:
disk: /dev/sda # The disk used for installations.
image: ghcr.io/talos-systems/installer:v0.8.4 # Allows for supplying the image used to perform the installation.
bootloader: true # Indicates if a bootloader should be installed.
wipe: false # Indicates if the installation disk should be wiped at installation time.
# # Allows for supplying extra kernel args via the bootloader.
# extraKernelArgs:
# - talos.platform=metal
# - reboot=k
# # Extra certificate subject alternative names for the machine's certificate.
# # Uncomment this to enable SANs.
# certSANs:
# - 10.0.0.10
# - 172.16.0.10
# - 192.168.0.10
# # Used to partition, format and mount additional disks.
# # MachineDisks list example.
# disks:
# - device: /dev/sdb # The name of the disk to use.
# # A list of partitions to create on the disk.
# partitions:
# - mountpoint: /var/mnt/extra # Where to mount the partition.
#
# # # This size of partition: either bytes or human readable representation.
# # # Human readable representation.
# # size: 100 MB
# # # Precise value in bytes.
# # size: 1073741824
# # Allows the addition of user specified files.
# # MachineFiles usage example.
# files:
# - content: '...' # The contents of the file.
# permissions: 0o666 # The file's permissions in octal.
# path: /tmp/file.txt # The path of the file.
# op: append # The operation to use
# # The `env` field allows for the addition of environment variables.
# # Environment variables definition examples.
# env:
# GRPC_GO_LOG_SEVERITY_LEVEL: info
# GRPC_GO_LOG_VERBOSITY_LEVEL: "99"
# https_proxy: http://SERVER:PORT/
# env:
# GRPC_GO_LOG_SEVERITY_LEVEL: error
# https_proxy: https://USERNAME:PASSWORD@SERVER:PORT/
# env:
# https_proxy: http://DOMAIN\USERNAME:PASSWORD@SERVER:PORT/
# # Used to configure the machine's time settings.
# # Example configuration for cloudflare ntp server.
time:
disabled: false # Indicates if the time service is disabled for the machine.
# Specifies time (NTP) servers to use for setting the system time.
servers:
- time.cloudflare.com
# # Used to configure the machine's sysctls.
# # MachineSysctls usage example.
# sysctls:
# kernel.domainname: talos.dev
# net.ipv4.ip_forward: "0"
# # Used to configure the machine's container image registry mirrors.
# registries:
# # Specifies mirror configuration for each registry.
# mirrors:
# ghcr.io:
# # List of endpoints (URLs) for registry mirrors to use.
# endpoints:
# - https://registry.insecure
# - https://ghcr.io/v2/
# # Specifies TLS & auth configuration for HTTPS image registries.
# config:
# registry.insecure:
# # The TLS configuration for the registry.
# tls:
# insecureSkipVerify: true # Skip TLS server certificate verification (not recommended).
#
# # # Enable mutual TLS authentication with the registry.
# # clientIdentity:
# # crt: TFMwdExTMUNSVWRKVGlCRFJWSlVTVVpKUTBGVVJTMHRMUzB0Q2sxSlNVSklla05DTUhGLi4u
# # key: TFMwdExTMUNSVWRKVGlCRlJESTFOVEU1SUZCU1NWWkJWRVVnUzBWWkxTMHRMUzBLVFVNLi4u
#
# # # The auth configuration for this registry.
# # auth:
# # username: username # Optional registry authentication.
# # password: password # Optional registry authentication.
# Provides cluster specific configuration options.
cluster:
# Provides control plane specific configuration options.
controlPlane:
endpoint: https://blackbear-cluster.local:6443 # Endpoint is the canonical controlplane endpoint, which can be an IP address or a DNS hostname.
clusterName: blackbear-cluster # Configures the cluster's name.
# Provides cluster specific network configuration options.
network:
dnsDomain: cluster.local # The domain used by Kubernetes DNS.
# The pod subnet CIDR.
podSubnets:
- 10.244.0.0/16
# The service subnet CIDR.
serviceSubnets:
- 10.96.0.0/12
# # The CNI used.
# cni:
# name: custom # Name of CNI to use.
# # URLs containing manifests to apply for the CNI.
# urls:
# - https://raw.githubusercontent.com/cilium/cilium/v1.8/install/kubernetes/quick-install.yaml
token: <token> # The [bootstrap token](https://kubernetes.io/docs/reference/access-authn-authz/bootstrap-tokens/) used to join the cluster.
aescbcEncryptionSecret: <key> # The key used for the [encryption of secret data at rest](https://kubernetes.io/docs/tasks/administer-cluster/encrypt-data/).
# The base64 encoded root certificate authority used by Kubernetes.
ca:
crt: <certificate>
key: <key>
# API server specific configuration options.
apiServer:
# Extra certificate subject alternative names for the API server's certificate.
certSANs:
- blackbear-cluster.pihole
# # The container image used in the API server manifest.
# image: k8s.gcr.io/kube-apiserver-amd64:v1.20.1
# Controller manager server specific configuration options.
controllerManager: {}
# # The container image used in the controller manager manifest.
# image: k8s.gcr.io/kube-controller-manager-amd64:v1.20.1
# Kube-proxy server-specific configuration options
proxy: {}
# # The container image used in the kube-proxy manifest.
# image: k8s.gcr.io/kube-proxy-amd64:v1.20.1
# Scheduler server specific configuration options.
scheduler: {}
# # The container image used in the scheduler manifest.
# image: k8s.gcr.io/kube-scheduler-amd64:v1.20.1
# Etcd specific configuration options.
etcd:
# The `ca` is the root certificate authority of the PKI.
ca:
crt: <openssl_cert>
key: <base64_gen_key>
# # The container image used to create the etcd service.
# image: gcr.io/etcd-development/etcd:v3.4.14
# # Pod Checkpointer specific configuration options.
# podCheckpointer:
# image: '...' # The `image` field is an override to the default pod-checkpointer image.
# # Core DNS specific configuration options.
# coreDNS:
# image: k8s.gcr.io/coredns:1.7.0 # The `image` field is an override to the default coredns image.
# # A list of urls that point to additional manifests.
# extraManifests:
# - https://www.example.com/manifest1.yaml
# - https://www.example.com/manifest2.yaml
# # A map of key value pairs that will be added while fetching the ExtraManifests.
# extraManifestHeaders:
# Token: "1234567"
# X-ExtraInfo: info
# # Settings for admin kubeconfig generation.
# adminKubeconfig:
# certLifetime: 1h0m0s # Admin kubeconfig certificate lifetime (default is 1 year).
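For anyone recreating this, the file above started life as the output of talosctl gen config, which I then edited. A sketch of the command (the endpoint matches the controlPlane endpoint in the config above; exact output file names can vary a bit between Talos versions):

$ talosctl gen config blackbear-cluster https://blackbear-cluster.local:6443

This drops the init, controlplane, and join configs, plus a talosconfig file, into the current directory.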
There are a few really important bits in the configuration here that are critical for a more production-ready Kubernetes cluster, the biggest being that the cluster endpoint is set to a DNS-resolvable name instead of a hard-coded IP address, which allows for a load balanced, HA API layer.
After the configuration had been modified, I used the talosctl CLI tool to push the configuration to the node and start the bootstrap process. Here, I will give the Talos devs a shoutout and say that the CLI tool really makes configuring these machines easy. Kudos!
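In case it helps anyone following along, the push itself boils down to something like the following. The node IP is a placeholder for whatever address the machine picked up over DHCP while sitting at the installer, and --insecure is needed because the node has no PKI yet:

# Push the edited init config to the first node (IP is a placeholder)
$ talosctl apply-config --insecure --nodes 10.10.100.5 --file init.yaml
# Once the node reports ready, grab a kubeconfig for the new cluster
$ talosctl --talosconfig ./talosconfig kubeconfig --nodes 10.10.100.5 --endpoints 10.10.100.5 .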
Once the cluster init machine was booted and in the ready state, I added a few more master nodes using the controlplane.yml file that was also generated, as well as 7 worker nodes. My final cluster for my homelab looked like this: Kubernetes README.
I made the same changes to the controlplane.yml and join.yml files that were generated by the talosctl command. This ensured that all nodes, both master and worker, would have consistent settings across the cluster.
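Joining those machines is the same apply-config dance with the other generated files; again, the IPs below are placeholders for the DHCP addresses the fresh VMs came up with:

# Additional master nodes get the controlplane config
$ talosctl apply-config --insecure --nodes 10.10.100.6 --file controlplane.yml
# Workers get the join config
$ talosctl apply-config --insecure --nodes 10.10.100.20 --file join.yml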
So now that we've told our machines to use a DNS name as the cluster endpoint, I needed to figure out how I would resolve that hostname, and whether it could do any kind of load balancing. I was already using a dnsmasq LXC container to resolve DNS queries for the cluster, so pointing the cluster endpoint at a single IP address was easy enough. However, I wanted the API request load split evenly across all of the master nodes in the cluster. Dnsmasq could do this for me, but I ultimately decided to use nginx, as I was more familiar with it.
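For completeness, here is the dnsmasq side of that combo: a one-line config that points the cluster name at the proxy container (the file name and the proxy's address below are made up for illustration):

# /etc/dnsmasq.d/talos.conf
address=/blackbear-cluster.local/10.10.100.4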
In a new LXC container running a slim Debian image, I installed nginx. To do this, I ran the following commands:
root@kube-nginx-proxy:~# apt-get install nginx
...
root@kube-nginx-proxy:~# ls /etc/nginx/
conf.d fastcgi.conf fastcgi_params koi-utf koi-win mime.types modules-available modules-enabled nginx.conf proxy_params scgi_params sites-available sites-enabled snippets uwsgi_params win-utf
root@kube-nginx-proxy:~# systemctl status nginx.service
* nginx.service - A high performance web server and a reverse proxy server
Loaded: loaded (/lib/systemd/system/nginx.service; enabled; vendor preset: enabled)
Active: active (running) since Thu 2021-03-04 16:49:46 UTC; 1 weeks 6 days ago
Docs: man:nginx(8)
Process: 10140 ExecStartPre=/usr/sbin/nginx -t -q -g daemon on; master_process on; (code=exited, status=0/SUCCESS)
Process: 10142 ExecStart=/usr/sbin/nginx -g daemon on; master_process on; (code=exited, status=0/SUCCESS)
Main PID: 10143 (nginx)
Tasks: 3 (limit: 4915)
Memory: 5.8M
CGroup: /system.slice/nginx.service
|-10143 nginx: master process /usr/sbin/nginx -g daemon on; master_process on;
|-10144 nginx: worker process
`-10145 nginx: worker process
Mar 04 16:49:46 kube-nginx-proxy systemd[1]: Starting A high performance web server and a reverse proxy server...
Mar 04 16:49:46 kube-nginx-proxy systemd[1]: Started A high performance web server and a reverse proxy server.
Next, I configured nginx to be a transparent reverse TCP proxy. This means that nginx wouldn't actually be able to read any of the TLS traffic (which I didn't want it to anyway), but would instead forward the traffic along to a list of servers in a round-robin fashion. In order to do this, I had to create a new tcpconf.d folder inside of /etc/nginx. Why not put it in sites-enabled and sites-available? Well, it turns out that nginx does not allow a stream block inside an http block, and since nginx includes sites-available and sites-enabled inside a global http block, that would not work. Rather than change the default behavior, I edited the global /etc/nginx/nginx.conf and added a single line right before the global http block:
include /etc/nginx/tcpconf.d/*.conf;
After that line was inserted, I was able to create the tcpconf.d
directory and add talos.conf
, which contained the following:
stream {
upstream kube_api_plane {
server 10.10.100.5:6443;
server 10.10.100.6:6443;
server 10.10.100.7:6443;
}
upstream talosctl_api_plane {
server 10.10.100.5:50000;
server 10.10.100.6:50000;
server 10.10.100.7:50000;
}
server {
listen 6443; # must match the port in the cluster endpoint (blackbear-cluster.local:6443)
proxy_pass kube_api_plane;
}
server {
listen 50000;
proxy_pass talosctl_api_plane;
}
}
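After dropping that file in place, a quick config test and reload picks up the new stream block:

root@kube-nginx-proxy:~# nginx -t
root@kube-nginx-proxy:~# systemctl reload nginx.service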
This gave me the load balancing I was after: the cluster init master node could go down and the cluster API would continue to work as before. Without this nginx + dnsmasq combo, if that master node went down, the workers would only have a single IP address to send API requests to, and that IP address would be unreachable.
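A quick way to convince yourself the failover works (using the kubeconfig that talosctl generated, which points at the load balanced DNS name): list the nodes, power off one master, and list them again. nginx will quietly skip the unreachable upstream and the call still succeeds.

$ kubectl --kubeconfig ./kubeconfig get nodes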
After that, the cluster was ready! I was able to bring down nodes, bring them back up, and everything worked fine. Now I just had to make sure that I could deploy workloads on top of the cluster, and decide what I wanted to deploy. I knew I would be running a Plex server, so I needed a way to store data across the cluster, which meant installing a distributed filesystem. In the next article, we will go over that and how to consume the storage with our pods.
As always, if you feel this article helped you or if you have any suggestions on the content, leave a comment! I am always looking for ways to improve.
-- Kyle