Kubernetes @ Home Pt 1: Talos & Proxmox

Jan. 25, 2021

Kyle Kaniecki

This blog series is a living document, so I will be updating specific articles or adding new ones as I continue to fine-tune my home lab cluster.

The code for all of these articles is located on my Gitlab.

Intro

Holy vacation, Batman! I've been gone for a while, and haven't been as active on my blog lately, but I promise that it was for a good reason. I've finally started my homelab, and have some really exciting content to share with the void.

As mentioned in my Running bitwarden_rs on Kubernetes blog post, I planned on running my own Kubernetes cluster at home. Back in my junior year of college, I landed on a project with a peer of mine named Jason (Jason, if you're reading this, thank you!). He happened to have an old server chassis laying around that he was willing to part with. The chassis was a Chenbro RM21706, but it had no drives and no motherboard or compute components; it included only the RAID card, chassis fans, and power supply. Still, it was the beginning of a server, and I was determined to make it my server. I quickly found an ATX-E motherboard on Facebook Marketplace with two quad-core Xeon processors, their cooling blocks, and 32GB of DDR2 RAM (yes, you read that right). The whole board only cost me a whopping $25, so I couldn't pass up the chance to boot this server up and get it running. After buying it from the seller, rushing home, and ripping the top panel off the chassis, I was on my way to starting this thing up.

However, I quickly found out that the old motherboard I had bought was LOUD, like really loud. When I turned the thing on, even in our storage closet, it sounded like a vacuum cleaner was running in the other room. The old Intel CPU coolers ran at super high RPMs, and I didn't want to mess with the BIOS to get the noise level down when I could invest in something a little more modern. This would also allow me to keep the power costs down, as I was going to be running this in my home 24/7. The Chenbro was fun to play around with, but for a 24/7 server running on home utilities, it was just too loud and power hungry. So instead, I decided to invest around $400 in a new (to me) Dell R620. This little 1U server was not only much more power efficient, but it was also much quieter than the old Chenbro, even as a 1U. The specs of the machine are as follows:

  • CPU: 2x Intel E5-2670
  • RAM: 128 GB DDR3
  • Storage: 8x 1TB 2.5in Seagate drives

I threw the thing in my server box and booted it up. Looking at my Kill-A-Watt, the new R620 would cost me around $12/month to run 24/7. Great: much cheaper than the roughly $50/month I was spending on a minimal Kubernetes cluster in the cloud.
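For the curious, the rough math behind that number (assuming roughly 130 W average draw and about $0.13/kWh; both are estimates, and your wattage and utility rate will differ):

0.130 kW x 24 h x 30 days ≈ 94 kWh per month
94 kWh x $0.13/kWh ≈ $12 per month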

Setup

First, I had to install Proxmox on the server. I decided to go with Proxmox because it is really just a modified version of Debian under the hood, and for my home environment, I would rather have something I am familiar with instead of something that is "more secure." This led me to stray away from ESXi and other platforms, which instead use custom kernels for virtualization. After I grabbed the latest Proxmox image, I dd'd the ISO to a spare USB drive I had laying around to create a bootable disk.

$ lsblk
NAME          MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
sda             8:0    0 232.9G  0 disk
├─sda1          8:1    0    16M  0 part
└─sda2          8:2    0 232.9G  0 part
...
$ sudo dd if=/path/to.iso of=/dev/sdX bs=1M status=progress

After the dd command finished, I was able to pull the USB out of my main desktop and plug it into the front of the R620. From here, I was able to configure the BIOS, the IPMI interface, and the RAID card. The BIOS already came with pretty sane defaults, so I left those alone. The fans were quiet enough that I didn't feel the need to modify the fan speeds in the BIOS, but that option is there if needed.

Once I got to the RAID card configuration, though, I was confronted with two different paths:

  1. Create a single RAID disk using the built-in H710P RAID controller on the R620. This would give me a RAID 5 array across the 7 disks I had for storage.
  2. Since the H710P doesn't support HBA mode, create a bunch of single-disk "RAID 0" virtual disks and pass those through to the VMs.

Guess which one I picked initially, and guess which one I needed?

Go with the many-RAID-0-disks configuration if you're using ZFS or Rook/Ceph. Neither plays well behind a hardware RAID controller.
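For what it's worth, once each drive shows up as its own block device on the Proxmox host, handing it to a VM is a one-liner with qm. A minimal sketch, where the VM ID (100), the SCSI slot, and the disk ID are all placeholders for your own values:

# Find the stable by-id path of the disk you want to pass through
ls -l /dev/disk/by-id/ | grep -v part

# Attach it to VM 100 as an extra SCSI disk (assumes scsi1 is free)
qm set 100 -scsi1 /dev/disk/by-id/scsi-36d0946600f7ac9002291a4f1a2b3c4d5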

So after configuring the machine's firmware, I picked the USB drive as my boot device and started up the Proxmox installer. The Proxmox Installation Guide does a very good job of describing all the steps, and the installer itself is very easy to use, so I won't reiterate that here.

Once I had Proxmox booted on my R620, it was time to create a few VMs to start a Kubernetes cluster. But I was faced with another decision: I wanted to use Proxmox to simulate a larger cluster than I realistically had (10 Kubernetes nodes instead of just the R620), but I also wanted to make sure that as much of the server's resources as possible went to pods, not underlying VM operating system bloat. During my research, I found a few options that would minimize the operating system bloat while also minimizing the amount of maintenance I would need to do on the underlying operating system. They are:

  1. Talos Systems' Talos
  2. Rancher Labs' k3OS

For each of the operating systems, I'll go over a quick pros/cons list.

TL;DR: I chose Talos

Talos

Pros

  1. Literally just the Linux kernel and the services needed to run Kubernetes -- no console, no extra services, nothing but the Talos gRPC API
    1. Because of this, lower maintenance as well
    2. Also, with fewer services running, there is less that can go wrong on the server and take a node down
  2. Configured using YAML, the same as kubernetes
  3. Active development team
    1. The founder is active on /r/kubernetes, which I browse fairly often
  4. Office hours posted on their website
  5. Completely open source

Cons

  1. No console means harder debugging
  2. Steep learning curve, since you aren't really configuring a Linux server the "normal" way
  3. Talos is very new, so the documentation is a bit thin (if I have time, I'd like to open an issue to find out where I can help in this area)
    1. Also, because of this, online resources are pretty sparse
  4. Unable to really utilize host disk persistent volumes, since so much of the system is ephemeral

K3OS

Pros

  1. A slimmed-down distribution (Ubuntu kernel with Alpine userland binaries), so more familiarity when configuring server "extras"
  2. k3s is a very efficient, single binary that starts Kubernetes, which makes debugging extremely easy, especially with a shell
  3. The distro has a lot more writable directories, so you are able to use more hostPath persistence on the VM itself
  4. Configuration is done using well-known Linux commands, instead of a custom machine-config API
  5. Bigger online community, so more resources if I need any help with the platform
  6. Less "magic"

Cons

  1. k3s is maintained by Rancher Labs, so it will lag behind upstream Kubernetes versions a little bit. The Rancher Labs folks are pretty good about keeping up, though
  2. A shell and a more "real" distro means more maintenance on my end if something goes wrong. More files to reset/blow away to get back to a default state if I fuck something up (which I do, a lot)
  3. More resources are taken up by the distro overall

After considering my options, I decided to go with Talos and continue on my homelab journey. Someday I will probably grab a k3OS ISO and play around with it, but for now Talos is serving me quite well. If a machine goes down, I can simply reboot it and any error state it was in is wiped clean.
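As an aside, spinning up the Talos VMs on Proxmox is quick with qm. Here's a rough sketch of what one node might look like; the VM ID, resource sizes, storage names, and ISO filename are assumptions you'd swap for your own:

# Create a VM that boots the Talos ISO on first start
qm create 200 --name talos-master-1 --memory 8192 --cores 4 \
  --net0 virtio,bridge=vmbr0 \
  --scsi0 local-lvm:32 \
  --ide2 local:iso/talos-amd64.iso,media=cdrom \
  --ostype l26
qm start 200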

Talos is super easy to get going with its basic configuration, but I found that I needed to modify the default configuration quite a bit to get the nodes into the state I wanted. For example, by default, Talos machines boot into DHCP mode, which is super nice when first setting up a machine, but not great afterwards when you want a node to keep the address you give it. Instead, I wanted to give each machine a static IP address, point it at a local DNS server on the network to act as a cache for lower latency, and also give it a local time server to keep all my Ceph daemons in check (again, this is breaking the fourth wall a bit, but if you're curious about my Ceph adventures, take a look at the Ceph article in this series). This was all very possible in Talos, but admittedly the documentation was a little lacking, and since Talos is so new, online resources were sparse as well. I plan on opening a few PRs on the Talos repo to improve the docs a bit, but for now I will just share my own configuration and hopefully it will help others. My full configuration is below, with some pieces redacted:

version: v1alpha1 # Indicates the schema used to decode the contents.
debug: false # Enable verbose logging to the console.
persist: true # Indicates whether to pull the machine config upon every boot.
# Provides machine specific configuration options.
machine:
  type: init # Defines the role of the machine within the cluster.
  token: uyz434.uil3defrvkudkb8y # The `token` is used by a machine to join the PKI of the cluster.
  # The root certificate authority of the PKI.
  ca:
    crt: <cert>
    key: <key>
  # Used to provide additional options to the kubelet.
  kubelet: {}
  # # The `image` field is an optional reference to an alternative kubelet image.
  # image: ghcr.io/talos-systems/kubelet:v1.20.1

  # # The `extraArgs` field is used to provide additional flags to the kubelet.
  # extraArgs:
  #     key: value

  # # The `extraMounts` field is used to add additional mounts to the kubelet container.
  # extraMounts:
  #     - destination: /var/lib/example
  #       type: bind
  #       source: /var/lib/example
  #       options:
  #         - rshared
  #         - rw

  # Provides machine specific network configuration options.
  network:
    # `interfaces` is used to define the network interface configuration.
    interfaces:
      - interface: eth0 # The interface name.
        cidr: 10.10.100.0/24 # Assigns a static IP address to the interface.
        # A list of routes associated with the interface.
        routes:
          - network: 0.0.0.0/0 # The route's network.
            gateway: 10.10.100.1 # The route's gateway.
            metric: 1024 # The optional metric for the route.
        mtu: 1500 # The interface's MTU.

        # # Bond specific options.
        # bond:
        #     # The interfaces that make up the bond.
        #     interfaces:
        #         - eth0
        #         - eth1
        #     mode: 802.3ad # A bond option.
        #     lacpRate: fast # A bond option.

        # # Indicates if DHCP should be used to configure the interface.
        # dhcp: true

        # # DHCP specific options.
        # dhcpOptions:
        #     routeMetric: 1024 # The priority of all routes received via DHCP.

    # Used to statically set the nameservers for the machine.
    nameservers:
      - 10.10.100.2
      - 10.10.100.1

  # # Allows for extra entries to be added to the `/etc/hosts` file
  # extraHostEntries:
  #     - ip: 192.168.1.100 # The IP of the host.
  #       # The host alias.
  #       aliases:
  #         - example
  #         - example.domain.tld

  # Used to provide instructions for installations.
  install:
    disk: /dev/sda # The disk used for installations.
    image: ghcr.io/talos-systems/installer:v0.8.4 # Allows for supplying the image used to perform the installation.
    bootloader: true # Indicates if a bootloader should be installed.
    wipe: false # Indicates if the installation disk should be wiped at installation time.

    # # Allows for supplying extra kernel args via the bootloader.
    # extraKernelArgs:
    #     - talos.platform=metal
    #     - reboot=k

  # # Extra certificate subject alternative names for the machine's certificate.

  # # Uncomment this to enable SANs.
  # certSANs:
  #     - 10.0.0.10
  #     - 172.16.0.10
  #     - 192.168.0.10

  # # Used to partition, format and mount additional disks.

  # # MachineDisks list example.
  # disks:
  #     - device: /dev/sdb # The name of the disk to use.
  #       # A list of partitions to create on the disk.
  #       partitions:
  #         - mountpoint: /var/mnt/extra # Where to mount the partition.
  #
  #           # # This size of partition: either bytes or human readable representation.

  #           # # Human readable representation.
  #           # size: 100 MB
  #           # # Precise value in bytes.
  #           # size: 1073741824

  # # Allows the addition of user specified files.

  # # MachineFiles usage example.
  # files:
  #     - content: '...' # The contents of the file.
  #       permissions: 0o666 # The file's permissions in octal.
  #       path: /tmp/file.txt # The path of the file.
  #       op: append # The operation to use

  # # The `env` field allows for the addition of environment variables.

  # # Environment variables definition examples.
  # env:
  #     GRPC_GO_LOG_SEVERITY_LEVEL: info
  #     GRPC_GO_LOG_VERBOSITY_LEVEL: "99"
  #     https_proxy: http://SERVER:PORT/
  # env:
  #     GRPC_GO_LOG_SEVERITY_LEVEL: error
  #     https_proxy: https://USERNAME:PASSWORD@SERVER:PORT/
  # env:
  #     https_proxy: http://DOMAIN\USERNAME:PASSWORD@SERVER:PORT/

  # # Used to configure the machine's time settings.

  # # Example configuration for cloudflare ntp server.
  time:
    disabled: false # Indicates if the time service is disabled for the machine.
    # Specifies time (NTP) servers to use for setting the system time.
    servers:
      - time.cloudflare.com

  # # Used to configure the machine's sysctls.

  # # MachineSysctls usage example.
  # sysctls:
  #     kernel.domainname: talos.dev
  #     net.ipv4.ip_forward: "0"

  # # Used to configure the machine's container image registry mirrors.
  # registries:
  #     # Specifies mirror configuration for each registry.
  #     mirrors:
  #         ghcr.io:
  #             # List of endpoints (URLs) for registry mirrors to use.
  #             endpoints:
  #                 - https://registry.insecure
  #                 - https://ghcr.io/v2/
  #     # Specifies TLS & auth configuration for HTTPS image registries.
  #     config:
  #         registry.insecure:
  #             # The TLS configuration for the registry.
  #             tls:
  #                 insecureSkipVerify: true # Skip TLS server certificate verification (not recommended).
  #
  #                 # # Enable mutual TLS authentication with the registry.
  #                 # clientIdentity:
  #                 #     crt: TFMwdExTMUNSVWRKVGlCRFJWSlVTVVpKUTBGVVJTMHRMUzB0Q2sxSlNVSklla05DTUhGLi4u
  #                 #     key: TFMwdExTMUNSVWRKVGlCRlJESTFOVEU1SUZCU1NWWkJWRVVnUzBWWkxTMHRMUzBLVFVNLi4u
  #
  #             # # The auth configuration for this registry.
  #             # auth:
  #             #     username: username # Optional registry authentication.
  #             #     password: password # Optional registry authentication.
# Provides cluster specific configuration options.
cluster:
  # Provides control plane specific configuration options.
  controlPlane:
    endpoint: https://blackbear-cluster.local:6443 # Endpoint is the canonical controlplane endpoint, which can be an IP address or a DNS hostname.
  clusterName: blackbear-cluster # Configures the cluster's name.
  # Provides cluster specific network configuration options.
  network:
    dnsDomain: cluster.local # The domain used by Kubernetes DNS.
    # The pod subnet CIDR.
    podSubnets:
      - 10.244.0.0/16
    # The service subnet CIDR.
    serviceSubnets:
      - 10.96.0.0/12

    # # The CNI used.
    # cni:
    #     name: custom # Name of CNI to use.
    #     # URLs containing manifests to apply for the CNI.
    #     urls:
    #         - https://raw.githubusercontent.com/cilium/cilium/v1.8/install/kubernetes/quick-install.yaml
  token: <token> # The [bootstrap token](https://kubernetes.io/docs/reference/access-authn-authz/bootstrap-tokens/) used to join the cluster.
  aescbcEncryptionSecret: <key> # The key used for the [encryption of secret data at rest](https://kubernetes.io/docs/tasks/administer-cluster/encrypt-data/).
  # The base64 encoded root certificate authority used by Kubernetes.
  ca:
    crt: <certificate>
    key: <key>
  # API server specific configuration options.
  apiServer:
    # Extra certificate subject alternative names for the API server's certificate.
    certSANs:
      - blackbear-cluster.pihole

    # # The container image used in the API server manifest.
    # image: k8s.gcr.io/kube-apiserver-amd64:v1.20.1
  # Controller manager server specific configuration options.
  controllerManager: {}
  # # The container image used in the controller manager manifest.
  # image: k8s.gcr.io/kube-controller-manager-amd64:v1.20.1

  # Kube-proxy server-specific configuration options
  proxy: {}
  # # The container image used in the kube-proxy manifest.
  # image: k8s.gcr.io/kube-proxy-amd64:v1.20.1

  # Scheduler server specific configuration options.
  scheduler: {}
  # # The container image used in the scheduler manifest.
  # image: k8s.gcr.io/kube-scheduler-amd64:v1.20.1

  # Etcd specific configuration options.
  etcd:
    # The `ca` is the root certificate authority of the PKI.
    ca:
      crt: <openssl_cert>
      key: <base64_gen_key>

    # # The container image used to create the etcd service.
    # image: gcr.io/etcd-development/etcd:v3.4.14

  # # Pod Checkpointer specific configuration options.
  # podCheckpointer:
  #     image: '...' # The `image` field is an override to the default pod-checkpointer image.

  # # Core DNS specific configuration options.
  # coreDNS:
  #     image: k8s.gcr.io/coredns:1.7.0 # The `image` field is an override to the default coredns image.

  # # A list of urls that point to additional manifests.
  # extraManifests:
  #     - https://www.example.com/manifest1.yaml
  #     - https://www.example.com/manifest2.yaml

  # # A map of key value pairs that will be added while fetching the ExtraManifests.
  # extraManifestHeaders:
  #     Token: "1234567"
  #     X-ExtraInfo: info

  # # Settings for admin kubeconfig generation.
  # adminKubeconfig:
  #     certLifetime: 1h0m0s # Admin kubeconfig certificate lifetime (default is 1 year).

There are a few really important bits in the configuration here that are critical for a more production ready kubernetes cluster:

  • It sets a static IP address in the network section of the machine configuration. This makes sure that node addresses don't move around, which matters if we happen to use the NodePort service type
  • It points the machine at known time servers (Cloudflare in this instance, though a local one would be preferable for time-critical pods)
  • It sets the cluster endpoint to a DNS-resolvable name instead of a hard-coded IP address, which allows for a load-balanced, HA API layer

After the configuration was modified, I used the talosctl CLI to push it to the node and kick off the bootstrap process. Here, I will give the Talos devs a shoutout and say that the CLI tool really makes configuring these machines easy. Kudos!
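The exact flags shift a bit between Talos releases, but with the 0.8-era CLI the flow looked roughly like this (the node IP is a placeholder for whatever address the node grabbed over DHCP):

# Generate the base configs: init.yaml, controlplane.yaml, join.yaml, and talosconfig
talosctl gen config blackbear-cluster https://blackbear-cluster.local:6443

# Push the edited config to the first node while it is still sitting in DHCP mode
talosctl apply-config --insecure --nodes 10.10.100.5 --file init.yaml

# Point talosctl at that node for subsequent commands
talosctl --talosconfig ./talosconfig config endpoint 10.10.100.5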

Once the cluster init machine was booted and in the ready state, I added a few more master nodes using the controlplane.yml file that was also generated, as well as 7 worker nodes. My final cluster for my homelab looked like this: Kubernetes README.

I made the same changes to the controlplane.yml and join.yml files that were generated by the talosctl command. This ensured that all nodes, both master and worker, would have consistent settings across the cluster.
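To sanity-check that everything joined, you can pull an admin kubeconfig from the cluster with talosctl and list the nodes (again roughly; flag names may differ slightly by version):

# Fetch the admin kubeconfig into the current directory
talosctl --talosconfig ./talosconfig kubeconfig .

# Every master and worker should eventually report Ready
kubectl get nodes -o wide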

Configuring The Cluster Endpoint

Now that the machines were told to use a DNS name as the cluster endpoint, I needed to figure out how to resolve that hostname, and whether it would do any kind of load balancing. I was already using a dnsmasq LXC container to resolve DNS queries for the cluster, so pointing the cluster endpoint at a single IP address was easy enough; however, I wanted the API request load split evenly across all master nodes in the cluster. Dnsmasq could do this for me, but I ultimately decided to use nginx, as I was more familiar with it.
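For reference, pointing the cluster endpoint name at the proxy in dnsmasq is a single line. Assuming the nginx container sits at 10.10.100.4 (an address I'm making up purely for illustration), the entry looks something like:

# /etc/dnsmasq.d/blackbear-cluster.conf
# Resolve the cluster endpoint name to the nginx TCP proxy
address=/blackbear-cluster.local/10.10.100.4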

In a new LXC container running a slim Debian image, I installed nginx. To do this, I ran the following commands:

root@kube-nginx-proxy:~# apt-get install nginx
...
root@kube-nginx-proxy:~# ls /etc/nginx/
conf.d  fastcgi.conf  fastcgi_params  koi-utf  koi-win  mime.types  modules-available  modules-enabled  nginx.conf  proxy_params  scgi_params  sites-available  sites-enabled  snippets  uwsgi_params  win-utf
root@kube-nginx-proxy:~# systemctl status nginx.service 
* nginx.service - A high performance web server and a reverse proxy server
   Loaded: loaded (/lib/systemd/system/nginx.service; enabled; vendor preset: enabled)
   Active: active (running) since Thu 2021-03-04 16:49:46 UTC; 1 weeks 6 days ago
     Docs: man:nginx(8)
  Process: 10140 ExecStartPre=/usr/sbin/nginx -t -q -g daemon on; master_process on; (code=exited, status=0/SUCCESS)
  Process: 10142 ExecStart=/usr/sbin/nginx -g daemon on; master_process on; (code=exited, status=0/SUCCESS)
 Main PID: 10143 (nginx)
    Tasks: 3 (limit: 4915)
   Memory: 5.8M
   CGroup: /system.slice/nginx.service
           |-10143 nginx: master process /usr/sbin/nginx -g daemon on; master_process on;
           |-10144 nginx: worker process
           `-10145 nginx: worker process

Mar 04 16:49:46 kube-nginx-proxy systemd[1]: Starting A high performance web server and a reverse proxy server...
Mar 04 16:49:46 kube-nginx-proxy systemd[1]: Started A high performance web server and a reverse proxy server.

Next, I configured nginx to be a transparent reverse TCP proxy. This means that nginx wouldn't be able to actually read any of the TLS traffic (which I didn't want it to), but would instead forward the traffic along to a list of servers in round-robin fashion. In order to do this, I created a new tcpconf.d folder inside /etc/nginx. Why not put it in sites-available and sites-enabled? Well, it turns out that nginx does not allow a stream block inside an http block, and since nginx includes sites-enabled inside the global http block, that would not work. Rather than change the default behavior, I edited the global /etc/nginx/nginx.conf and added a single line right before the global http block:

include /etc/nginx/tcpconf.d/*.conf;
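For context, here is roughly where that line lands in a stock Debian nginx.conf; it sits at the top level next to the events block, not inside http (abridged):

# /etc/nginx/nginx.conf (abridged)
user www-data;
worker_processes auto;

events {
        worker_connections 768;
}

include /etc/nginx/tcpconf.d/*.conf;   # stream configs must live outside the http block

http {
        # ...
        include /etc/nginx/sites-enabled/*;
}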

After that line was inserted, I was able to create the tcpconf.d directory and add talos.conf, which contained the following:

stream {
  upstream kube_api_plane {
    server 10.10.100.5:6443;
    server 10.10.100.6:6443;
    server 10.10.100.7:6443;
  }

  upstream talosctl_api_plane {
    server 10.10.100.5:50000;
    server 10.10.100.6:50000;
    server 10.10.100.7:50000;
  }

  server {
    listen 443;
    proxy_pass kube_api_plane;
  }

  server {
    listen 50000;
    proxy_pass talosctl_api_plane;
  }
}
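After dropping that file in place, a quick syntax check and reload picks up the new stream configuration:

root@kube-nginx-proxy:~# nginx -t
nginx: the configuration file /etc/nginx/nginx.conf syntax is ok
nginx: configuration file /etc/nginx/nginx.conf test is successful
root@kube-nginx-proxy:~# systemctl reload nginx.service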

This gave me the load balancing I was after: the cluster's init master node could go down, and the cluster API would continue to work as before. Without this nginx + dnsmasq combo, if that master node went down, the workers would keep sending API requests to a single IP address that was no longer answering.

After that, the cluster was ready! I was able to bring down nodes, bring them back up, and everything worked fine. Now I just had to make sure I could deploy workloads on top of the cluster, and decide what I wanted to run on it. I knew I would be running a Plex server, so I needed a way to store data across the cluster, which meant installing a distributed filesystem. In the next article, we will go over that and how to consume the storage from our pods.

As always, if you feel this article helped you or if you have any suggestions on the content, leave a comment! I am always looking for ways to improve.

-- Kyle

