Homelab 2025 Update

Feb. 22, 2025

Kyle Kaniecki

Oh my, it has been a while since I have posted a blog. One of my New Year's resolutions for 2025 is to become healthier, and I am sure many of you reading this can relate to that. Part of that mission for me is to share a lot of the goals and ideas I have with the software community at large, as well as write some of my own projects, scripts, and ideas down so that I can refer back to them later. Hopefully this will inspire some of you to do the same.

With that, I figured the first part of this journey would be to write an article about the evolution of the hardware I run at home, how it's set up, and where I see it going in the future. This article is mostly going to be a high-level overview of my hardware and the tools I run on it so that I can provide some context for the future articles I plan on writing. Stay tuned for those.

Hardware

My lab is constantly evolving, so it will likely be different by the time you read this. I often prefer writing software to writing blog posts! Over 4 years ago now, I started my homelab journey by buying a small Dell R210 II from eBay and putting OPNsense on it. I started small, learning about virtualization platforms and first-class hypervisor environments, and eventually built the lab I run today. As of 09/20/24, that consists of the following machines:

Custom Sliger 4170a

ASRock Rack ROMED8-2T/BCM single-socket SP3 motherboard

AMD EPYC 7502 32-core processor

256 GB ECC DDR4 (8 x 32 GB DIMMs)

ASUS Hyper M.2 X16 PCIe 4.0 X4 Expansion Card

NVIDIA Quadro P2000 graphics card

Intel X520-DA2 dual-port 10G SFP+ NIC

8-bay ICY DOCK ExpressCage MB038SP-B

This is what I use to power my SSD storage pool, along with the ZFS mirror for the Proxmox boot disks. Proxmox 8 is installed on a ZFS mirror across two 256 GB consumer SSDs, and all VM disks run off another 2 TB ZFS mirror connected to the motherboard via the NVMe expansion card.

Dell R730XD

Dual Intel Xeon CPU E5-2660 v3 @ 2.60GHz

128 GB of ECC DDR4 (8 x 16 GB DIMMs)

12 x 4 TB 7200 RPM hard drives from various vendors

These drives will likely be replaced soon with a smaller, higher-density pool. Proxmox 8 is installed on a ZFS mirror across two 1 TB Crucial SSDs installed in the rear HDD bays, and all VM disks run off this mirror. All 12 front disks are passed through to VMs running on the hypervisor and used for my Ceph pool.

Beelink Mini PC

6-core AMD Ryzen AM5 CPU

12 GB of unregistered memory

1 TB NVMe SSD

Proxmox 8 is installed in a simple ext4 setup here. This server's main use case is to run failover services when I need to perform maintenance on the main server, or when I need to run the internet off of my battery backup for extended periods of time. Between this 20 W mini PC and my 20 W Brocade, I could run my internet for a couple of hours, no problem.

Cyberpower Battery Backup

This battery backup is a 2U rack-mounted system that powers all systems in my rack. It's about as dumb as it gets, and it has been rock solid since I bought it in 2021, after a few power outages corrupted my hypervisors' boot partitions. Definitely well worth the cost.

Brocade 7520 24-port PoE Switch

This was a switch I bought off of eBay back in 2020, and it is still going strong to this day. However, how I connect my servers to it has changed a lot; the two main servers now run bonded fiber to its four 10G SFP+ ports to support network storage.

At the end of the day, I do plan on swapping this out for a better-supported switch from MikroTik soon, but it's working Just Fine for me now, so it's lower on my priority list.

Operating Systems / Infrastructure

I run quite a few VMs in my homelab to experiment with and research cluster computing, computing at scale, and distributed computing, but I try to keep each VM as uniform as possible. In general, I run everything I can in Kubernetes, which runs on top of Debian 12 VMs. Each Kubernetes node is configured, automated, and controlled with Ansible playbooks.

Each Kubernetes node also:

  • Utilizes 2 SR-IOV virtual functions running in an XOR bond configuration
  • Runs the CRI-O container runtime with crun as the OCI runtime, which allows me to do cool things like Docker-in-Docker for GitLab runners
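Node configuration is handled by those same Ansible playbooks. As a rough illustration, installing CRI-O and making crun the default runtime looks something like the sketch below; the package names and the drop-in path are assumptions about a Debian 12 host, not my exact playbook.

```yaml
# Minimal node-prep sketch: package names and the CRI-O drop-in path
# are assumptions for Debian 12, not the exact playbook I run.
- name: Prepare Kubernetes node
  hosts: k8s_nodes
  become: true
  tasks:
    - name: Install CRI-O and crun
      ansible.builtin.apt:
        name:
          - cri-o
          - crun
        state: present
        update_cache: true

    - name: Make crun the default OCI runtime for CRI-O
      ansible.builtin.copy:
        dest: /etc/crio/crio.conf.d/01-crun.conf
        content: |
          [crio.runtime]
          default_runtime = "crun"

          [crio.runtime.runtimes.crun]
          runtime_path = "/usr/bin/crun"
      notify: Restart CRI-O

    - name: Enable IP forwarding for pod networking
      ansible.posix.sysctl:
        name: net.ipv4.ip_forward
        value: "1"
        state: present

  handlers:
    - name: Restart CRI-O
      ansible.builtin.service:
        name: crio
        state: restarted
        enabled: true
```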

Infrastructure VMs and Supporting Services

To run my Kubernetes cluster, where the majority of my resources and applications go, there are a few "service" VMs needed to bootstrap the cluster. I call these VMs the "core utils", and each runs in a keepalived VRRP pair, with the two members of each pair placed on different physical hypervisors.
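For reference, the keepalived half of one of these pairs looks roughly like the sketch below, laid down with the same Ansible tooling. The interface name, virtual router ID, VIP, and the keepalived_priority variable (say, 150 on one hypervisor's VM and 100 on its peer) are placeholders, not my actual values.

```yaml
# Sketch of a "core utils" VRRP pair. The VIP, interface, router ID,
# and per-host keepalived_priority variable are all placeholders.
- name: Configure keepalived on the core-utils pair
  hosts: core_utils
  become: true
  tasks:
    - name: Write the VRRP config
      ansible.builtin.copy:
        dest: /etc/keepalived/keepalived.conf
        content: |
          vrrp_instance CORE_UTILS {
              state BACKUP                 # both peers start BACKUP; priority elects the master
              interface eth0
              virtual_router_id 51
              priority {{ keepalived_priority }}
              virtual_ipaddress {
                  192.0.2.10/24            # shared service VIP the cluster bootstraps against
              }
          }

    - name: Ensure keepalived is running
      ansible.builtin.service:
        name: keepalived
        state: started
        enabled: true
```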

Kubernetes Cluster Overview + Layout

My current kubernetes cluster node layout looks a little bit like this:

  • 3 Control Plane nodes, split across the three physical hypervisors. This guarantees that kubernetes will continue to run, even if something catastrophic happens on a single hypervisor or if I bring one down for maintenance (like installing a new network card)
  • 3 Storage nodes running exclusively on the Dell R730, each with 4 of the 12 disks passed through to the VM.
  • 1 Database node with local nvme storage that exclusively runs database workloads like postgres and redis
  • 2 Observability nodes, which run Clickhouse, Elasticsearch, Loki, Grafana, and Prometheus. These nodes are given a majority of the memory in the cluster
  • 2 ingress nodes that run the web services that are port forwarded to the internet and host the reverse proxy that terminates SSL
  • 4 General Worker nodes distributed across the 3 servers that run things like internal web services, supporting workloads like operators and cluster coordination services.
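Workloads land on the right nodes through ordinary labels and taints: each special-purpose node carries a role label, and the matching manifests select it. A made-up example of what pinning a database workload could look like (the label key, taint, and image here are illustrative, not what I actually run):

```yaml
# Hypothetical example of pinning a workload to the database node.
# The label key, taint, and image are illustrative only.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: postgres
spec:
  replicas: 1
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      nodeSelector:
        node-role.homelab/database: "true"   # label applied to the database node
      tolerations:
        - key: node-role.homelab/database    # taint keeping general workloads off it
          operator: Exists
          effect: NoSchedule
      containers:
        - name: postgres
          image: postgres:16
          ports:
            - containerPort: 5432
```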

Overall, this setup isn't exactly "production ready", as there are multiple single points of failure for things like databases and storage, but for my homelab needs it works just fine. I've been running it for over 3 years at this point, and I think the only downtime has been from me botching upgrades...

The Kubernetes cluster runs Cilium as its CNI, which uses Cilium's BGP control plane to establish BGP sessions with my edge routers. This allows me to advertise service IP addresses to computers outside the cluster and to set up direct routing so the cluster can run without a VXLAN overlay. While the VXLAN overhead on a 9000 MTU network is negligible, removing it was a cool experiment, and it lets my LAN devices reach services inside the Kubernetes cluster natively instead of going through the encapsulation layer.
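For anyone wanting to replicate this, the older CiliumBGPPeeringPolicy API is roughly what it looks like; the ASNs, peer address, and selectors below are placeholders, and newer Cilium releases split this into the BGPv2 resources (CiliumBGPClusterConfig and friends).

```yaml
# Sketch of advertising service IPs to an edge router with Cilium's
# BGP control plane. ASNs, peer address, and selectors are placeholders.
apiVersion: cilium.io/v2alpha1
kind: CiliumBGPPeeringPolicy
metadata:
  name: edge-routers
spec:
  nodeSelector:
    matchLabels:
      bgp: enabled              # only labeled nodes peer with the edge
  virtualRouters:
    - localASN: 64512
      exportPodCIDR: true       # advertise pod CIDRs so traffic routes natively, no VXLAN
      neighbors:
        - peerAddress: "192.0.2.1/32"
          peerASN: 64513
      serviceSelector:          # advertise matching LoadBalancer service IPs
        matchExpressions:
          - key: io.cilium/placeholder
            operator: NotIn
            values: ["never-used"]   # NotIn trick that effectively selects every service
```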

Applications / Workloads

This is where the fun part begins. I could write a whole blog series on the applications I run in my cluster, so I will keep it at an overview for now. On the Kubernetes cluster itself, I am running various systems and services, mostly to get data where it needs to go or to serve data to those who need it:

  1. Vector instances, which run various TCP/HTTP servers to receive events from remote servers or other cluster workloads. This is normally what gets events and logs _into_ a database running on my database / observability nodes (see the Vector sketch after this list).
  2. A Kafka cluster, deployed using the Strimzi Kafka Operator. I also run Cruise Control, previously maintained by LinkedIn, to baby the Kafka cluster. Honestly, this combo has been absolutely rock solid, and I haven't needed to perform any maintenance other than version upgrades; I would highly recommend it. I use Kafka to buffer events to disk in case my databases have issues, and overall it tries its best to guarantee delivery of events to the databases.
  3. A few ClickHouse database clusters, with dual replicas running behind CHProxy. This is where I store all of the sFlow, Zeek, log, stock market, and telemetry data coming from my services. These clusters use a mix of ZooKeeper and ClickHouse Keeper to maintain state. If I were to do it over again, I would just run one cluster with ClickHouse Keeper and put all the storage on a couple of SSDs passed through to the VMs. I'll probably migrate to that once I have time.
  4. An Elasticsearch database that a few of my web servers connect to for advanced search functionality
  5. A Loki HA cluster, using Ceph Object Storage as its primary storage backend. This is where I send all of my Kubernetes pod logs, and it is how I watch what is happening on my Traefik ingress from the internet (a sketch of the storage config follows this list).
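To give a flavor of the Vector-to-Kafka plumbing mentioned above, a minimal Vector config might look like this sketch; the syslog port, topic name, and the Strimzi bootstrap address are assumptions rather than my real values.

```yaml
# Hypothetical Vector config: receive logs over TCP and buffer them
# into Kafka. Port, topic, and bootstrap address are placeholders.
sources:
  remote_syslog:
    type: syslog
    address: 0.0.0.0:6514
    mode: tcp

sinks:
  kafka_buffer:
    type: kafka
    inputs:
      - remote_syslog
    bootstrap_servers: my-cluster-kafka-bootstrap.kafka.svc:9092
    topic: homelab-logs
    encoding:
      codec: json
```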
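And for the Loki cluster, pointing object storage at Ceph's S3-compatible RADOS Gateway is mostly a storage stanza like the sketch below; the endpoint, bucket, credentials, and schema dates are placeholders, and the exact fields depend on the Loki version in use.

```yaml
# Hypothetical slice of a Loki config using Ceph RGW as S3-compatible
# object storage. Endpoint, bucket, and credentials are placeholders.
common:
  storage:
    s3:
      endpoint: rgw.ceph.internal:7480
      bucketnames: loki-chunks
      access_key_id: ${CEPH_ACCESS_KEY}
      secret_access_key: ${CEPH_SECRET_KEY}
      s3forcepathstyle: true
      insecure: true

schema_config:
  configs:
    - from: "2024-01-01"
      store: tsdb
      object_store: s3
      schema: v13
      index:
        prefix: index_
        period: 24h
```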
