Project Odin: Introduction

Aug. 26, 2024

Kyle Kaniecki

I sometimes pause during the day to laugh at how much like my father I am becoming. There are many times when I catch myself thinking or acting in a way that I know he would and it makes me reflect on how much wisdom was in a lot of his actions. Pops, if you're reading this - thank you.

I remember being younger and scoffing when he brought up the idea of getting security cameras for their new house. My whole life, my parents have lived in remote locations with lots of land, so the idea that someone would travel out into the country to find a single door on a single house seemed foreign to me. Besides, once criminals were out there in the country, they had likely already decided to commit a crime. What would video cameras do to prevent it?

However, what I had not considered at the time was that the idea was not to prevent the crime from happening, but rather to know about it quickly and respond immediately. Crimes are often unavoidable - it comes down to bad luck. The point is knowing when you are unlucky.

So, naturally, as I began my journey of self-hosting software I use daily, I began to wonder...

"If someone were to break in, would you know?"

The answer was obvious: no. As far as observability went at the start of this project, my homelab had very little. Sure, I had deployed Prometheus for my Kubernetes cluster, but the rest of the network was a black box. If one of the services I ran was accidentally exposed to the internet and abused by an attacker, the only indicators that anything was happening would be CPU, memory, and network metrics. Not great.

So, I set out on a journey to build my own "security camera network" within my homelab, applying what I had learned in my career, as well as interesting ideas I had heard or seen when speaking with my cybersecurity friends. The goal was to build a system flexible enough not to disrupt pipelines as data flowed in, yet rigid enough to enforce network policies quickly using the metadata. Below are a few graphs showcasing that metadata in an easily consumable format.

The graph below shows events from a Cowrie honeypot I have set up on a small DigitalOcean Droplet. Odin pulls from the event stream and discovers IP addresses, tagging them as they are ingested into the system. Any IP addresses tagged here are firewalled automatically at my edge router, using IP lists generated from Odin.
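
To make that ingestion step concrete, here is a minimal sketch in Python. The Cowrie event fields ("src_ip", "eventid") match Cowrie's JSON log format, but the Odin endpoint and the tag_ip call shape are hypothetical stand-ins for the real pipeline.

    import json

    import requests

    ODIN_API = "https://odin.internal/api/v1"  # hypothetical endpoint

    def tag_ip(ip, tags):
        # Hypothetical call shape: register the address and attach tags so the
        # generated edge-router IP lists include it on the next pull.
        requests.post(f"{ODIN_API}/addresses", json={"address": ip, "tags": tags})

    def ingest_cowrie_events(path):
        # Cowrie writes one JSON object per line; the attacker's address is in
        # "src_ip" and the event type (e.g. cowrie.login.failed) in "eventid".
        with open(path) as f:
            for line in f:
                event = json.loads(line)
                if ip := event.get("src_ip"):
                    tag_ip(ip, tags=["cowrie", event["eventid"]])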

The following is a dashboard of my home network. The data comes from sFlow and IPFIX streams sent from my edge router into ClickHouse. I import IP addresses from the event stream into Odin, perform lookups on them in the background, such as WHOIS and GeoIP, and re-enrich the dashboards with the discovered data.
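
To give a sense of what those background lookups produce, here is a sketch using the geoip2 library against MaxMind's free GeoLite2 databases; the file paths are assumptions, and the returned dict is illustrative rather than Odin's actual schema.

    import geoip2.database

    # GeoLite2 databases are free downloads from MaxMind; paths are assumptions.
    CITY_DB = "/var/lib/geoip/GeoLite2-City.mmdb"
    ASN_DB = "/var/lib/geoip/GeoLite2-ASN.mmdb"

    def lookup(ip):
        # Collect the geo and ownership metadata used to enrich a single address.
        with geoip2.database.Reader(CITY_DB) as city_db, \
                geoip2.database.Reader(ASN_DB) as asn_db:
            city = city_db.city(ip)
            asn = asn_db.asn(ip)
            return {
                "country": city.country.name,
                "city": city.city.name,
                "asn": asn.autonomous_system_number,
                "organization": asn.autonomous_system_organization,
            }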

Jumping into a few of these panels individually, we can start to see how piecing together data from multiple sources makes the picture of the network clearer. For example, below are the network map and flow chart for my entire home network. We can see where hosts behind my network are reaching out, and how much network load they generate compared to other hosts.

To highlight how this may be helpful, I will share one of the things I found organically from these two graphs. Recently, we started utilizing Tailscale for access to some of our cloud infrastructure. To access it, I would regularly start the Tailscale VPN on my work computer and run automation as needed. Once I started bringing this tunnel up, I noticed my LAN environments reaching out to odd addresses across the globe - specifically on UDP port 3478, which a quick lookup revealed to be STUN, the NAT-traversal mechanism Tailscale uses in its agents.
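
You could surface the same behavior with a quick query against the flow data. The sketch below uses the clickhouse-driver Python client; the host, table, and column names (flows, proto, dst_port, src_addr) are assumptions about my schema, not a fixed interface.

    from clickhouse_driver import Client

    client = Client("clickhouse.internal")  # hypothetical hostname

    # Which LAN hosts are talking STUN (UDP 3478), and to whom?
    rows = client.execute(
        """
        SELECT src_addr, dst_addr, count() AS flow_count
        FROM flows
        WHERE proto = 'UDP' AND dst_port = 3478
        GROUP BY src_addr, dst_addr
        ORDER BY flow_count DESC
        """
    )
    for src, dst, flow_count in rows:
        print(f"{src} -> {dst}: {flow_count} flows")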

Next are some pie graphs showcasing where agents on my LAN are reaching out to, and who owns those IPs. Looking solely at countries is not helpful here; what matters is who owns the IP prefixes. Surprise - 90% of the traffic goes to the big 4: Microsoft, Amazon, Google, and Cloudflare.

Below is the dashboard that drills down into an individual subnet on my network. It shows information about clients in the subnet, as well as which protocols are commonly routed. This view is often helpful when trying to determine whether specific subnets that don't require internet access are misbehaving or bypassing firewall rules.

Lastly is the client overview dashboard, which shows network information about individual clients. This is helpful when debugging the network or trying to figure out what a particular client is doing.

Project Background + Goals

Odin originally started as a place for me to store IP addresses I was seeing on my network and apply metadata to them: common names, ASN names, city, country, and so on. However, as the project evolved and development continued, I quickly realized that being able to act on the metadata as it was discovered was very valuable.
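
In Django terms, the core of that original idea is small. The model below is a minimal sketch; the field names are assumptions for illustration, not Odin's real schema.

    from django.db import models

    class Tag(models.Model):
        name = models.CharField(max_length=64, unique=True)

    class Address(models.Model):
        # The base object everything else hangs off of.
        ip = models.GenericIPAddressField(unique=True)
        common_name = models.CharField(max_length=255, blank=True)
        asn = models.PositiveIntegerField(null=True, blank=True)
        asn_name = models.CharField(max_length=255, blank=True)
        city = models.CharField(max_length=128, blank=True)
        country = models.CharField(max_length=64, blank=True)
        tags = models.ManyToManyField(Tag, blank=True)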

At first, I was hoping to use other popular open source solutions and avoid creating a project I would have to maintain. Early on, I hoped the popular solution CrowdSec would meet my needs. CrowdSec does something very similar to Odin - it scans logs for common attack patterns, tags the log stream as malicious, and extracts IP address metadata from the log streams. It then uploads this metadata to their centralized database, and IP address sets are streamed to "bouncers" to prevent other machines from being attacked.

However, at the time of writing, CrowdSec didn't have a few critical features I wanted. Specifically, I wanted a direct way to influence IP address sets without going through their whole "scenario" process. If an IP address actively attacking me was miscategorized as benign for whatever reason, I was stuck. It never actually happened to me, but it always felt like a critical missing feature, especially when the consequence is being open to attack.

Another thing CrowdSec was missing was the ability to export IP address metadata so I could use it to enrich other data streams. Put simply, getting data out of the CrowdSec Local API in a meaningful way is really hard.

I also want to make clear that I think CrowdSec is an amazing solution that fits most use cases where cloud administrators just need to protect some web servers or SSH hosts from internet noise. CrowdSec's plugin architecture and community are top notch. However, for my use case, where I wanted deep insights into the state and history of my network, I needed something more robust.

Project Structure, Tooling, and Definitions

Project Odin is mainly broken up into five parts:

  1. Log streams and data pipelines that get data into databases to be processed
  2. An API that
    1. serves IP address metadata, lists, and tags to clients
    2. evaluates network objects against rulesets created in the database, driven by a ruleset engine similar in structure to the popular Linux firewall nftables
  3. A worker pool that
    1. processes data from various data sources, ingesting network objects from them
    2. exports IP address metadata to external data sources
    3. performs lookups on base-level objects to collect metadata that relates objects together
  4. A caching HTTP proxy, which allows for efficient caching of expensive queries, like IP sets for organizations or countries
  5. A small Linux daemon that uses nftables to modify network sets, tables, and chains based on Odin metadata

A few of these components have codenames within the project.

Heimdall

A Kubernetes operator built specifically to manage deployments and cron jobs for data pipelines, plus the framework used to build those pipelines.

Odin

The Layer 3/4 metadata API, worker pool, and caching proxy. The API is built with Django, Postgres, and Redis; the caching proxy is built in Rust. This is the layer that stores all of the metadata and serves IP address lists based on it. For example, the ruleset I use for my reverse proxy in my homelab looks like this:

Ruleset Blackbear Traefik Policy

Table Input

    Chain filter
    Default Action: Block
        Blocks ( src classification "Malicious" )
        Blocks ( src tag "Redtail Cryptobot")
        Blocks ( src tag "ipsum" )
        Allows ( src country "United States" )
        Allows ( src country "Canada" )
        Allows ( src organization "European Union" )
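
To make the nftables-style semantics concrete, here is a toy evaluator in Python: rules are checked in order, the first match decides the verdict, and the chain's default action applies otherwise. This is a sketch of the idea, not Odin's actual engine.

    # Each rule is (action, field, value); first match wins, like nftables.
    RULES = [
        ("block", "classification", "Malicious"),
        ("block", "tag", "Redtail Cryptobot"),
        ("block", "tag", "ipsum"),
        ("allow", "country", "United States"),
        ("allow", "country", "Canada"),
    ]
    DEFAULT_ACTION = "block"

    def evaluate(src_metadata, rules=RULES, default=DEFAULT_ACTION):
        # src_metadata is a dict of the source's fields, where a field may
        # hold a single value or a list (an address can carry many tags).
        for action, field, value in rules:
            found = src_metadata.get(field)
            if value == found or (isinstance(found, list) and value in found):
                return action
        return default

    # evaluate({"country": "Canada", "tag": ["ipsum"]}) returns "block":
    # the "ipsum" tag rule matches before the country rule is reached.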

Thor

Firewall daemons that run on endpoint devices. These are built in Rust and rely on the nft shell command for now, though speaking directly to the kernel's netlink firewall interface is the eventual goal.
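
Thor itself is written in Rust, but the mechanism is easy to show with a Python sketch that shells out to nft the same way; the table and set names (inet filter, odin_malicious) are placeholders, not Thor's real configuration.

    import subprocess

    def sync_set(ips, table="filter", set_name="odin_malicious"):
        # Replace an nftables set's contents with the latest Odin list:
        # flush, then re-add, so the kernel set matches Odin's view.
        subprocess.run(["nft", "flush", "set", "inet", table, set_name], check=True)
        if ips:
            elements = "{ " + ", ".join(ips) + " }"
            subprocess.run(
                ["nft", "add", "element", "inet", table, set_name, elements],
                check=True,
            )

    # e.g. sync_set(["203.0.113.7", "198.51.100.23"])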

Blog Series Structure

Throughout the rest of this blog series, I hope to dive into a few interesting engineering topics that came up on the way to the final product (Grafana dashboards). The general order I have in my head at the time of writing looks like:

  1. Infrastructure setup, Kubernetes cluster overview, database setup
  2. Data pipelines + database schema, enriching data inline vs. joining
  3. Reading data streams from data warehouses, storing metadata about stream information in a durable datastore
  4. Exporting metadata from durable datastores back into event streams to join discovered context and display it in dashboards, and exporting IP address lists for firewall consumption
  5. Building the firewall daemon to export IP address lists to nftables sets, based on metadata
  6. Cranking performance up to 11 using Rust

Each of those topics will likely get its own blog post, so stay tuned if you're interested in my journey. Each post will likely include new things as the project evolves.

Let me know what y'all think. Stay safe - Peace

-- Kyle

