Monitor and get notified
Running services is one thing; monitoring them is another. Monitoring can be done at many different levels. At the lowest level, in my own environment, I monitor hard drives using S.M.A.R.T., the self-monitoring, analysis, and reporting technology built into the drives. I collect the S.M.A.R.T. diagnostic data from the drives into a centralized service called Scrutiny. Hard drives are consumable items: they will eventually fail. The goal is to detect early warning signs of a failure so that the data can still be copied to a new drive before the old one fails completely.
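To make "early warning signs" concrete, here is a minimal Python sketch that reads a drive's S.M.A.R.T. report through `smartctl --json` (smartmontools 7.0 or newer) and flags a few attributes that are commonly watched. It is not how Scrutiny works internally; the function names and the list of watched attributes are assumptions made for this example, and the attribute table is only present for ATA drives.

```python
import json
import subprocess

# Attribute IDs often treated as early-warning signs. This selection is an
# assumption for the sketch, not Scrutiny's actual rule set.
WATCHED_ATTRIBUTES = {
    5: "Reallocated_Sector_Ct",
    197: "Current_Pending_Sector",
    198: "Offline_Uncorrectable",
}


def read_smart(device):
    """Run smartctl and return its JSON report as a dict (usually needs root)."""
    result = subprocess.run(
        ["smartctl", "--json", "-a", device],
        capture_output=True,
        text=True,
        check=False,  # smartctl uses non-zero exit codes as status bit flags
    )
    return json.loads(result.stdout)


def smart_warnings(report):
    """Return a list of human-readable warnings found in a smartctl report."""
    warnings = []
    # Overall verdict reported by the drive itself.
    if not report.get("smart_status", {}).get("passed", True):
        warnings.append("overall SMART health check failed")
    # Per-attribute check: any non-zero raw value on a watched attribute.
    table = report.get("ata_smart_attributes", {}).get("table", [])
    for attribute in table:
        attr_id = attribute.get("id")
        raw_value = attribute.get("raw", {}).get("value", 0)
        if attr_id in WATCHED_ATTRIBUTES and raw_value > 0:
            warnings.append(f"{WATCHED_ATTRIBUTES[attr_id]} raw value is {raw_value}")
    return warnings
```

Calling `smart_warnings(read_smart("/dev/sda"))` (the device path is only an example) returns an empty list for a drive with no flagged attributes. Keeping the parsing separate from the `smartctl` call makes the check easy to try against a saved report.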

Of course, I also monitor my server’s CPU usage, memory usage, and temperatures, but I have not configured separate notifications for these. This information is displayed on both dashboards (Homepage and Glance), which are described in more detail on my blog: Personal Dashboard. I also have a separate dashboard called Dashdot, which shows CPU, hard drive, memory, and network usage in real time.

In addition to monitoring the actual hardware, I also monitor Linux services, such as the Docker runtime. Monitoring Linux services can be done in several ways, but I chose to use Monit. Monit watches a Linux service and, if the service becomes unreachable, attempts to restart it automatically.
[Image: Something weird happening with swap!?!]
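The check-and-restart idea behind this kind of service monitoring can be sketched in a few lines of Python. This is not Monit's configuration or implementation, just an illustration that assumes a systemd-based host; `is_active` and `ensure_running` are names invented for the example, and the `run` parameter exists only so the logic can be tried without touching real services.

```python
import subprocess


def is_active(service, run=subprocess.run):
    """Return True if systemd reports the unit as active."""
    # `systemctl is-active --quiet` exits with 0 only when the unit is active.
    result = run(["systemctl", "is-active", "--quiet", service], check=False)
    return result.returncode == 0


def ensure_running(service, run=subprocess.run):
    """Restart the unit if it is not active and report what happened."""
    if is_active(service, run):
        return "ok"
    run(["systemctl", "restart", service], check=False)
    # Check again so a restart that did not help is reported as a failure.
    return "restarted" if is_active(service, run) else "failed"
```

Run periodically (for example from a timer), `ensure_running("docker")` would return `"ok"`, `"restarted"`, or `"failed"`; the last case is the one worth a notification.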
The next level is monitoring the containers running in the Docker runtime. Docker itself tracks the state of each container and, for containers that define a health check, reports a health status of starting, healthy, or unhealthy. There are also applications that can attempt to automatically fix failing containers, but I haven’t set one up myself, as I don’t see it adding any value for me.
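As a small illustration of how that health status can be read programmatically, the following Python sketch parses the JSON printed by `docker inspect`. The helper names are made up for this example, and the sketch assumes the `docker` CLI is available on the host.

```python
import json
import subprocess


def inspect_container(name):
    """Return the first object from `docker inspect <name>` as a dict."""
    result = subprocess.run(
        ["docker", "inspect", name],
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)[0]


def health_of(inspected):
    """Return the health status, or 'none' if no health check is defined."""
    health = inspected.get("State", {}).get("Health")
    if health is None:
        return "none"
    return health.get("Status", "none")


def needs_attention(inspected):
    """True if the container is not running or its health check is failing."""
    state = inspected.get("State", {})
    return state.get("Status") != "running" or health_of(inspected) == "unhealthy"
```

A container without a health check has no `Health` entry at all, which is why `health_of` falls back to `"none"` instead of raising an error.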

On top of all this, I also run a separate monitoring application, Uptime, which monitors the services I maintain from the end user’s perspective. Uptime checks that the services are reachable and that their certificates are still valid. If a service is unreachable, it retries a few times and then sends a notification via Signal to my phone.
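The two checks described here, reachability with retries and certificate validity, can be sketched with the Python standard library alone. This is only an illustration of the idea, not the monitoring application's own code; all function names, the retry count, and the delay are assumptions for the example, and the notification step is left out because it depends on how Signal is integrated.

```python
import socket
import ssl
import time
import urllib.request


def is_reachable(url, timeout=10):
    """Return True if the URL answers with a non-error HTTP status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except OSError:  # URLError and HTTPError are subclasses of OSError
        return False


def check_with_retries(probe, retries=3, delay=5.0, sleep=time.sleep):
    """Call probe() up to `retries` times; succeed as soon as one call does."""
    for attempt in range(retries):
        if probe():
            return True
        if attempt < retries - 1:
            sleep(delay)  # wait before the next attempt
    return False


def certificate_not_after(host, port=443, timeout=10):
    """Fetch the notAfter field of the certificate presented by host."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def days_until_expiry(not_after, now=None):
    """Days left for a notAfter string such as 'Jan  5 09:34:43 2018 GMT'."""
    expires = ssl.cert_time_to_seconds(not_after)
    now = time.time() if now is None else now
    return (expires - now) / 86400
```

A monitoring loop could call `check_with_retries(lambda: is_reachable(url))` and only notify when it returns `False`, which avoids alerts for a single dropped request. Likewise, a warning could be raised when `days_until_expiry(certificate_not_after(host))` drops below a chosen threshold.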

What’s next? There is one major weakness in my current monitoring setup: all of it runs on a single device. If the server goes down, no notification is sent at all. I still have a spare Raspberry Pi 3 in the closet, which I could turn into a dedicated monitoring device. Alternatively, I could install Uptime on my existing Home Assistant setup.