Out of Band Monitoring of Kubernetes Cluster using Containerized Zabbix

Why Out of Band Monitoring (OOBM)?

The advantage of OOBM is its reliability when the internal network is down or when a node is down, rebooting, crashing, or otherwise inaccessible. OOBM can be used to remotely monitor a device’s capacity, performance metrics, and other items. The core idea is to keep the monitoring data outside the Kubernetes cluster.
Software management tools such as Zabbix can be used for capacity and performance monitoring and for some remote troubleshooting. Keep in mind, though, that alerting via email, Slack, and the plethora of ChatOps tools only works when the network is up!

Disruption and downtime are minimized by better visibility into both the physical environment and the status of the equipment. This supports business continuity through improved uptime, greater efficiency, and faster recovery from outages.

What is Zabbix?

Zabbix was created by Alexei Vladishev and is currently actively developed and supported by Zabbix SIA. Zabbix is an enterprise-class open source distributed monitoring solution.

Zabbix is software that monitors numerous parameters of a network and the health and integrity of servers, virtual machines, applications, services, databases, websites, the cloud and more. Zabbix uses a flexible notification mechanism that allows users to configure e-mail based alerts for virtually any event. This allows a fast reaction to server problems. Zabbix offers excellent reporting and data visualization features based on the stored data. This makes Zabbix ideal for capacity planning.

Zabbix supports both polling and trapping. All Zabbix reports and statistics, as well as configuration parameters, are accessed through a web-based frontend. A web-based frontend ensures that the status of your network and the health of your servers can be assessed from any location. Properly configured, Zabbix can play an important role in monitoring IT infrastructure. This is equally true for small organizations with a few servers and for large companies with a multitude of servers.

Zabbix Features

Zabbix is a highly integrated network monitoring solution, offering a multiplicity of features in a single package.

Data gathering

  • availability and performance checks
  • support for SNMP (both trapping and polling), IPMI, JMX, VMware monitoring
  • custom checks
  • gathering desired data at custom intervals
  • performed by server/proxy and by agents

Flexible threshold definitions

  • you can define very flexible problem thresholds, called triggers, referencing values from the backend database

Highly configurable alerting

  • sending notifications can be customized for the escalation schedule, recipient, media type
  • notifications can be made meaningful and helpful using macro variables
  • automatic actions include remote commands

Real-time graphing

  • monitored items are immediately graphed using the built-in graphing functionality

Web monitoring capabilities

  • Zabbix can follow a path of simulated mouse clicks on a web site and check for functionality and response time

Extensive visualization options

  • ability to create custom graphs that can combine multiple items into a single view
  • network maps
  • slideshows in a dashboard-style overview
  • reports
  • high-level (business) view of monitored resources

Historical data storage

  • data stored in a database
  • configurable history
  • built-in housekeeping procedure

Easy configuration

  • add monitored devices as hosts
  • hosts are picked up for monitoring, once in the database
  • apply templates to monitored devices

Use of templates

  • grouping checks in templates
  • templates can inherit other templates

Network discovery

  • automatic discovery of network devices
  • agent autoregistration
  • discovery of file systems, network interfaces and SNMP OIDs

Fast web interface

  • a web-based frontend in PHP
  • accessible from anywhere
  • you can click your way through
  • audit log

Zabbix API

  • Zabbix API provides a programmable interface to Zabbix for mass manipulations, third-party software integration and other purposes.

Permissions system

  • secure user authentication
  • certain users can be limited to certain views

Full featured and easily extensible agent

  • deployed on monitoring targets
  • can be deployed on both Linux and Windows

Binary daemons

  • written in C, for performance and small memory footprint
  • easily portable

Ready for complex environments

  • remote monitoring made easy by using a Zabbix proxy

Zabbix overview

Architecture

Zabbix consists of several major software components, the responsibilities of which are outlined below.

Server

Zabbix server is the central component to which agents report availability and integrity information and statistics. The server is the central repository in which all configuration, statistical and operational data are stored.

Database storage

All configuration information as well as the data gathered by Zabbix is stored in a database.

Web interface

For easy access to Zabbix from anywhere and from any platform, a web-based interface is provided. The interface is part of Zabbix server, and usually (but not necessarily) runs on the same physical machine as the one running the server.

Proxy

Zabbix proxy can collect performance and availability data on behalf of Zabbix server. A proxy is an optional part of a Zabbix deployment; however, it can be very beneficial for distributing the load of a single Zabbix server.

Agent

Zabbix agents are deployed on monitoring targets to actively monitor local resources and applications and report the gathered data to Zabbix server. Since Zabbix 4.4, there are two types of agents available: the Zabbix agent (lightweight, supported on many platforms, written in C) and the Zabbix agent 2 (extra-flexible, easily extendable with plugins, written in Go).

Data flow

It is also important to take a step back and look at the overall data flow within Zabbix. To create an item that gathers data, you must first create a host. Moving to the other end of the Zabbix spectrum, you must have an item to create a trigger, and you must have a trigger to create an action. Thus, if you want to receive an alert that the CPU load is too high on Server X, you must first create a host entry for Server X, followed by an item for monitoring its CPU, then a trigger that activates if the CPU load is too high, followed by an action that sends you an email. While that may seem like a lot of steps, with the use of templating it really isn’t, and this design makes a very flexible setup possible.
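
The host, item and trigger steps of that chain can be sketched as Zabbix API (JSON-RPC) request bodies. This is only an illustration, not the setup used in this post: the host name server-x, the IP address, the group ID, the host and interface IDs and the threshold are all made-up values, and the function names are hypothetical. The trigger expression uses the syntax introduced with Zabbix 5.4; older 5.x releases would write it as {server-x:system.cpu.load[all,avg1].last()}>5. The functions only print JSON and send nothing.

```shell
#!/bin/sh
# Sketch only: prints JSON-RPC 2.0 request bodies for the
# host -> item -> trigger chain. All values are illustrative assumptions.

# Step 1: a host with one Zabbix agent interface (type 1, port 10050).
host_create_payload() {  # args: auth_token host_name ip group_id
  cat <<EOF
{"jsonrpc": "2.0", "method": "host.create", "id": 1, "auth": "$1",
 "params": {"host": "$2",
            "interfaces": [{"type": 1, "main": 1, "useip": 1,
                            "ip": "$3", "dns": "", "port": "10050"}],
            "groups": [{"groupid": "$4"}]}}
EOF
}

# Step 2: an item on that host (type 0 = Zabbix agent, value_type 0 = float).
item_create_payload() {  # args: auth_token host_id interface_id
  cat <<EOF
{"jsonrpc": "2.0", "method": "item.create", "id": 2, "auth": "$1",
 "params": {"name": "CPU load (1 min average)",
            "key_": "system.cpu.load[all,avg1]",
            "hostid": "$2", "interfaceid": "$3",
            "type": 0, "value_type": 0, "delay": "1m"}}
EOF
}

# Step 3: a trigger that fires when the item's last value exceeds a threshold.
trigger_create_payload() {  # args: auth_token host_name threshold
  cat <<EOF
{"jsonrpc": "2.0", "method": "trigger.create", "id": 3, "auth": "$1",
 "params": {"description": "CPU load is too high on $2",
            "expression": "last(/$2/system.cpu.load[all,avg1])>$3",
            "priority": 3}}
EOF
}
```

For example, trigger_create_payload "$TOKEN" server-x 5 prints the body for the trigger step. The final step, an action that sends the email, would be one more request (action.create); it is left out here because its parameters depend on the media types and user groups configured in your installation.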

Hardware and Software (What I’m using to run the OOBM)

  1. Raspberry Pi 4 :: 8GB
  2. SAMSUNG EVO Plus SD card :: 64GB
  3. Raspberry Pi POE HAT
  4. D-Link Gigabit 10 Ports POE Switch
  5. Raspberry Pi 4 8 GB X 6 Nodes Rancher K3S Kubernetes Cluster
  6. QNAP TS431K NAS Box with WD RED 8TB X 2 3.5″ HDD
  7. Ubuntu Server 21.04
  8. Rancher K3S Multi Master Cluster
  9. Docker
  10. Zabbix 5.4 (the container images below use the alpine-5.4-latest tags)

Bash Script for Zabbix Installation from containers

#!/bin/bash

### Bash script to create dedicated network for zabbix, deploy DB, SNMP Traps, App Server and Web Frontend containers ###

# Create network dedicated for Zabbix component containers

docker network create --subnet 172.20.0.0/16 --ip-range 172.20.240.0/20 zabbix-net

# Deploy a PostgreSQL container for zabbix to use.

# Data persistence is enabled by mounting the /var/lib/postgresql/data volume

docker run --name postgres-server -t \
      -e POSTGRES_USER="zabbix" \
      -e POSTGRES_PASSWORD="zabbix_pwd" \
      -e POSTGRES_DB="zabbix" \
      --network=zabbix-net \
      --restart unless-stopped \
      -v /home/ubuntu/zabbix/postgres-data:/var/lib/postgresql/data \
      -d postgres:latest

# Deploy the Zabbix snmptraps container

docker run --name zabbix-snmptraps -t \
      --network=zabbix-net \
      -v /home/ubuntu/zabbix/snmptraps/rw:/var/lib/zabbix/snmptraps:rw \
      -v /home/ubuntu/zabbix/mibs/ro:/usr/share/snmp/mibs:ro \
      -p 162:1162/udp \
      --restart unless-stopped \
      -d zabbix/zabbix-snmptraps:alpine-5.4-latest 

# Deploy the Zabbix Server application container which will use PostgreSQL

# There are many volumes mounted to maintain the data persistence 

docker run --name zabbix-server-pgsql -t \
      -e DB_SERVER_HOST="postgres-server" \
      -e POSTGRES_USER="zabbix" \
      -e POSTGRES_PASSWORD="zabbix_pwd" \
      -e POSTGRES_DB="zabbix" \
      -e ZBX_ENABLE_SNMP_TRAPS="true" \
      --network=zabbix-net \
      -v /home/ubuntu/zabbix/alertscripts:/usr/lib/zabbix/alertscripts \
      -v /home/ubuntu/zabbix/externalscripts:/usr/lib/zabbix/externalscripts \
      -v /home/ubuntu/zabbix/modules:/var/lib/zabbix/modules \
      -v /home/ubuntu/zabbix/enc:/var/lib/zabbix/enc \
      -v /home/ubuntu/zabbix/ssh_keys:/var/lib/zabbix/ssh_keys \
      -v /home/ubuntu/zabbix/ssl/certs:/var/lib/zabbix/ssl/certs \
      -v /home/ubuntu/zabbix/ssl/keys:/var/lib/zabbix/ssl/keys \
      -v /home/ubuntu/zabbix/ssl_ca:/var/lib/zabbix/ssl/ssl_ca \
      -v /home/ubuntu/zabbix/snmptraps/rw:/var/lib/zabbix/snmptraps \
      -v /home/ubuntu/zabbix/mibs:/var/lib/zabbix/mibs \
      -p 10051:10051 \
      --volumes-from zabbix-snmptraps \
      --restart unless-stopped \
      -d zabbix/zabbix-server-pgsql:alpine-5.4-latest

# Deploy the Zabbix web interface container and link it to the PostgreSQL server and Zabbix server containers created above

docker run --name zabbix-web-nginx-pgsql -t \
      -e ZBX_SERVER_HOST="zabbix-server-pgsql" \
      -e DB_SERVER_HOST="postgres-server" \
      -e POSTGRES_USER="zabbix" \
      -e POSTGRES_PASSWORD="zabbix_pwd" \
      -e POSTGRES_DB="zabbix" \
      --network=zabbix-net \
      -p 8443:8443 \
      -p 8080:8080 \
      -v /etc/ssl/nginx:/etc/ssl/nginx:ro \
      --restart unless-stopped \
      -d zabbix/zabbix-web-nginx-pgsql:alpine-5.4-latest

Note: Please change the usernames and passwords to suit your setup.

Run the command to install Zabbix Containers

# ./install_zabbix.sh

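After the script has run successfully, the Zabbix containers show up in the list of running containers (docker ps).

The four containers created by the script (postgres-server, zabbix-snmptraps, zabbix-server-pgsql and zabbix-web-nginx-pgsql) belong to Zabbix. Any other containers running on the host are unrelated and can be ignored.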

Note: The Zabbix snmptraps container exposes port 162/UDP (SNMP traps) to the host machine. The Zabbix web interface container exposes ports 8443/TCP (HTTPS) and 8080/TCP (HTTP) to the host machine. Change them to suit your setup.
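
A quick way to confirm that all four Zabbix containers are up is to compare the names of the running containers with the names used in the install script. The helper below is a minimal sketch with a hypothetical function name (missing_containers); it reads container names on standard input, so it can be tried without Docker as well.

```shell
#!/bin/sh
# Container names taken from the install script above.
EXPECTED="postgres-server zabbix-snmptraps zabbix-server-pgsql zabbix-web-nginx-pgsql"

# Reads running container names (one per line) on stdin and prints every
# expected Zabbix container that is NOT in that list.
missing_containers() {
  running="$(cat)"
  for name in $EXPECTED; do
    printf '%s\n' "$running" | grep -qxF "$name" || printf '%s\n' "$name"
  done
}

# Typical use on the Docker host:
#   docker ps --format '{{.Names}}' | missing_containers
```

If the command prints nothing, all four containers are running. Any name it prints is a container that is stopped or was never created; docker logs with that name is a reasonable next step.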

K3S Cluster Nodes Setup using Ansible Playbook

---

- name: Playbook to set up Zabbix clients
  hosts: k3s
  become: true
  become_user: root
  gather_facts: true
  tasks:

    - name: Install zabbix-agent on all nodes 
      package:
        name: zabbix-agent
        state: present

    - name: Insert/Update configuration block /etc/zabbix/zabbix_agentd.conf
      blockinfile:
        path: /etc/zabbix/zabbix_agentd.conf
        block: |
          Server=192.168.15.3
          ServerActive=192.168.15.3
          Hostname={{ ansible_hostname }}
      # Restart the agent whenever this block changes
      notify:
        - Restart Zabbix

    - name: Remove default lines from /etc/zabbix/zabbix_agentd.conf
      lineinfile:
        path: /etc/zabbix/zabbix_agentd.conf
        # The Jinja2 expression must be quoted; an unquoted value starting with "{{" is invalid YAML
        regexp: "{{ item }}"
        state: absent
      loop:
        - '^Server=127\.0\.0\.1'
        - '^ServerActive=127\.0\.0\.1'
      notify:
        - Restart Zabbix
    
  handlers:
    - name: Restart Zabbix
      service:
        name: zabbix-agent
        state: restarted
        enabled: true
...
   

Run the Ansible playbook. It installs the Zabbix agent, points the agent configuration file at the IP address of the host machine running the Zabbix server container, removes the default configuration settings, restarts the agent, and enables it at boot.

# ansible-playbook -i hosts.ini -u ubuntu zabbix_agent_install.yaml
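
To check that the playbook left the agent configuration in the intended state, the resulting file can be inspected on a node. The helper below is a minimal sketch, not part of the original setup: agent_conf_ok is a hypothetical name, and it only checks the three things the playbook is meant to achieve (Server and ServerActive set to the given IP, and the 127.0.0.1 defaults removed).

```shell
#!/bin/sh
# Reads a zabbix_agentd.conf on stdin and succeeds only if Server and
# ServerActive point at the given IP and the 127.0.0.1 defaults are gone.
agent_conf_ok() {  # args: zabbix_server_ip
  server_ip="$1"
  conf="$(cat)"
  printf '%s\n' "$conf" | grep -qxF "Server=$server_ip" || return 1
  printf '%s\n' "$conf" | grep -qxF "ServerActive=$server_ip" || return 1
  if printf '%s\n' "$conf" | grep -Eq '^(Server|ServerActive)=127\.0\.0\.1'; then
    return 1
  fi
  return 0
}

# Typical use, run on a cluster node:
#   agent_conf_ok 192.168.15.3 < /etc/zabbix/zabbix_agentd.conf && echo OK
```

The check is deliberately strict about whole lines, so a commented-out or misspelled entry counts as missing.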

Screenshots from the running setup (images not reproduced here):

  • Login page
  • Default monitoring dashboard after discovery and hosts added
  • Discovery rules set to the local network, 192.168.15.1-254
  • Discovery action set to add discovered hosts to the “Linux Servers” host group
  • List of discovered hosts added to host groups, with the QNAP NAS server added via the QNAP SNMP monitoring template
  • Out-of-the-box dashboard: network traffic of virtual interface CNI0 on the K3S master node
  • Out-of-the-box dashboard: CPU usage of the K3S master node
  • Out-of-the-box dashboard: disk utilization
  • SNMP configuration of the QNAP NAS
  • Triggers example from the QNAP NAS server
  • Email alert notification configuration
  • Zabbix alert emails integration examples

After comparing open source monitoring tools such as Munin, Nagios, Icinga2 and Monit, I found that Zabbix stands out in terms of ease of installation (containerized/cloud-native ready), configuration, discovery and out-of-the-box features, to name a few.

Bonus :: APIs

The Zabbix API makes it possible to automate fetching historical monitoring metrics from the Zabbix server. This enables customized alerts, integration with PagerDuty or Opsgenie, and self-healing or self-diagnostics playbooks that act on alerts proactively. For more information, please visit the API documentation.
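
As a rough illustration of fetching history through the API, the sketch below builds two JSON-RPC request bodies and posts them with curl. Several things here are assumptions rather than facts from this setup: the endpoint URL (the web frontend on port 8080 of the Docker host 192.168.15.3), the function names, and the item ID you would pass in. The login call uses the "user" parameter as in Zabbix 5.x, and later releases rename it, so check the API documentation for your version.

```shell
#!/bin/sh
# Assumed endpoint: the web frontend container published on port 8080 of
# the Docker host. Override ZABBIX_API_URL to match your setup.
ZABBIX_API_URL="${ZABBIX_API_URL:-http://192.168.15.3:8080/api_jsonrpc.php}"

# user.login: the response carries a session token in its "result" field.
login_payload() {  # args: user password
  cat <<EOF
{"jsonrpc": "2.0", "method": "user.login", "id": 1,
 "params": {"user": "$1", "password": "$2"}}
EOF
}

# history.get: the newest N values of one item (history 0 = numeric float).
history_payload() {  # args: auth_token item_id limit
  cat <<EOF
{"jsonrpc": "2.0", "method": "history.get", "id": 2, "auth": "$1",
 "params": {"output": "extend", "history": 0, "itemids": ["$2"],
            "sortfield": "clock", "sortorder": "DESC", "limit": $3}}
EOF
}

# POST a JSON-RPC body read from stdin to the API endpoint.
zbx_post() {
  curl -s -X POST -H 'Content-Type: application/json-rpc' \
       --data @- "$ZABBIX_API_URL"
}

# Typical use:
#   login_payload "$ZBX_USER" "$ZBX_PASS" | zbx_post
#   history_payload "$TOKEN" "$ITEM_ID" 10 | zbx_post
```

The payload builders do no JSON escaping, so credentials containing double quotes or backslashes would need a proper JSON tool instead. The returned history records can then feed whatever alerting or self-healing logic you build on top.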

Source and References

https://www.zabbix.com/documentation/current/manual/installation/containers

https://www.zabbix.com/documentation/current/start

https://blog.programster.org/deploy-zabbix-through-docker

https://techexpert.tips/zabbix/zabbix-email-notification-setup/


