Hi Chris, 

For me, the main question is what you want to monitor on the HA cluster. I
always take a differentiated approach:

There are two types of services: load-balanced services that normally run on
all cluster nodes, and services that fail over to another node if the current
node fails.

In the first case, you need to monitor the service on each individual node so
that you notice when one instance goes down. Because of the cluster's
resilience, a single-instance failure doesn't manifest itself in the cluster
service, so this is the only way to catch it and take measures against it. For
the cluster service, you can then either define an additional host using the
cluster VIP and query it remotely, or use the Business Process module for
Icinga Web 2 to derive the state of the combined services from the local
services.
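
To illustrate what "deriving the state of the combined services" means, here
is a rough sketch in Python. It is not how the Icinga Web 2 add-on does it
internally; the function name and the min_ok threshold are my own assumptions,
and only the exit codes follow the usual plugin convention.

```python
# Standard Nagios/Icinga plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def cluster_state(node_states, min_ok=1):
    """Derive the state of a load-balanced service from its per-node states.

    node_states: dict mapping node name -> plugin exit code of the local check.
    min_ok: hypothetical threshold, the number of healthy instances the
            cluster needs to keep serving requests.
    """
    if not node_states:
        return UNKNOWN  # nothing to aggregate
    healthy = sum(1 for state in node_states.values() if state == OK)
    if healthy == len(node_states):
        return OK  # every instance is fine
    if healthy >= min_ok:
        return WARNING  # degraded, but the cluster still answers
    return CRITICAL  # too few instances left
```

With two nodes and one failed instance this yields WARNING for the cluster
while the per-node check still shows which instance is down.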

In the second case it's a bit more involved, as you don't know which cluster
node the service is supposed to be running on. For Pacemaker clusters on RHEL I
usually resort to the cluster-snmp package, which provides an SNMP sub-agent
that gives access to the cluster state and that I can query remotely from a
satellite zone or the master zone. It's pretty easy to write some SNMP queries
that give a good overview of the overall state of the cluster and the services
running on it.
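
A minimal sketch of such a query as a check plugin, assuming Net-SNMP's
snmpget is installed on the monitoring host. The OID below is a placeholder,
not a real one (take the actual OIDs from the MIB that ships with the
sub-agent), and the expected value is whatever your agent reports for a
healthy service.

```python
import subprocess

# Standard Nagios/Icinga plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# Placeholder only: look up the real OID in the sub-agent's MIB.
EXAMPLE_OID = "1.3.6.1.4.1.99999.1.1.0"


def snmp_get(host, oid, community="public", timeout=5):
    """Fetch one value via snmpget; -Oqv prints the bare value only."""
    cmd = ["snmpget", "-v2c", "-c", community,
           "-t", str(timeout), "-r", "1", "-Oqv", host, oid]
    result = subprocess.run(cmd, capture_output=True, text=True,
                            timeout=timeout * 3)
    if result.returncode != 0:
        raise RuntimeError(result.stderr.strip() or "snmpget failed")
    return result.stdout


def evaluate(value, expected):
    """Compare the SNMP value with the expected one; return (state, message)."""
    value = value.strip().strip('"')  # string values may arrive quoted
    if value == expected:
        return OK, f"OK - cluster reports {value}"
    return CRITICAL, f"CRITICAL - cluster reports {value!r}, expected {expected!r}"


def main(argv):
    """Usage: check_cluster_snmp.py HOST OID EXPECTED (hypothetical script name)."""
    if len(argv) != 4:
        print("UNKNOWN - usage: check_cluster_snmp.py HOST OID EXPECTED")
        return UNKNOWN
    host, oid, expected = argv[1:]
    try:
        state, message = evaluate(snmp_get(host, oid), expected)
    except (OSError, subprocess.SubprocessError, RuntimeError) as exc:
        print(f"UNKNOWN - SNMP query failed: {exc}")
        return UNKNOWN
    print(message)
    return state
```

To run it as a plugin, import sys and end the script with
sys.exit(main(sys.argv)), so that the exit code carries the check state.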

This approach has one minor drawback: for maximum monitoring resiliency you
need to run the service check against all cluster hosts, which means that if
something fails you see multiple identical alerts. That can be solved by using
keepalived with a VIP on the cluster hosts, so that one of the SNMP daemons
answers all the queries.

  Peter.