On 10.02.2014 21:09, Carlos Cesario wrote:
Good feature!!!

I tested and it's working!!!!

Thanks for testing.


The result is
     Icinga 2 Cluster Problem: 1 Endpoints (icinga-node-2) not connected.


One question.
Why do all the services on icinga-node-2 remain "ONLINE"? Shouldn't
these services switch to offline too?!

I'm not sure what you mean by the terms "online" and "offline". Depending on the check authority, the checks being executed on the secondary node will stay in the same state as before, and once the cluster connection is re-established, the check history will be synchronized from b->a again.

If you have a better idea, feel free to propose/discuss it. One of our ideas, which does not really work, was the following:

a   ----------------------X---------------> b
freshness triggers                       normal check
result is stale, not-ok                  check result, history
....                                     ....
    <-------connection re-established----->
history out-of-sync                      history-out-of-sync


So that approach won't work very well unless you can live with a somewhat mixed/merged history and other strange effects.

A different approach could be a special state type (or field) marking the service as clustered; based on its authority compared to the current cluster state, it could indicate that the current result is stale because the node is down (but that would rather be a UI feature, though).

Though, that only works if authorities are used for check execution on specific nodes. If simple check distribution is in place instead, removing a cluster node with the checker feature enabled will make the other nodes re-calculate the check distribution (the "magic hash algorithm") until that specific node comes back online.
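To make the redistribution idea concrete: the sketch below is purely illustrative and is NOT Icinga 2's actual algorithm (the mail only calls it a "magic hash algorithm"); it just shows the general principle of hashing each service name onto the set of currently connected checker nodes, so that when a node drops out, the survivors deterministically take over its checks.

```python
# Illustrative sketch only -- the real Icinga 2 distribution algorithm
# is internal; this demonstrates the general idea of hash-based check
# assignment over the currently connected nodes.
import hashlib

def assign_checker(service_name, connected_nodes):
    """Deterministically pick one checker node for a service by hashing."""
    nodes = sorted(connected_nodes)  # sort so node order doesn't matter
    digest = int(hashlib.md5(service_name.encode()).hexdigest(), 16)
    return nodes[digest % len(nodes)]

# With both nodes connected, checks are spread across them ...
nodes = ["icinga2a", "icinga2b"]
before = {s: assign_checker(s, nodes) for s in ["ping", "http", "ssh", "load"]}

# ... when icinga2b drops out, the distribution is recalculated and the
# remaining node takes over every check until icinga2b reconnects.
after = {s: assign_checker(s, ["icinga2a"]) for s in before}
assert set(after.values()) == {"icinga2a"}
```

Every node running this same function over the same membership list arrives at the same assignment, which is why a membership change alone is enough to trigger a consistent recalculation.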




Best regards,

Em 10-02-2014 13:17, Michael Friedrich escreveu:
Hi,

Icinga 2 0.0.8 targets cluster & configuration finalization. Therefore
the current snapshot builds contain a simple cluster check which will
turn critical once one or more nodes go away.

It's an internal check method provided as a check command by the ITL (a
package upgrade to the latest snapshot builds is required).

http://docs.icinga.org/icinga2/snapshot/#cluster-health-check

My two test nodes are icinga2a (config master) and icinga2b (checker).
By killing off the remote node icinga2b, the documentation example
check will switch to critical. The 'authorities' attribute will make
sure that the service check is only executed on node icinga2a.

object Host "icinga2a" inherits "generic-host" {
   services["cluster"] = {
     templates = [ "generic-service" ],
     check_interval = 1m,
     check_command = "cluster",
     authorities = [ "icinga2a" ]
   },
}

You'll also notice that the 'icinga' self-stats check contains more
performance data values (*execution_time, state counters, etc.) in
order to satisfy the ordinary icingastats output performance graphers.

http://docs.icinga.org/icinga2/snapshot/#itl-icinga

Have fun playing with Icinga 2 :)


Carlos



--
DI (FH) Michael Friedrich

michael.friedr...@gmail.com  || icinga open source monitoring
https://twitter.com/dnsmichi || lead core developer
dnsmi...@jabber.ccc.de       || https://www.icinga.org/team
irc.freenode.net/icinga      || dnsmichi
_______________________________________________
icinga-users mailing list
icinga-users@lists.icinga.org
https://lists.icinga.org/mailman/listinfo/icinga-users