On 10.02.2014 21:09, Carlos Cesario wrote:
Good feature!!!

I tested and it's working!!!!

Thanks for testing.


The result is
     Icinga 2 Cluster Problem: 1 Endpoints (icinga-node-2) not connected.


One question.
Why do all the services on icinga-node-2 remain "ONLINE"? Shouldn't
these services switch to offline too?!

I'm not sure what you mean by the terms "online" and "offline". Depending on the check authority, the checks being executed on the secondary node will stay in the same state as before, and once the cluster connection is re-established, the check history will be synchronized from b->a again.

If you have a better idea, feel free to propose/discuss it. One of our ideas, which does not really work, was the following:

a   ----------------------X---------------> b
freshness triggers                       normal check
result is stale, not-ok                  check result, history
....                                     ....
    <-------connection re-established----->
history out-of-sync                      history-out-of-sync


So that approach won't work very well unless you can live with a somewhat mixed/merged history and other strange effects.

A different approach could be a special state type (or field) marking the service as clustered; based on its authority compared to the current cluster state, it could indicate that the current result is stale because the node is down (but that would rather be a UI feature, though).

Though, that only works if authorities are used for check execution on specific nodes. If simple check distribution is in place instead, removing a cluster node with the checker feature enabled will make the other nodes re-calculate the check distribution (the "magic hash algorithm") until that specific node comes back online.
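To make the redistribution idea concrete: the sketch below is purely illustrative and is NOT Icinga 2's actual algorithm (the mail only calls it a "magic hash algorithm"); it just shows the general principle of hashing each service name onto the set of currently connected checker nodes, so that when a node drops out, the survivors deterministically take over its checks.

```python
# Illustrative sketch only -- the real Icinga 2 distribution algorithm
# is internal; this demonstrates the general idea of hash-based check
# assignment over the currently connected nodes.
import hashlib

def assign_checker(service_name, connected_nodes):
    """Deterministically pick one checker node for a service by hashing."""
    nodes = sorted(connected_nodes)  # sort so node order doesn't matter
    digest = int(hashlib.md5(service_name.encode()).hexdigest(), 16)
    return nodes[digest % len(nodes)]

# With both nodes connected, checks are spread across them ...
nodes = ["icinga2a", "icinga2b"]
before = {s: assign_checker(s, nodes) for s in ["ping", "http", "ssh", "load"]}

# ... when icinga2b drops out, the distribution is recalculated and the
# remaining node takes over every check until icinga2b reconnects.
after = {s: assign_checker(s, ["icinga2a"]) for s in before}
assert set(after.values()) == {"icinga2a"}
```

Every node running this same function over the same membership list arrives at the same assignment, which is why a membership change alone is enough to trigger a consistent recalculation.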




Best regards,

Em 10-02-2014 13:17, Michael Friedrich escreveu:
Hi,

Icinga 2 0.0.8 targets cluster & configuration finalization. Therefore
the current snapshot builds contain a simple cluster check which will
turn critical once one or more nodes go away.

It's an internal check method provided as a check command by the ITL (a
package upgrade to the latest snapshot builds is required).

http://docs.icinga.org/icinga2/snapshot/#cluster-health-check

My two test nodes are icinga2a (config master) and icinga2b (checker).
By killing off the remote node icinga2b, the documentation example
check will switch to critical. The 'authorities' attribute will make
sure that the service check is only executed on node icinga2a.

object Host "icinga2a" inherits "generic-host" {
   services["cluster"] = {
     templates = [ "generic-service" ],
     check_interval = 1m,
     check_command = "cluster",
     authorities = [ "icinga2a" ]
   },
}

You'll also notice that the 'icinga' self-stats check contains more
performance data values (*execution_time, state counters, etc.) in
order to satisfy the ordinary icingastats output performance graphers.

http://docs.icinga.org/icinga2/snapshot/#itl-icinga

Have fun playing with Icinga 2 :)


Carlos



--
DI (FH) Michael Friedrich

michael.friedr...@gmail.com  || icinga open source monitoring
https://twitter.com/dnsmichi || lead core developer
dnsmi...@jabber.ccc.de       || https://www.icinga.org/team
irc.freenode.net/icinga      || dnsmichi
_______________________________________________
icinga-users mailing list
icinga-users@lists.icinga.org
https://lists.icinga.org/mailman/listinfo/icinga-users