Re: [icinga-users] Notification behavior when host is temporarily DOWN

Michael Friedrich Mon, 13 Jan 2014 13:55:02 -0800

On 12.12.2013 13:32, Gerd Radecke wrote:

Hi everybody,


I'm looking at an issue with notifications and I'm unsure whether this
is working as designed or not.

I'm getting a service notification when a service that has been in
WARNING;HARD for a long time changes to CRITICAL;HARD after a single
check, because of a check timeout.
Three seconds later, the host check returns DOWN;SOFT, but only
once, so the host never reaches DOWN;HARD.

I thought that if the host is down, no service notifications will be
sent.  http://docs.icinga.org/latest/en/checkscheduling.html#hostcheckscheduling
actually states that "when Icinga is check [sic!] the status of a
host, it holds off on doing anything else"  - so I would expect it to
also not send the service notification I'm seeing until it's sure what
the host status is :/

The Log with comments is here:

# 1. Status of db_server;Disk_E is WARNING;HARD and has been so for a
while (also see the last line in this log)


Dec 11 23:05:34 icinga_server icinga: SERVICE ALERT:
db_server;Disk_E;CRITICAL;HARD;3;CRITICAL - Socket timeout after 10
seconds
# 2. When we get a Critical for Disk_E because of the timeout, the
status goes to Critical, HARD which conforms to
http://docs.icinga.org/latest/en/statetypes.html - 5.8.4 and 5.8.5

# 3. If I understand
http://docs.icinga.org/latest/en/checkscheduling.html#hostcheckscheduling
correctly, on every service state change, icinga will do a check of
the host, to see if its status changed as well.

There's a cache involved, so a service state change does not immediately force an actual host check itself.

http://docs.icinga.org/latest/en/configmain.html#configmain-cached_host_check_horizon

That said, if a previous failing service check triggered a host check that resulted in an UP state, a subsequent service check within that check horizon may be processed as "host is assumed UP, please notify for the service".
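
To illustrate the caching effect described above, here is a minimal sketch in Python. It is a simplified model and not Icinga's actual implementation: the names HostState, host_state_for_service_result and run_check are hypothetical, and the only setting taken from the documentation is cached_host_check_horizon (a number of seconds).

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class HostState:
    state: str         # "UP" or "DOWN" (last known host state)
    last_check: float  # Unix timestamp of the last actual host check


def host_state_for_service_result(
    host: HostState,
    now: float,
    cached_host_check_horizon: float,
    run_check: Callable[[], str],
) -> str:
    """Return the host state assumed while a service result is processed.

    Simplified assumption: if the last host check is no older than the
    horizon, its cached result is reused and no new host check is run.
    """
    age = now - host.last_check
    if cached_host_check_horizon > 0 and age <= cached_host_check_horizon:
        return host.state  # cached result, possibly stale
    # Cache too old (or caching disabled): run a real host check.
    host.state = run_check()
    host.last_check = now
    return host.state
```

In this model, a host that was checked as UP ten seconds ago is still treated as UP with a 15 second horizon, even if it went down in the meantime, so the service notification would go out.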

So in this case, a
host check should be performed before any further action is taken.
What actually happens is that the result is processed and a service
notification is sent out immediately:
Dec 11 23:05:34 icinga_server icinga: SERVICE NOTIFICATION:
prio1;db_server;Disk_E;CRITICAL;notify_service_email_24x7;CRITICAL -
Socket timeout after 10 seconds

Do you have any debug logs specifically for that host and all surrounding service checks? (debug level checks/events or higher, verbosity 2)


# 4. Only a few seconds afterwards does icinga show new results for
the host state, but they are still SOFT.
Dec 11 23:05:37 icinga_server icinga: HOST ALERT:
db_server;DOWN;SOFT;1;CRITICAL - Host Unreachable (172.16.28.132)


What is max_check_attempts for that host? What state (log entry) did it have before?
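
The reason max_check_attempts matters can be sketched as follows. This is a simplified model in Python with a hypothetical helper name (state_type), not Icinga code: a problem state is treated as HARD once the attempt counter reaches max_check_attempts, and as SOFT before that.

```python
def state_type(current_attempt: int, max_check_attempts: int) -> str:
    """Classify a problem state as SOFT or HARD (simplified assumption).

    The state is HARD once the number of consecutive check attempts
    has reached max_check_attempts; until then it is SOFT.
    """
    if current_attempt < 1 or max_check_attempts < 1:
        raise ValueError("attempt counters start at 1")
    return "HARD" if current_attempt >= max_check_attempts else "SOFT"
```

Under this model, the DOWN;SOFT;1 entry in the log implies a max_check_attempts greater than 1 for the host. With max_check_attempts set to 1, the first DOWN result would already be HARD.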


# 5. The host is reachable again.
Dec 11 23:08:44 icinga_server icinga: HOST ALERT:
db_server;UP;SOFT;2;PING OK - Packet loss = 0%, RTA = 45.00 ms

# 6. Service status goes back to Warning.
Dec 11 23:20:24 icinga_server icinga: SERVICE ALERT:
db_server;Disk_E;WARNING;HARD;3;e:\ - total: 180.00 Gb - used: 163.08
Gb (91%) - free 16.91 Gb (9%)


So I'm wondering: is sending notifications on this described change
from Warning -> Critical
a) the correct behavior or
b) should icinga not send this service notification because the host
is DOWN and the service state can therefore not be determined.

That depends on the host state in that specific situation, and on whether it had changed or was cached.



--
DI (FH) Michael Friedrich

mail:     michael.friedr...@gmail.com
twitter:  https://twitter.com/dnsmichi
jabber:   dnsmi...@jabber.ccc.de
irc:      irc.freenode.net/icinga dnsmichi

icinga open source monitoring
position: lead core developer
url:      https://www.icinga.org

