Package: mon-contrib
Version: 1.0+dfsg-3
Severity: normal
Tags: patch
The attached patch adds two options to remote.monitor: the minimum number of
alerts that must have been sent, and the minimum failure duration, before a
failure on a remote system is treated as a local failure. If you have
multiple servers that can each notify the sysadmin independently, then
remote.monitor is a good way to make sure that they are all operating
correctly. In the normal case the remote server notifies you; if that doesn't
work, the master server notifies you some time later.
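As a rough illustration of how the new options might be used, here is a
sketch of a mon.cf fragment for the master server. The host names, interval,
thresholds and mail address are made-up examples, not values taken from the
patch; only the option names --alerts_sent and --failure_duration come from
it. The duration is assumed to be in seconds, as the debug message in the
patch suggests.

```
# Hypothetical example: watch two remote mon servers from the master.
hostgroup monservers mon1.example.com mon2.example.com

watch monservers
    service remote
        interval 5m
        # Only report a remote failure once the remote server has sent at
        # least 2 alerts and the failure has lasted at least 600 seconds.
        monitor remote.monitor --alerts_sent 2 --failure_duration 600
        period wd {Sun-Sat}
            alert mail.alert sysadmin@example.com
            alertevery 1h
```

With settings like these the remote server gets the first chance to notify
the sysadmin, and the master only alerts if the problem persists.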
--- /usr/lib/mon/mon.d/remote.monitor 2014-07-03 11:31:07.000000000 +1000
+++ remote.monitor 2016-05-13 21:25:13.117297577 +1000
@@ -12,6 +12,11 @@
# return for each failed mon server the list of the
# failed. Like : host1([g1:s1|s3][g4:s5]) ...
#
+# --alerts_sent : the number of alerts that should be sent before we consider it a
+# problem
+#
+# --failure_duration : the minimum duration of a recorded problem before we alert
+#
# --bigsummary : flag to extend the summary of this monitor
# return for each failed mon server the list of the
# failed. Like : host1([g1:s1{sum}|s3{sum}][g4:s5{sum}]) ...
@@ -47,6 +52,8 @@
"timeout|t:i" => \$timeout,
"summary" => \$summary,
"bigsummary" => \$bigsummary,
+ "failure_duration:i" => \$min_failure_duration,
+ "alerts_sent:i" => \$min_alerts_sent,
"debug|d" => \$debug,
"help|h" => \$help,
"restrict|r:s" => \$restrict,
@@ -61,6 +68,8 @@
$port = ($port) ? $port : "2583";
$timeout = ($timeout) ? $timeout : "10";
$summary = ($summary) ? $summary : $bigsummary;
+$min_failure_duration = ($min_failure_duration) ? $min_failure_duration : 0;
+$min_alerts_sent = ($min_alerts_sent) ? $min_alerts_sent : 1;
($restrict) and ($only_watch,$only_service) = split( /:/, ($restrict) );
@failures = ();
@@ -177,6 +186,11 @@
my($opstatus);
next if ( ($only_service) && !( $service eq ($only_service) ));
+
+ my $alerts_sent = $s{$watch}{$service}{alerts_sent};
+ my $failure_duration = $s{$watch}{$service}{failure_duration};
+ next if ($alerts_sent < $min_alerts_sent);
+ next if ($failure_duration < $min_failure_duration);
# state service recuperation
$opstatus = $s{$watch}{$service}{opstatus};
($debug) and print "$watch $service opstatus=$opstatus\n";
@@ -193,7 +207,7 @@
# service failed and not disabled
$hosterr++;
$watcherr++;
- ($debug) and print "Watch $watch service $service failed\n";
+ ($debug) and print "Watch $watch service $service failed with $alerts_sent alerts for $failure_duration seconds\n";
push (@failures, ${host}) unless (defined($failuresDetails{${host}}));
$failuresDetails{${host}} .=
"Watch $watch, service $service, failed ".
-- System Information:
Debian Release: stretch/sid
APT prefers unstable
APT policy: (500, 'unstable')
Architecture: amd64 (x86_64)
Kernel: Linux 4.5.0-2-amd64 (SMP w/4 CPU cores)
Locale: LANG=en_AU.UTF-8, LC_CTYPE=en_AU.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/bash
Init: systemd (via /run/systemd/system)
Versions of packages mon-contrib depends on:
ii mon 1.2.0-9
mon-contrib recommends no packages.
mon-contrib suggests no packages.
-- no debconf information