On Thursday, 21 April 2022 at 09:22:32 UTC+1 [email protected] wrote:
> *blackbox exporter config:*
> icmp:
>   prober: icmp
>   icmp:
>     preferred_ip_protocol: "ip4"
> tcp:
>   prober: tcp
>   timeout: 5s
>   tcp:
>     preferred_ip_protocol: "ip4"
>
> *Prometheus scrape config:*
>
...
> - job_name: SSH
>   metrics_path: /probe
>   params:
>     module: [ssh_banner]
>   file_sd_configs:
>     - files:
>         - '/etc/prometheus/targets/'
>   relabel_configs:
>     - source_labels: [__address__]
>       target_label: __param_target
>       regex: '([^:]+)(:[0-9]+)?'
>       replacement: '${1}:22'
>     - source_labels: [__param_target]
>       target_label: instance
>     - target_label: __address__
>       replacement: prometheus-blackbox-exporter:9115
>
In your scrape job you are setting the parameter module=ssh_banner, but you
have not defined a module called "ssh_banner" in your blackbox exporter
config, so every probe with that module will fail. Test like this:

curl -g 'http://prometheus-blackbox-exporter:9115/probe?module=ssh_banner&target=blah.example.com&debug=true'
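For reference, a module along these lines would need to exist in your blackbox
exporter config. This is only a sketch based on the ssh_banner example shipped
in blackbox_exporter's example.yml; adjust the timeout and expected banner to
your environment:

    modules:
      ssh_banner:
        prober: tcp
        timeout: 5s
        tcp:
          preferred_ip_protocol: "ip4"
          query_response:
            - expect: "^SSH-2.0-"

The query_response check makes the probe fail unless the target actually
returns an SSH banner, rather than just accepting the TCP connection.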
> *Alert rules:*
> - alert: TargetDown
>   expr: probe_success == 0
>   for: 5s
>   labels:
>     severity: critical
>   annotations:
>     description: Service {{ $labels.instance }} is unreachable.
>     value: DOWN ({{ $value }})
>     summary: "Target {{ $labels.instance }} is down."
>
>
You can leave out "for: 5s", since you're only scraping and evaluating rules
every 60s.
If you don't want an immediate alert on a single probe failure (such as one
dropped packet), set "for: 1m" or "for: 2m" as required. The alert will then
fire only if the condition is continuously present for that duration.
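For example, a sketch of the same rule with a 2-minute hold (annotations
omitted for brevity):

    - alert: TargetDown
      expr: probe_success == 0
      for: 2m
      labels:
        severity: critical

With a 60s evaluation interval this means at least two consecutive failed
probes before the alert fires.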
> *Alert manager config:*
> ...
> - name: email-me
>   email_configs:
>     - to: alert
>       send_resolved: true
>
>
In your original post you said "but black box exporter detect the recover
behavior after about 5mins". Are you talking about when you receive the
"send_resolved" message from Alertmanager?
There are various delays which can occur between Prometheus raising an alert
and Alertmanager sending it, and likewise between Prometheus withdrawing an
alert and Alertmanager sending a resolved message.
If I understand correctly: Prometheus doesn't explicitly "resolve" an
alert, rather it just stops sending that alert. The alert comes with an
"endsAt" time, which is explained here:
https://github.com/prometheus/prometheus/issues/5277
"3x the greater of the evaluation_interval or resend-delay values"
(https://github.com/prometheus/prometheus/blob/f678e27eb62ecf56e2b0bad82345925a4d6162a2/rules/alerting.go#L450)
Since you have an evaluation_interval of 60s, I believe this means there
will be at least a 3 minute delay between an alert ceasing to fire, and the
resolved message being sent.
See also:
https://pracucci.com/prometheus-understanding-the-delays-on-alerting.html
https://prometheus.io/docs/alerting/latest/clients/
https://prometheus.io/docs/alerting/latest/configuration/#configuration-file
# ResolveTimeout is the default value used by alertmanager if the alert does
# not include EndsAt, after this time passes it can declare the alert as
# resolved if it has not been updated.
# This has no impact on alerts from Prometheus, as they always include EndsAt.
[ resolve_timeout: <duration> | default = 5m ]
Really I think you need to separate your problem into two parts:
1. Make sure that blackbox_exporter is probing ICMP and SSH successfully.
   Check that "probe_success" goes to 0 or 1 at the correct times. View the
   PromQL history of the probe_success metric to confirm this. Ignore alerts.
2. Then look at your alerting configuration, as to exactly when it sends
   messages.
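For step 1, you could graph queries like these in the Prometheus expression
browser (the job label here is assumed to match your scrape config):

    probe_success{job="SSH"}
    min_over_time(probe_success{job="SSH"}[5m])

The first shows the raw probe history; the second shows whether any probe
failed in the last 5 minutes, which is handy for spotting brief blips.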
--
You received this message because you are subscribed to the Google Groups
"Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
To view this discussion on the web visit
https://groups.google.com/d/msgid/prometheus-users/cd3aa371-e968-4b44-98a5-326c3da1a487n%40googlegroups.com.