*blackbox exporter config:*
icmp:
  prober: icmp
  icmp:
    preferred_ip_protocol: "ip4"
tcp:
  prober: tcp
  timeout: 5s
  tcp:
    preferred_ip_protocol: "ip4"
*Prometheus scrape config:*
global:
  scrape_interval: 60s
  evaluation_interval: 60s
- job_name: PING
  metrics_path: /probe
  params:
    module: [icmp]
  file_sd_configs:
  - files:
    - '/etc/prometheus/targets/'
  relabel_configs:
  - source_labels: [__address__]
    target_label: __param_target
    regex: '([^:]+)(:[0-9]+)?'
    replacement: '${1}'
  - source_labels: [__param_target]
    target_label: instance
  - target_label: __address__
    replacement: prometheus-blackbox-exporter:9115
- job_name: SSH
  metrics_path: /probe
  params:
    module: [ssh_banner]
  file_sd_configs:
  - files:
    - '/etc/prometheus/targets/'
  relabel_configs:
  - source_labels: [__address__]
    target_label: __param_target
    regex: '([^:]+)(:[0-9]+)?'
    replacement: '${1}:22'
  - source_labels: [__param_target]
    target_label: instance
  - target_label: __address__
    replacement: prometheus-blackbox-exporter:9115
*Alert rules:*
- alert: TargetDown
  expr: probe_success == 0
  for: 5s
  labels:
    severity: critical
  annotations:
    description: Service {{ $labels.instance }} is unreachable.
    value: DOWN ({{ $value }})
    summary: "Target {{ $labels.instance }} is down."
*Alert manager config:*
config.yml: |-
  global:
    resolve_timeout: 5m
    smtp_smarthost: mail
    smtp_from: alertmanager
    smtp_require_tls: false
  route:
    receiver: email-me
    group_by: [instance, alertname, job]
    group_wait: 45s
    group_interval: 5m
    repeat_interval: 24h
  receivers:
  - name: email-me
    email_configs:
    - to: alert
      send_resolved: true
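
For anyone reading along, here is a rough sketch of how the intervals in the configs above might add up to a notification delay. It is a simplified model and an assumption on my part, not an exact description of Prometheus or Alertmanager internals. It ignores the probe timeout and SMTP delivery, and the function names are made up for illustration.

```python
import math

# Values copied from the configs above, in seconds.
SCRAPE_INTERVAL = 60       # global.scrape_interval
EVALUATION_INTERVAL = 60   # global.evaluation_interval
FOR_DURATION = 5           # "for: 5s" on the TargetDown rule
GROUP_WAIT = 45            # route.group_wait
GROUP_INTERVAL = 300       # route.group_interval (5m)


def pending_time(for_seconds, eval_seconds):
    """Assumption: a 'for' clause is only checked at rule evaluations, so it
    is effectively rounded up to a whole number of evaluation intervals."""
    return math.ceil(for_seconds / eval_seconds) * eval_seconds


def worst_case_firing_delay():
    """Upper bound from 'target goes down' to 'first email is sent'."""
    return (
        SCRAPE_INTERVAL                                    # wait for the next scrape
        + EVALUATION_INTERVAL                              # wait for the next rule evaluation
        + pending_time(FOR_DURATION, EVALUATION_INTERVAL)  # alert stays pending
        + GROUP_WAIT                                       # Alertmanager waits before the first send
    )


def worst_case_resolved_delay():
    """Upper bound from 'target recovers' to 'resolved email is sent',
    assuming the resolved notice goes out at the next group flush."""
    return SCRAPE_INTERVAL + EVALUATION_INTERVAL + GROUP_INTERVAL


print("firing delay up to", worst_case_firing_delay(), "seconds")
print("resolved delay up to", worst_case_resolved_delay(), "seconds")
```

Under these assumptions, most of the wait for a "resolved" email comes from group_interval rather than from the blackbox exporter itself.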
On Wednesday, April 20, 2022 at 8:29:10 PM UTC+8 Brian Candler wrote:
> blackbox_exporter monitoring TCP ports (e.g. for SSH) and ICMP (ping)
> works fine.
>
> "but black box exporter detect the recover behavior after about 5mins"
>
> Black box exporter only performs a single test when you scrape it. It
> does not by itself do any recovery detection. The problem is therefore
> most likely with your prometheus scrape config or your alertmanager config.
>
> If you're having a problem, you'll need to be more specific:
> * show your blackbox_exporter config, your prometheus scrape config which
> scrapes it, your alerting rules, and your alertmanager config (if using
> alertmanager)
> * describe more clearly the behaviour you're seeing, and what you expected
> to see. (For example, are you waiting for a "recovery" E-mail from
> alertmanager?)
>
> "And after the IP table is recovered, the alert for Ping can be cleared
> after about 20mins, but SSH is still there."
>
> Either SSH is working and reachable, or it is not. You can check the
> results of blackbox_exporter tests by hand using curl, and also get
> additional debugging information, like this:
>
> curl -g 'http://127.0.0.1:9115/probe?module=xxx&target=yyyy&debug=true'
>
> Here is an example:
>
> # *curl -g 'http://localhost:9115/probe?module=icmp&target=1.2.3.4&debug=true'*
> Logs for the probe:
> ts=2022-04-20T12:25:11.587855449Z caller=main.go:320 module=icmp
> target=1.2.3.4 level=info msg="Beginning probe" probe=icmp timeout_seconds=3
> ts=2022-04-20T12:25:11.588014456Z caller=icmp.go:91 module=icmp
> target=1.2.3.4 level=info msg="Resolving target address" ip_protocol=ip6
> ts=2022-04-20T12:25:11.588065658Z caller=icmp.go:91 module=icmp
> target=1.2.3.4 level=info msg="Resolving target address" ip_protocol=ip4
> ts=2022-04-20T12:25:11.588098688Z caller=icmp.go:91 module=icmp
> target=1.2.3.4 level=info msg="Resolved target address" ip=1.2.3.4
> ts=2022-04-20T12:25:11.588133368Z caller=main.go:130 module=icmp
> target=1.2.3.4 level=info msg="Creating socket"
> ts=2022-04-20T12:25:11.588188673Z caller=main.go:130 module=icmp
> target=1.2.3.4 level=debug msg="Unable to do unprivileged listen on socket,
> will attempt privileged" err="socket: permission denied"
> ts=2022-04-20T12:25:11.58829848Z caller=main.go:130 module=icmp
> target=1.2.3.4 level=info msg="Creating ICMP packet" seq=24581 id=190
> ts=2022-04-20T12:25:11.588348917Z caller=main.go:130 module=icmp
> target=1.2.3.4 level=info msg="Writing out packet"
> ts=2022-04-20T12:25:11.588470176Z caller=main.go:130 module=icmp
> target=1.2.3.4 level=info msg="Waiting for reply packets"
> ts=2022-04-20T12:25:14.588761946Z caller=main.go:130 module=icmp
> target=1.2.3.4 level=debug msg="Cannot get TTL from the received packet.
> 'probe_icmp_reply_hop_limit' will be missing."
> ts=2022-04-20T12:25:14.588979317Z caller=main.go:130 module=icmp
> target=1.2.3.4 level=warn msg="Timeout reading from socket" err="read ip
> 0.0.0.0: raw-read ip4 0.0.0.0: i/o timeout"
> ts=2022-04-20T12:25:14.589247538Z caller=main.go:320 module=icmp
> target=1.2.3.4 level=error msg="Probe failed" duration_seconds=3.001307309
>
>
>
> Metrics that would have been returned:
> # HELP probe_dns_lookup_time_seconds Returns the time taken for probe dns
> lookup in seconds
> # TYPE probe_dns_lookup_time_seconds gauge
> probe_dns_lookup_time_seconds 0.000116077
> # HELP probe_duration_seconds Returns how long the probe took to complete
> in seconds
> # TYPE probe_duration_seconds gauge
> probe_duration_seconds 3.001307309
> # HELP probe_icmp_duration_seconds Duration of icmp request by phase
> # TYPE probe_icmp_duration_seconds gauge
> probe_icmp_duration_seconds{phase="resolve"} 0.000116077
> probe_icmp_duration_seconds{phase="rtt"} 0
> probe_icmp_duration_seconds{phase="setup"} 0.000212886
> # HELP probe_ip_addr_hash Specifies the hash of IP address. It's useful to
> detect if the IP address changes.
> # TYPE probe_ip_addr_hash gauge
> probe_ip_addr_hash 3.268949123e+09
> # HELP probe_ip_protocol Specifies whether probe ip protocol is IP4 or IP6
> # TYPE probe_ip_protocol gauge
> probe_ip_protocol 4
> # HELP probe_success Displays whether or not the probe was a success
> # TYPE probe_success gauge
> probe_success 0
>
>
>
> Module configuration:
> prober: icmp
> timeout: 3s
> http:
>   ip_protocol_fallback: true
>   follow_redirects: true
> tcp:
>   ip_protocol_fallback: true
> icmp:
>   ip_protocol_fallback: true
> dns:
>   ip_protocol_fallback: true
>
>
> Look at "probe_success" for the overall result.
>
> You can also use the PromQL browser in the Prometheus web interface: enter
> "probe_success" as the query and look at the graph tab. You'll see the
> history of your blackbox exporter probes.
>
> On Wednesday, 20 April 2022 at 12:37:17 UTC+1 [email protected] wrote:
>
>> Hi guys,
>>
>> We are using black box exporter to monitor ssh and ping.
>>
>> For ssh, (we monitor the port 22) if we stop sshd service, actually the
>> service will be auto-recovered, but black box exporter detect the recover
>> behavior after about 5mins.
>>
>> For ping, we use icmp module to monitor system ping, we deleted the IP
>> tables, then Prometheus triggered 2 alerts, one is SSH is failed, the other
>> is Ping is failed. And after the IP table is recovered, the alert for Ping
>> can be cleared after about 20mins, but SSH is still there.
>>
>> So is it a good approach to use blackbox exporter to monitor SSH and PING?
>>
>
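
The manual curl check suggested in the quoted reply could also be scripted, which may help when checking several targets by hand. The following is only a minimal sketch using the Python standard library. The helper names (build_probe_url, parse_probe_success, probe) are hypothetical, and the exporter address in the usage comment is the one from the scrape config, assumed to be reachable from wherever the script runs.

```python
import urllib.parse
import urllib.request


def build_probe_url(exporter, module, target, debug=False):
    """Build the /probe URL that Prometheus (or curl) would request."""
    query = {"module": module, "target": target}
    if debug:
        query["debug"] = "true"
    return f"http://{exporter}/probe?{urllib.parse.urlencode(query)}"


def parse_probe_success(metrics_text):
    """Return True or False from the probe_success sample, or None if the
    sample is missing from the response."""
    for line in metrics_text.splitlines():
        line = line.strip()
        if line.startswith("probe_success "):
            return float(line.split()[1]) == 1.0
    return None


def probe(exporter, module, target, timeout=10):
    """Run one probe through the exporter and report its overall result."""
    url = build_probe_url(exporter, module, target)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return parse_probe_success(response.read().decode("utf-8"))


# Example usage (needs a reachable exporter, so it is left commented out):
# print(probe("prometheus-blackbox-exporter:9115", "icmp", "1.2.3.4"))
```

Each call runs a single probe, the same as one Prometheus scrape, so repeating it right after a recovery shows what the exporter itself reports at that moment.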