On Tuesday, 28 November 2023 at 04:15:41 UTC Chris Siebenmann wrote:

The Blackbox exporter is a bit tricky to understand in relation to up{}, 
because unlike many exporters you create multiple scrape targets against 
(or through) the same exporter. This generally means you want to ignore 
the up{} metric for any particular blackbox probe and instead scrape 
Blackbox's metric endpoint and pay attention to its up{} (for alerts, 
for example).


I think that's worded in a misleading way.

Blackbox exporter does have a /metrics endpoint, but this is only for 
metrics internal to the operation of blackbox_exporter itself (e.g. memory 
stats, software version). You don't need to scrape this, but it gives you a 
little bit of extra info about how your exporter is performing.

Blackbox exporter's main interface is the /probe endpoint, where you tell 
it to run individual tests: /probe?target=xxx&module=yyy
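
As a rough sketch of how the /probe interface is usually driven from Prometheus, a scrape job can rewrite each listed target into the target= parameter and point the actual scrape at the exporter. The job names, the http_2xx module, the example.com target and the 127.0.0.1:9115 exporter address below are placeholders for illustration, not taken from anyone's real setup:

```yaml
scrape_configs:
  # Probes run *through* blackbox_exporter: one scrape target per probed service.
  - job_name: blackbox_http            # placeholder job name
    metrics_path: /probe
    params:
      module: [http_2xx]               # assumed module name from blackbox.yml
    static_configs:
      - targets:
          - https://example.com        # placeholder probe target
    relabel_configs:
      # Pass the listed target to the exporter as ?target=...
      - source_labels: [__address__]
        target_label: __param_target
      # Keep the probed target as the instance label
      - source_labels: [__param_target]
        target_label: instance
      # Expose the module as a label so alert templates can show it
      - source_labels: [__param_module]
        target_label: module
      # Send the scrape itself to the exporter (assumed address)
      - target_label: __address__
        replacement: 127.0.0.1:9115

  # Optional: the exporter's own /metrics endpoint (internal stats only).
  - job_name: blackbox_exporter        # placeholder job name
    static_configs:
      - targets: ['127.0.0.1:9115']
```

With a layout like this, every probed service gets its own up and probe_success series under the first job, while the second job only reports on the exporter process itself.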

The 'up' metric is generated by Prometheus itself, and only tells you that 
it was able to communicate with the exporter and get some results (without 
a 4xx / 5xx error, for example).  So it's correct to say that you're not 
interested in the 'up' metric for scrapes to /probe, since it will always 
be 1 unless blackbox_exporter itself is down, unreachable or badly broken; 
you're interested in probe_success instead.

This is pretty easy to arrange in alerting rules. Here's a starting point:

groups:
- name: UpDown
  rules:
  - alert: UpDown
    expr: up == 0
    for: 3m
    keep_firing_for: 3m
    labels:
      severity: critical
    annotations:
      summary: 'Scrape failed: host is down or scrape endpoint down/unreachable'
- name: BlackboxRules
  rules:
  - alert: ProbeFail
    expr: probe_success == 0
    for: 3m
    keep_firing_for: 3m
    labels:
      severity: critical
    annotations:
      description: |
        {{ $labels.instance }} ({{ $labels.module }}) probe is failing
      summary: Probed service is down

For Grafana I'd probably just make two dashboards, but if you really want a 
grand summary of all "problems" then you can simply use a PromQL expression 
like this:

    up == 0 or probe_success == 0

The "or" operator 
<https://prometheus.io/docs/prometheus/latest/querying/operators/#logical-set-binary-operators>
in PromQL is not a boolean operator: it's more like a set union.  It will 
give you all the elements of the "up" vector where the value is 0, along 
with all elements of the "probe_success" vector where the value is 0 
(except for any probe_success == 0 elements which have *exactly* the same 
labels as an up == 0 element, but those are unlikely anyway).

The consumer of this query is going to see a mixture of up{...} and 
probe_success{...} metrics, all with value 0.

 there are other multi-target 
indirect exporters like Blackbox. I believe that the SNMP exporter is 
another one where you often have one exporter separately scraping a lot 
of targets, and each target will have its own up{} metric that you 
probably want to ignore.)


The first part of that is correct: SNMP exporter uses 
/snmp?target=xxx&module=yyy&auth=zzz.

But the second part is wrong: if SNMP exporter fails to talk to the target 
then it returns an empty scrape with a 4xx/5xx error code, which Prometheus 
turns into up==0.  So you definitely *do* want to alert on up==0 in this 
case, as that's how you detect a device which is failing to respond to SNMP.
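
For comparison, an SNMP scrape job might look roughly like the sketch below. The job name, the if_mib module, the public_v2 auth name, the 192.0.2.1 device address and the 127.0.0.1:9116 exporter address are all assumptions for illustration; substitute whatever your own snmp.yml defines:

```yaml
scrape_configs:
  - job_name: snmp                     # placeholder job name
    metrics_path: /snmp
    params:
      module: [if_mib]                 # assumed module name from snmp.yml
      auth: [public_v2]                # assumed auth name from snmp.yml
    static_configs:
      - targets:
          - 192.0.2.1                  # placeholder device address
    relabel_configs:
      # Pass the device address to the exporter as ?target=...
      - source_labels: [__address__]
        target_label: __param_target
      # Keep the device address as the instance label
      - source_labels: [__param_target]
        target_label: instance
      # Send the scrape itself to the exporter (assumed address)
      - target_label: __address__
        replacement: 127.0.0.1:9116
```

Because a failed SNMP walk makes the scrape itself fail, up for this job is per-device, and an expression such as up{job="snmp"} == 0 picks out the devices that are not answering.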

 


In our environment, it's useful for us to have a granular view of what 
has failed. That a device has stopped pinging is a different issue than 
its node_exporter not being up, so our dashboards (and alerts) reflect 
that.


I agree with that. Different metrics inherently have different meanings, 
and although 'up' and 'probe_success' have similar 0/1 semantics, there's 
other information you can get from blackbox_exporter when probe_success==0 
which can tell you more about the nature of the problem (e.g. failure to 
connect, failure to resolve DNS name, TLS certificate validation failure, 
etc.).
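
To illustrate the kind of extra detail available, here is a hedged sketch of two additional rules built on other blackbox_exporter metrics (probe_http_status_code and probe_ssl_earliest_cert_expiry, which are exported by HTTP and TLS probes). The group name, alert names, thresholds and severities are made up for the example, and the second rule is a forward-looking complement rather than a diagnosis of an existing failure:

```yaml
groups:
- name: BlackboxDetail                 # hypothetical group name
  rules:
  # The probe reached the server but got an HTTP error status back.
  # The status code is on the left of "and", so $value is the status code.
  - alert: ProbeHttpErrorStatus
    expr: probe_http_status_code >= 400 and probe_success == 0
    for: 3m
    labels:
      severity: warning
    annotations:
      summary: 'Probe of {{ $labels.instance }} returned HTTP status {{ $value }}'
  # The certificate seen by the probe expires in under 14 days (arbitrary threshold).
  - alert: ProbeCertExpiringSoon
    expr: probe_ssl_earliest_cert_expiry - time() < 14 * 86400
    for: 1h
    labels:
      severity: warning
    annotations:
      summary: 'Certificate for {{ $labels.instance }} expires in under 14 days'
```

For a one-off investigation, requesting the same /probe URL by hand with debug=true added to the query string returns the probe's log output alongside the metrics.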
