Hey,

I'm having an interesting issue with Prometheus and Alertmanager. I'm 99% sure
it's a config issue, but I'm having a hard time figuring it out.

We have a group of polls that use the blackbox exporter to ping some
endpoints. Each one pings once every 30 seconds.

The rule looks like this:

- name: blackbox.rules.icmpFailed
  rules:
    - alert: BlackboxIcmpFailed
      expr: probe_icmp_duration_seconds == 0
      for: 5m
      labels:
        severity: critical
      annotations:
        summary: Ping to Device Failed.

And our Alertmanager config looks like this:

spec:
  route:
    groupBy: [ 'instance','severity' ]
    groupWait: 30s
    groupInterval: 5m
    repeatInterval: 12h

Now here is what I am seeing.

If we have a single ping failure, an alert message is sent to Slack, and it
clears again on the next 5-minute cycle.

I thought having "for: 5m" should mean that an alert is ONLY sent if that
condition has been seen for 5 minutes consecutively. As you can imagine,
this leads to lots of angst :D
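
For comparison, here is a variant of the rule I could try. It is only a
sketch and untested: it assumes that probe_success, which the blackbox
exporter sets to 0 when a probe fails, is a more reliable failure signal
than a zero probe_icmp_duration_seconds. The alert name BlackboxProbeFailed
is a made-up placeholder, and as written the expression would match every
blackbox module, not just ICMP, unless narrowed with a label matcher.

- name: blackbox.rules.icmpFailed
  rules:
    # Hypothetical variant for comparison, not our current rule.
    - alert: BlackboxProbeFailed
      # probe_success is 1 when the probe succeeds and 0 when it fails.
      expr: probe_success == 0
      # Unchanged: the condition must hold at every evaluation for 5 minutes.
      for: 5m
      labels:
        severity: critical
      annotations:
        summary: Ping to Device Failed.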

Any ideas? 

