Hey, I'm having an interesting issue with Prometheus and Alertmanager. I'm 99% sure it's a config issue, but I'm having a hard time figuring it out.
We have a group of polls that use the blackbox exporter to ping some endpoints. It pings once every 30 seconds. The rule looks like this:

    - name: blackbox.rules.icmpFailed
      rules:
        - alert: BlackboxIcmpFailed
          expr: probe_icmp_duration_seconds == 0
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: Ping to Device Failed.

And our Alertmanager config looks like this:

    spec:
      route:
        groupBy: [ 'instance','severity' ]
        groupWait: 30s
        groupInterval: 5m
        repeatInterval: 12h

Now here is what I am seeing. If we have a single ping failure, an alert message is sent to Slack, and it clears immediately on the next 5-minute cycle. I thought "for: 5m" meant that an alert is ONLY sent if the condition has held for 5 consecutive minutes.

As you can imagine, this leads to lots of angst :D

Any ideas?
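
One way to check the "for: 5m" expectation separately from Alertmanager might be a promtool rule unit test. The following is only a minimal sketch, not something taken from our setup. It assumes the rule group above is saved as a plain Prometheus rule file (the group placed under a top-level "groups:" key) named blackbox.rules.yml, which is a hypothetical filename, and the instance="device-1" series is made up for illustration. It feeds in a single zero sample among otherwise non-zero values and expects no firing alert.

```yaml
# Hypothetical promtool unit test; file name and series labels are assumptions.
rule_files:
  - blackbox.rules.yml          # assumed name of the rule file shown above

evaluation_interval: 30s        # evaluate rules as often as the probe runs

tests:
  - interval: 30s               # one input sample every 30 seconds
    input_series:
      # 13 samples covering 0s to 6m; only the sample at 1m is a failure (0).
      - series: 'probe_icmp_duration_seconds{instance="device-1"}'
        values: '0.01 0.01 0 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01'
    alert_rule_test:
      # Shortly after the single failure: expr is false again, nothing firing.
      - eval_time: 1m30s
        alertname: BlackboxIcmpFailed
        exp_alerts: []
      # Well past the 5m "for" window: still nothing should be firing.
      - eval_time: 6m
        alertname: BlackboxIcmpFailed
        exp_alerts: []
```

This would be run with "promtool test rules" followed by the test file name. If it passes, that would suggest the rule as written does not fire on one failed sample, so the notification may be coming from somewhere other than this rule's "for" handling.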