> Not sure if I'm right, but I think if one places both rules in the same group (and I think even the order shouldn't matter?), then the original:
>
>     expr: min_over_time(up[5m]) == 0 unless max_over_time(up[5m]) == 0
>     for: 5m
>
> with 5m being the "for:"-time of the long-alert should be guaranteed to work... in the sense that if the above doesn't fire... the long-alert does.

It depends on the exact semantics of "for". For example, take a simple case of a 1 minute rule evaluation interval. If you apply "for: 1m" then I guess that means the alert expression must be true at two successive evaluations (otherwise, "for: 1m" would have no effect). If so, then "for: 5m" means it must be true at six successive evaluations.

But up[5m] only looks at samples wholly contained within a 5 minute window, and therefore will normally only look at 5 samples. (If there is jitter in the sampling time, then occasionally it might look at 4 or 6 samples.)

If what I've written above is correct (and it may well not be!), then

    expr: up == 0
    for: 5m

will fire if "up" is zero for 6 cycles, whereas

    ... unless max_over_time(up[5m])

will suppress an alert if "up" is zero for (usually) 5 cycles.

If you want to get to the bottom of this with certainty, you can write unit tests <https://prometheus.io/docs/prometheus/latest/configuration/unit_testing_rules/> that try out these scenarios.
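
To make the counting argument concrete, here is a rough Python sketch of the two numbers being compared: how many evaluations "for:" needs before firing, and how many samples a range like up[5m] covers. It is only a model of my guesses above, not of Prometheus itself. The function names, the 20 second scrape offset, and the `left_open` flag (whether a sample sitting exactly on the older edge of the window is counted) are all assumptions made for illustration.

```python
def evaluations_until_firing(for_duration, eval_interval):
    """Number of successive true evaluations before the alert fires.

    Assumed model: the alert goes pending at the first true evaluation
    and fires at the first evaluation that is at least `for_duration`
    seconds after that one.
    """
    evaluations = 1  # the evaluation that puts the alert into pending
    elapsed = 0
    while elapsed < for_duration:
        elapsed += eval_interval
        evaluations += 1
    return evaluations


def samples_in_window(eval_time, window, scrape_interval, scrape_offset,
                      left_open=True):
    """Count scrape timestamps that fall inside the range window.

    Scrapes are assumed to happen at scrape_offset + k * scrape_interval.
    With left_open=True the window is (eval_time - window, eval_time];
    with left_open=False the older edge is included as well.
    """
    start = eval_time - window
    count = 0
    t = scrape_offset
    while t <= eval_time:
        if t > start or (not left_open and t == start):
            count += 1
        t += scrape_interval
    return count


if __name__ == "__main__":
    # 1 minute evaluation interval, "for: 5m"
    print(evaluations_until_firing(for_duration=300, eval_interval=60))
    # up[5m] evaluated at t=600s, scrapes every 60s starting at t=20s
    print(samples_in_window(eval_time=600, window=300,
                            scrape_interval=60, scrape_offset=20))
```

Under these assumptions the sketch gives 6 evaluations for "for: 5m" and 5 samples for the 5 minute window, which is the mismatch described above. If the scrapes happen to line up exactly with the evaluation time and the older edge of the window is counted (`scrape_offset=0, left_open=False`), the window holds 6 samples instead, so the boundary behaviour is worth checking with a real unit test rather than trusting this model.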

