I think the starting point is to look at your alerting expressions, how they change over time in the PromQL GUI (graph view), and the synthetic metric ALERTS <https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/#inspecting-alerts-during-runtime>.
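For example, you can graph the ALERTS series for the rule in question to see exactly when it flaps between firing and resolved (the alert name here is a hypothetical placeholder; substitute your own):

```promql
# All state transitions for one alert, by alertname label
ALERTS{alertname="HighErrorRate"}

# Only the intervals where it is actually firing (vs. pending)
ALERTS{alertname="HighErrorRate", alertstate="firing"}
```

If the firing series has one-interval gaps, that matches the resolve/re-trigger behaviour you're seeing in PagerDuty.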
If an alert expression drops out for even a single rule evaluation interval, the alert is immediately resolved; it will then re-fire on the next cycle (or after the "for:" period, if present).

There is a change in prometheus-2.42.0 <https://github.com/prometheus/prometheus/releases/tag/v2.42.0> which may help address this:

- [FEATURE] Add 'keep_firing_for' field to alerting rules. #11827 <https://github.com/prometheus/prometheus/pull/11827>

On Thursday, 9 March 2023 at 21:17:24 UTC Russ Robinson wrote:
> I have Alertmanager configured to send "critical" alerts to PagerDuty
> over the Events v2 API. If the Prometheus rule has an alert that lasts
> longer than 20 minutes or so, the PagerDuty alert will be resolved and
> then re-triggers a new event.
>
> I have tried disabling grouping (with "group_by: [...]"). The PagerDuty
> alert's log just says: "Resolved through the integration API.".
>
> However, the alert still shows in Alertmanager. In addition, I have
> messages going to Slack. The alert message shows up there, but never a
> resolved message either.
>
> Any ideas why Alertmanager would close/resolve the PagerDuty incident
> and then re-trigger/open one again?

--
You received this message because you are subscribed to the Google Groups "Prometheus Users" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. To view this discussion on the web visit https://groups.google.com/d/msgid/prometheus-users/b9c8b139-36b4-4087-90d0-67ffe5030b86n%40googlegroups.com.
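P.S. As a sketch of the 'keep_firing_for' field from #11827 (the alert name, expression, and durations below are hypothetical placeholders, not from the original rule):

```yaml
groups:
  - name: example
    rules:
      - alert: HighErrorRate  # hypothetical alert name
        expr: rate(http_requests_total{status="500"}[5m]) > 0.05
        for: 5m               # expression must hold this long before firing
        keep_firing_for: 10m  # keep the alert firing this long after the
                              # expression stops returning results
        labels:
          severity: critical
        annotations:
          summary: "HTTP 5xx error rate above 5% for 5 minutes"
```

With keep_firing_for set, a brief gap in the expression (e.g. one missed evaluation) no longer resolves the alert immediately, which should stop the resolve/re-trigger churn in PagerDuty.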

