I think the starting point is to look at your alerting expressions, how
they change over time in the PromQL GUI (graph view), and the synthetic
metric "ALERTS"
(https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/#inspecting-alerts-during-runtime).

If an alert expression stops returning results for even a single rule
evaluation interval, then the alert is immediately resolved; it will then
re-fire on the next evaluation cycle (or after the "for:" period again,
if one is set).
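
If that turns out to be the cause, one common workaround on versions
without the change below is to smooth the expression so that a brief dip
does not resolve the alert. A sketch, with a made-up metric name and
threshold:

    # flappy: resolves the moment the rate dips below the threshold
    expr: rate(errors_total[5m]) > 0.1

    # smoothed: keeps firing as long as any point in the last 15 minutes
    # was above the threshold (PromQL subquery, default resolution)
    expr: max_over_time(rate(errors_total[5m])[15m:]) > 0.1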

There is a change in prometheus-2.42.0
(https://github.com/prometheus/prometheus/releases/tag/v2.42.0) which may
help address this; a sketch of its use follows the release note:

   - [FEATURE] Add 'keep_firing_for' field to alerting rules. #11827
     (https://github.com/prometheus/prometheus/pull/11827)


On Thursday, 9 March 2023 at 21:17:24 UTC Russ Robinson wrote:

>   I have Alertmanager configured to send "critical" alerts to PagerDuty
> over the Events v2 API. If a Prometheus rule has an alert that lasts
> longer than 20 minutes or so, the PagerDuty alert gets resolved and then
> a new event is triggered.
>
>   I have tried disabling grouping (with "group_by [...]"). The PagerDuty
> alert's log just says: "Resolved through the integration API."
>
>   However, the alert still shows in Alertmanager. In addition, I have
> messages going to Slack; the alert message shows up there, but a resolved
> message never does.
>
>   Any ideas why Alertmanager would close/resolve the PagerDuty incident
> and then re-trigger/open a new one?
