Two alerts suggest that the two instances aren't talking to each other. How have you configured them? Does the UI show the "other" instance?
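For reference, here is a sketch of what the cluster flags typically look like when the two instances peer with each other (am0.example and am1.example are placeholder hostnames; on each VM, --cluster.peer should name the *other* instance):

    # command section of the compose file on the first VM (alertmanager0):
    command:
      - '--config.file=/etc/alertmanager/alertmanager.yml'
      - '--storage.path=/data/alert0'
      - '--cluster.listen-address=0.0.0.0:6783'
      # the other VM, reachable across the network:
      - '--cluster.peer=am1.example:6783'

You can verify that the cluster actually formed by querying /api/v2/status on each instance (e.g. curl -s http://localhost:9093/api/v2/status): the "cluster" section should report status "ready" and list both peers. Note also that the gossip protocol uses both TCP and UDP on the cluster port, so a compose port mapping of "6783:6783" (TCP only) would likely also need a "6783:6783/udp" entry.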
On 5 July 2022 08:34:45 BST, Venkatraman Natarajan <[email protected]> wrote:
>Thanks Brian. I have used a last_over_time query in our expression instead
>of turning off resolved notifications.
>
>Also, we have two Alertmanagers in our environment. Both are up and
>running, but now we are getting two alerts, one from each Alertmanager.
>Could you please help me sort out this issue as well?
>
>Please find the Alertmanager configuration below.
>
>  alertmanager0:
>    image: prom/alertmanager
>    container_name: alertmanager0
>    user: rootuser
>    volumes:
>      - ../data:/data
>      - ../config/alertmanager.yml:/etc/alertmanager/alertmanager.yml
>    command:
>      - '--config.file=/etc/alertmanager/alertmanager.yml'
>      - '--storage.path=/data/alert0'
>      - '--cluster.listen-address=0.0.0.0:6783'
>      - '--cluster.peer={{ IP Address }}:6783'
>      - '--cluster.peer={{ IP Address }}:6783'
>    restart: unless-stopped
>    logging:
>      driver: "json-file"
>      options:
>        max-size: "10m"
>        max-file: "2"
>    ports:
>      - 9093:9093
>      - 6783:6783
>    networks:
>      - network
>
>Regards,
>Venkatraman N
>
>On Sat, Jun 25, 2022 at 9:05 PM Brian Candler <[email protected]> wrote:
>
>> If probe_success becomes non-zero, even for a single evaluation
>> interval, then the alert will be immediately resolved. There is no delay
>> on resolving, like there is for pending->firing ("for: 5m").
>>
>> I suggest you enter the alerting expression, e.g. "probe_success == 0",
>> into the PromQL web interface (query browser), switch to Graph view, and
>> zoom in. If you see any gaps in the graph, then the alert was resolved
>> at that instant.
>>
>> Conversely, use the query
>>   probe_success{instance="xxx"} != 0
>> to look at a particular timeseries, as identified by the label(s), and
>> see if there are any dots shown where the value is non-zero.
>>
>> To make your alerts more robust you may need to use queries with range
>> vectors, e.g. min_over_time(foo[5m]) or max_over_time(foo[5m]) or
>> whatever.
>>
>> As a general rule though: you should consider carefully whether you want
>> to send *any* notification for resolved alerts. Personally, I have
>> switched to send_resolved = false. There are some good explanations here:
>>
>> https://www.robustperception.io/running-into-burning-buildings-because-the-fire-alarm-stopped
>> https://docs.google.com/document/d/199PqyG3UsyXlwieHaqbGiWVa8eMWi8zzAn0YfcApr8Q/
>>
>> You don't want to build a culture where people ignore alerts because the
>> alert cleared itself - or is expected to clear itself.
>>
>> You want the alert condition to trigger a *process*: an investigation of
>> *why* the alert happened, *what* caused it, whether the underlying cause
>> has been fixed, and whether the alerting rule itself was wrong. When all
>> that has been investigated, manually close the ticket. The fact that the
>> alert has gone back below the threshold doesn't mean that this work no
>> longer needs to be done.
>>
>> On Saturday, 25 June 2022 at 13:27:22 UTC+1 [email protected] wrote:
>>
>>> Hi Team,
>>>
>>> We have two Prometheus servers and two Alertmanagers running as
>>> containers in separate VMs.
>>>
>>> Alerts are getting auto-resolved even though the underlying issue is
>>> still present as per the threshold.
>>>
>>> For example, we have an alert rule with the expression
>>> probe_success == 0. It triggers an alert, but after some time the
>>> alert gets auto-resolved because we have enabled send_resolved = true.
>>> But probe_success == 0 is still true, so we don't want the alerts to
>>> auto-resolve.
>>>
>>> Could you please help us with this?
>>>
>>> Thanks,
>>> Venkatraman N
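To make the rule itself robust, here is a minimal sketch of the range-vector approach suggested above (the group name, alert name, severity label and the 5m windows are placeholders to tune):

    groups:
      - name: blackbox
        rules:
          - alert: ProbeFailed
            # == 0 only when there was no successful probe at all in the
            # last 5 minutes; a missed scrape doesn't empty the range
            # vector, so a short scrape gap alone won't resolve the alert
            expr: max_over_time(probe_success[5m]) == 0
            for: 5m
            labels:
              severity: critical
            annotations:
              summary: 'Probe to {{ $labels.instance }} has been failing'

This still resolves on the first genuinely successful probe; if you want the alert to keep firing until the probe has been healthy for a full window, min_over_time(probe_success[5m]) == 0 is the variant to reach for.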
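And if you do take the send_resolved = false route, it is a per-notifier setting in alertmanager.yml; a minimal sketch (the receiver name and address are made up):

    receivers:
      - name: 'team-email'
        email_configs:
          - to: '[email protected]'
            # never send a follow-up notification when the alert clears
            send_resolved: false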

