UPDATE: I had a look at https://docs.opsgenie.com/docs/alert-api#add-tags-to-alert and tried the add-tags endpoint directly:

    curl -X POST 'https://api.opsgenie.com/v2/alerts/<alert id>/tags?identifierType=id' \
      -H "Content-Type: application/json" \
      -H "Authorization: GenieKey <api key>" \
      -d '{ "tags": ["host=testserver","instance=testserver123"], "user":"Monitoring Script", "note":"Action executed via Alert API" }'

With this I was able to append additional tags to the existing Opsgenie tickets.

[image: photo004.png]

So I think there is no restriction on Opsgenie's end; the tag update should be handled by Alertmanager's Opsgenie integration. I am just not sure how Alertmanager internally builds the tags section it sends to the Opsgenie API when new alerts (part of the same alert group) come in.
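
One way to check what Alertmanager actually sends would be to temporarily point the receiver's api_url at a small local listener that just dumps the request body. A rough, untested sketch - the :9099 port and the idea of overriding api_url for a test receiver are my own assumptions:

    package main

    // Throwaway listener: logs whatever Alertmanager would have sent to Opsgenie.
    // Assumes the opsgenie receiver temporarily uses  api_url: http://localhost:9099
    // so the request lands here instead of at api.opsgenie.com.

    import (
        "io"
        "log"
        "net/http"
    )

    func main() {
        http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
            body, err := io.ReadAll(r.Body)
            if err != nil {
                log.Printf("read error: %v", err)
                return
            }
            // Method, path and the raw JSON payload (message, description, tags, ...).
            log.Printf("%s %s\n%s", r.Method, r.URL.Path, body)
            w.WriteHeader(http.StatusAccepted) // any 2xx keeps Alertmanager happy here
        })
        log.Fatal(http.ListenAndServe(":9099", nil))
    }

Once the second server's alert joins the group, the logged payload should show whether the tags value Alertmanager computes still only contains host=server1, or whether Opsgenie is dropping the extra tag on update.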

On Wednesday 3 April 2024 at 20:31:21 UTC+5:30 mohan garden wrote:

> Thank you for the pointers. I tried -
>
>     tags: '{{ range .Alerts.Firing }} {{ range .Labels.SortedPairs }} {{ .Name }}={{ .Value }}, {{ end }} {{ end }}'
>
> but did not see any change in the outcome.
> I see all the tags (alertname, job, instance, ...) - but only from the
> first alert; the tags from the second alert did not show up.
>
> Is there a way I can see the entire message which Alertmanager sends out
> to Opsgenie - somewhere in the Alertmanager logs or in a text file?
> That would help me understand whether Alertmanager is sending all the
> tags and it is Opsgenie which may be dropping those extra tags.
>
> Regards
> CP
>
> On Wednesday, April 3, 2024 at 5:44:17 PM UTC+5:30 Brian Candler wrote:
>
>> > but i was expecting an additional host=server2 tag on the ticket.
>>
>> You won't get that, because CommonLabels is exactly how it sounds: those
>> labels which are common to all the alerts in the group. If one alert has
>> instance=server1 and the other has instance=server2, but they're in the
>> same alert group, then no 'instance' will appear in CommonLabels.
>>
>> The documentation is here:
>> https://prometheus.io/docs/alerting/latest/notifications/
>>
>> It looks like you could iterate over Alerts.Firing and then the Labels
>> within each alert.
>>
>> Alternatively, you could disable grouping and let opsgenie do the
>> grouping (I don't know opsgenie, so I don't know how good a job it would
>> do of that).
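
(Noting a concrete version of that suggestion here for reference. For my receiver, building the host= tags from every firing alert instead of from CommonLabels would look roughly like the line below - untested, it assumes the instance label can be referenced directly as .Labels.instance, and it reuses my existing reReplaceAll pattern:

    tags: '{{ range .Alerts.Firing }}{{ if .Labels.instance }}{{ reReplaceAll "(.+):(.+)" "host=$1" .Labels.instance }},{{ end }}{{ end }}criteria={{ .CommonLabels.criteria }},severity={{ .CommonLabels.severity }},team={{ .CommonLabels.team }},infra,monitor'

This can emit the same host= tag more than once if one instance has several firing alerts in the group, and whether Opsgenie then shows the extra tag on the already-open ticket presumably still depends on update_alerts and on how the plugin calls the API - which is what the update at the top was trying to establish.)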

>> On Wednesday 3 April 2024 at 09:11:24 UTC+1 mohan garden wrote:
>>
>>> correction:
>>> Scenario 2: While the server1 trigger is active, a second server's
>>> (say server2) local disk usage reaches 50%.
>>>
>>> I see that the already open Opsgenie ticket's details get updated as:
>>>
>>> ticket header name: local disk usage reached 50%
>>> ticket description: space on /var file system at server1:9100 server = 82%.
>>>                     space on /var file system at server2:9100 server = 80%.
>>> ticket tags: criteria: overuse, team: support, severity: critical,
>>> infra, monitor, host=server1
>>>
>>> [image: photo003.png]
>>>
>>> On Wednesday, April 3, 2024 at 1:37:12 PM UTC+5:30 mohan garden wrote:
>>>
>>>> Hi Brian,
>>>> Thank you for the response. Here are some more details; I hope they
>>>> give a clearer picture of the configuration and the method I am using
>>>> to generate tags.
>>>>
>>>> 1. We collect data from the node exporter and have created some rules
>>>> around the collected data. Here is one example:
>>>>
>>>>     - alert: "Local Disk usage has reached 50%"
>>>>       expr: (round( node_filesystem_avail_bytes{mountpoint=~"/dev.*|/sys*|/|/home|/tmp|/var.*|/boot.*"}
>>>>             / node_filesystem_size_bytes{mountpoint=~"/dev.*|/sys*|/|/home|/tmp|/var.*|/boot.*"}
>>>>             * 100, 0.1) >= y)
>>>>             and
>>>>             (round( node_filesystem_avail_bytes{mountpoint=~"/dev.*|/sys*|/|/home|/tmp|/var.*|/boot.*"}
>>>>             / node_filesystem_size_bytes{mountpoint=~"/dev.*|/sys*|/|/home|/tmp|/var.*|/boot.*"}
>>>>             * 100, 0.1) <= z)
>>>>       for: 5m
>>>>       labels:
>>>>         criteria: overuse
>>>>         severity: critical
>>>>         team: support
>>>>       annotations:
>>>>         summary: "{{ $labels.instance }}'s ({{ $labels.device }}) has low space."
>>>>         description: "space on {{ $labels.mountpoint }} file system at {{ $labels.instance }} server = {{ $value }}%."
>>>>
>>>> 2. At the Alertmanager we have created notification rules for when the
>>>> aforementioned condition occurs:
>>>>
>>>>     global:
>>>>       smtp_from: 'ser...@example.com'
>>>>       smtp_require_tls: false
>>>>       smtp_smarthost: 'ser...@example.com:25'
>>>>
>>>>     templates:
>>>>       - /home/ALERTMANAGER/conf/template/*.tmpl
>>>>
>>>>     route:
>>>>       group_wait: 5m
>>>>       group_interval: 2h
>>>>       repeat_interval: 5h
>>>>       receiver: admin
>>>>       routes:
>>>>         - match_re:
>>>>             alertname: ".*Local Disk usage has reached .*%"
>>>>           receiver: admin
>>>>           routes:
>>>>             - match:
>>>>                 criteria: overuse
>>>>                 severity: critical
>>>>                 team: support
>>>>               receiver: mailsupport
>>>>               continue: true
>>>>             - match:
>>>>                 criteria: overuse
>>>>                 team: support
>>>>                 severity: critical
>>>>               receiver: opsgeniesupport
>>>>
>>>>     receivers:
>>>>       - name: opsgeniesupport
>>>>         opsgenie_configs:
>>>>           - api_key: XYZ
>>>>             api_url: https://api.opsgenie.com
>>>>             message: '{{ .CommonLabels.alertname }}'
>>>>             description: "{{ range .Alerts }}{{ .Annotations.description }}\n\r{{ end }}"
>>>>             tags: '{{ range $k, $v := .CommonLabels}}{{ if or (eq $k "criteria") (eq $k "severity") (eq $k "team") }}{{$k}}={{$v}},{{ else if eq $k "instance" }}{{ reReplaceAll "(.+):(.+)" "host=$1" $v }},{{end}}{{end}},infra,monitor'
>>>>             priority: 'P1'
>>>>             update_alerts: true
>>>>             send_resolved: true
>>>>     ...
>>>>
>>>> So you can see that I derive a tag host=<hostname> from the instance label.
>>>>
>>>> Scenario 1: When server1's local disk usage reaches 50%, I see that an
>>>> Opsgenie ticket is created with this metadata:
>>>>
>>>> ticket header name: local disk usage reached 50%
>>>> ticket description: space on /var file system at server1:9100 server = 82%.
>>>> ticket tags: criteria: overuse, team: support, severity: critical,
>>>> infra, monitor, host=server1
>>>>
>>>> So everything works as expected; no issues with Scenario 1.
>>>>
>>>> Scenario 2: While the server1 trigger is active, a second server's
>>>> (say server2) local disk usage reaches 50%.
>>>>
>>>> I see that the Opsgenie ticket gets updated as:
>>>>
>>>> ticket header name: local disk usage reached 50%
>>>> ticket description: space on /var file system at server1:9100 server = 82%.
>>>>                     space on /var file system at server2:9100 server = 80%.
>>>> ticket tags: criteria: overuse, team: support, severity: critical,
>>>> infra, monitor, host=server1
>>>>
>>>> but I was expecting an additional host=server2 tag on the ticket.
>>>> In summary: I see an updated description, but not updated tags.
>>>>
>>>> In the tags section of the Alertmanager - Opsgenie integration
>>>> configuration I had also tried iterating over Alerts and over
>>>> CommonLabels, but was unable to add the additional host=server2 tag:
>>>>
>>>>     {{ range $idx, $alert := .Alerts }}{{ range $k, $v := $alert.Labels }}{{$k}}={{$v}},{{ end }}{{ end }},test=test
>>>>     {{ range $k, $v := .CommonLabels }}....{{ end }}
>>>>
>>>> At the moment I am not sure what is preventing the update of tags on
>>>> the Opsgenie tickets. If I can get some clarity on whether my
>>>> Alertmanager configuration is good enough, then I can look at the
>>>> Opsgenie configuration.
>>>>
>>>> Please advise.
>>>>
>>>> Regards
>>>> CP
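
(Looking back at Scenario 2 with Brian's CommonLabels explanation in mind, the data the tags template sees at that point is roughly the following - my own illustration, not captured from a real notification:

    .Alerts[0].Labels: alertname="Local Disk usage has reached 50%", instance=server1:9100, criteria=overuse, severity=critical, team=support, ...
    .Alerts[1].Labels: alertname="Local Disk usage has reached 50%", instance=server2:9100, criteria=overuse, severity=critical, team=support, ...
    .CommonLabels:     alertname="Local Disk usage has reached 50%", criteria=overuse, severity=critical, team=support   (instance is dropped because it differs)

so a template that only walks .CommonLabels can never produce host=server2, regardless of what Opsgenie does with the update.)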

>>>> On Tuesday, April 2, 2024 at 10:46:36 PM UTC+5:30 Brian Candler wrote:
>>>>
>>>>> FYI, those images are unreadable - copy-pasted text would be much better.
>>>>>
>>>>> My guess, though, is that you probably don't want to group alerts
>>>>> before sending them to opsgenie. You haven't shown your full
>>>>> alertmanager config, but if you have a line like
>>>>>
>>>>>     group_by: ['alertname']
>>>>>
>>>>> then try
>>>>>
>>>>>     group_by: ["..."]
>>>>>
>>>>> (literally, exactly that: a single string containing three dots,
>>>>> inside square brackets)
>>>>>
>>>>> On Tuesday 2 April 2024 at 17:15:39 UTC+1 mohan garden wrote:
>>>>>
>>>>>> Dear Prometheus Community,
>>>>>> I am reaching out regarding an issue I have encountered with
>>>>>> Prometheus alert tagging, specifically while creating tickets in
>>>>>> Opsgenie.
>>>>>>
>>>>>> I have configured Alertmanager to send alerts to Opsgenie with this
>>>>>> configuration:
>>>>>> [image: photo001.png]
>>>>>> A ticket is generated with the expected description and tags:
>>>>>> [image: photo002.png]
>>>>>>
>>>>>> Now, by default the alerts are grouped by the alert name (default
>>>>>> behavior), so when a similar event happens on a different server I
>>>>>> see that the description is updated:
>>>>>> [image: photo003.png]
>>>>>> but the tags on the ticket remain the same.
>>>>>> Expected behavior: criteria=..., host=108, host=114, infra, ..., support
>>>>>>
>>>>>> I have set the update_alerts and send_resolved settings to true.
>>>>>> I am not sure whether I need additional configuration at Opsgenie or
>>>>>> at Alertmanager to make this work as expected.
>>>>>>
>>>>>> I would appreciate any insight or guidance on how to resolve this
>>>>>> issue and ensure that alerts for different servers are correctly
>>>>>> tagged in Opsgenie.
>>>>>>
>>>>>> Thank you in advance.
>>>>>> Regards,
>>>>>> CP