alertmanager: 0.21.0
prometheus: 2.30.3
I am trying to get my head around some unexpected alertmanager behaviour.
I am alerting on the following metrics:
client_disconnect{appenv="testbed",conn="2",compid="CLIENT-A"} 1
client_disconnect{appenv="testbed",conn="3",compid="CLIENT-A"} 1
client_disconnect{appenv="testbed",conn="4",compid="CLIENT-A"} 1
client_disconnect{appenv="testbed",conn="5",compid="CLIENT-A"} 0
and have the rule below defined:
- alert: Client Disconnect
  expr: client_disconnect == 1
  for: 2s
  labels:
    severity: critical
    notification: slack
  annotations:
    summary: "Appenv {{ $labels.appenv }} on connection {{ $labels.conn }} compid {{ $labels.compid }} down"
    description: "{{ $labels.instance }} disconnect: {{ $labels.appenv }} on connection {{ $labels.conn }} compid {{ $labels.compid }}"
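To make sure I'm reading my own rule correctly: my understanding is that the expression selects every series matching the comparison, and each matching series becomes its own alert instance. A minimal sketch (plain Python, not Prometheus itself; the data mirrors the metrics above):

```python
# Snapshot of the four client_disconnect series shown above.
series = [
    {"appenv": "testbed", "conn": "2", "compid": "CLIENT-A", "value": 1},
    {"appenv": "testbed", "conn": "3", "compid": "CLIENT-A", "value": 1},
    {"appenv": "testbed", "conn": "4", "compid": "CLIENT-A", "value": 1},
    {"appenv": "testbed", "conn": "5", "compid": "CLIENT-A", "value": 0},
]

# `client_disconnect == 1` keeps each series whose value satisfies the
# comparison, so each one becomes a separate alert carrying its own labels.
firing = [s for s in series if s["value"] == 1]
print(len(firing))  # 3 alerts, differing only in the `conn` label
```

So at this point I expect three distinct alerts, one per connection.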
My alertmanager config is as below:
global:
  slack_api_url: 'https://hooks.slack.com/services/REDACTED'

route:
  group_wait: 5s
  group_interval: 5s
  group_by: ['section','env']
  repeat_interval: 10m
  receiver: 'default_receiver'
  routes:
    - match:
        notification: slack
      receiver: slack_receiver
      group_by: ['appenv','compid']

receivers:
  - name: 'slack_receiver'
    slack_configs:
      - channel: 'monitoring'
        send_resolved: true
        title: '{{ template "custom_title" . }}'
        text: '{{ template "custom_slack_message" . }}'
  - name: 'default_receiver'
    webhook_configs:
      - url: http://pi4-1.home:5000
        send_resolved: true

templates:
  - /etc/alertmanager/notifications.tmpl
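As I read the docs, the sub-route's group_by of ['appenv','compid'] should put all of these alerts into a single group, since they share those two label values. A rough sketch of that bucketing (my reading of the grouping semantics, not Alertmanager code):

```python
from collections import defaultdict

# Alerts are bucketed by the values of the route's group_by labels;
# one notification is sent per bucket.
group_by = ["appenv", "compid"]

alerts = [
    {"appenv": "testbed", "conn": "2", "compid": "CLIENT-A", "status": "firing"},
    {"appenv": "testbed", "conn": "3", "compid": "CLIENT-A", "status": "firing"},
    {"appenv": "testbed", "conn": "4", "compid": "CLIENT-A", "status": "resolved"},
]

groups = defaultdict(list)
for a in alerts:
    key = tuple(a[label] for label in group_by)
    groups[key].append(a)

print(len(groups))  # 1 -- firing and resolved alerts share one group,
                    # so they should arrive in one notification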
My custom template results in the message below being displayed in Slack:
[image: slack1.PNG]
As expected, this repeats every 10 minutes.
If one of these client_disconnects subsequently resolves, so that the metrics now look like this:
client_disconnect{appenv="testbed",conn="2",compid="CLIENT-A"} 1
client_disconnect{appenv="testbed",conn="3",compid="CLIENT-A"} 1
client_disconnect{appenv="testbed",conn="4",compid="CLIENT-A"} 0
client_disconnect{appenv="testbed",conn="5",compid="CLIENT-A"} 0
Then I receive the following messages:
[image: slack2.PNG]
When the repeat interval comes round (10 mins later) I receive the
following messages:
[image: slack3.PNG]
The second firing line comes in at 22:02 and the third at 22:03 (sorry, the timestamps only show on hover in Slack).
I can't understand this behaviour. I am running single, unclustered instances of Prometheus and Alertmanager.
Is anyone in a position to explain this behaviour to me? I see a very similar situation if I simply use the webhook receiver instead of Slack.
The subsequent repeat (after the last message) shows the current state:
[image: slack4.PNG]
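For what it's worth, here is my mental model of the timing, sketched in Python. This is only how I read the documented semantics of group_interval and repeat_interval, and it may well be exactly what I'm misunderstanding:

```python
# My reading of the route timing settings from the config above (seconds).
group_wait = 5        # delay before a new group's first notification
group_interval = 5    # minimum gap when the group's alert set has changed
repeat_interval = 600 # resend gap when nothing in the group has changed

def next_notify(last_sent, group_changed):
    """Earliest time (seconds) the group may notify again after last_sent."""
    return last_sent + (group_interval if group_changed else repeat_interval)

# Steady state: no change, so the next message is 10 minutes later.
print(next_notify(0, group_changed=False))  # 600
# One connection resolves: the group changed, so a message goes out after
# only group_interval, and (I assume) the repeat timer restarts from there.
print(next_notify(0, group_changed=True))   # 5
```

Under that model I would expect one message at the resolve and then one combined message per repeat_interval, which is not what I'm seeing.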
Many thanks.
For reference, my slack templates are below:
{{ define "__single_message_title" }}{{ range .Alerts.Firing }}{{ .Labels.alertname }} on {{ .Annotations.identifier }}{{ end }}{{ range .Alerts.Resolved }}{{ .Labels.alertname }} on {{ .Annotations.identifier }}{{ end }}{{ end }}

{{ define "custom_title" }}[{{ .Status | toUpper }}{{ if eq .Status "firing" }}:{{ .Alerts.Firing | len }}{{ end }}] {{ if or (and (eq (len .Alerts.Firing) 1) (eq (len .Alerts.Resolved) 0)) (and (eq (len .Alerts.Firing) 0) (eq (len .Alerts.Resolved) 1)) }}{{ template "__single_message_title" . }}{{ end }}{{ end }}

{{ define "custom_slack_message" }}
{{ if or (and (eq (len .Alerts.Firing) 1) (eq (len .Alerts.Resolved) 0)) (and (eq (len .Alerts.Firing) 0) (eq (len .Alerts.Resolved) 1)) }}
{{ range .Alerts.Firing }}{{ .Annotations.description }}{{ end }}{{ range .Alerts.Resolved }}{{ .Annotations.description }}{{ end }}
{{ else }}
{{ if gt (len .Alerts.Firing) 0 }}
*Alerts Firing:*
Client disconnect: {{ .CommonLabels.appenv }} for {{ .CommonLabels.compid }}. Connections: {{ range .Alerts.Firing }}{{ .Labels.conn }} {{ end }}have failed.
{{ end }}
{{ if gt (len .Alerts.Resolved) 0 }}
*Alerts Resolved:*
Client disconnect: {{ .CommonLabels.appenv }} for {{ .CommonLabels.compid }}. Connections: {{ range .Alerts.Resolved }}{{ .Labels.conn }} {{ end }}have failed.
{{ end }}
{{ end }}
{{ end }}
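To rule out the template itself, here is the branching logic of custom_slack_message mirrored in plain Python (the function name is mine, purely for illustration): the single-alert description is used only when the group contains exactly one firing or exactly one resolved alert, otherwise the grouped summary is rendered.

```python
# Mirror of custom_slack_message's branch condition, for sanity checking
# which rendering path a given group takes.
def message_path(n_firing, n_resolved):
    single = (n_firing == 1 and n_resolved == 0) or \
             (n_firing == 0 and n_resolved == 1)
    return "single-alert description" if single else "grouped summary"

print(message_path(3, 0))  # grouped summary      (initial state: 3 firing)
print(message_path(0, 1))  # single-alert description (lone resolve)
print(message_path(2, 1))  # grouped summary      (mixed firing/resolved)
```

So a group with 2 firing and 1 resolved should always render the grouped summary, which is why the extra single-alert messages surprise me.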