I'm seeing strange behavior where resolved alerts are sent alongside firing
ones. I have this rule:
kube_pod_container_status_ready{namespace="default"} == 0. What happens:
when a pod is down, an alert is sent and everything is fine; then the pod
comes back up and the alert is resolved. But if the pod fails again within a
short period and gets recreated by the deployment under a different name,
the alert fires mentioning both the previous pod and the new one. I also
noticed that if I wait about 20 minutes after the alert is resolved and kill
a pod again, only one pod appears in the alert.
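For context, the alerting rule behind these notifications looks roughly like
this (a sketch reconstructed from the expression and the notification text;
the group name and the description annotation name are my assumptions):

```yaml
groups:
  - name: kube-container-ready  # assumed group name
    rules:
      - alert: KubeContainerNotReady
        expr: kube_pod_container_status_ready{namespace="default"} == 0
        for: 30s
        labels:
          severity: warning
        annotations:
          summary: Container is not ready for too long.
          description: >-
            Container {{ $labels.container }} in pod {{ $labels.pod }}
            is not ready for 30 seconds.
```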
This is expected:
first alert, 12:24:
Container ubuntu in pod test-ubuntu-5579c5f49c-rsb8v is not ready for 30
seconds.
Prometheus Alert (Firing)
summary Container is not ready for too long.
alertname KubeContainerNotReady
container ubuntu
endpoint http
instance 10.233.74.200:8080
job kube-state-metrics
pod test-ubuntu-5579c5f49c-rsb8v
prometheus prometheus/prometheus-kube-prometheus-prometheus
service prometheus-kube-state-metrics
severity warning
uid 85f61574-2559-4f1a-8a14-f08ee4e34b8a
second alert, 12:27:
Container ubuntu in pod test-ubuntu-5579c5f49c-rsb8v is not ready for 30
seconds.
Prometheus Alert (Resolved)
summary Container is not ready for too long.
alertname KubeContainerNotReady
container ubuntu
endpoint http
instance 10.233.74.200:8080
job kube-state-metrics
pod test-ubuntu-5579c5f49c-rsb8v
prometheus prometheus/prometheus-kube-prometheus-prometheus
service prometheus-kube-state-metrics
severity warning
uid 85f61574-2559-4f1a-8a14-f08ee4e34b8a
Then I kill the pod and this happens (it's a single alert):
third alert, 12:32:
Container ubuntu in pod test-ubuntu-5579c5f49c-rsb8v is not ready for 30
seconds.
Prometheus Alert (Firing)
summary Container is not ready for too long.
alertname KubeContainerNotReady
container ubuntu
endpoint http
instance 10.233.74.200:8080
job kube-state-metrics
pod test-ubuntu-5579c5f49c-rsb8v
prometheus prometheus/prometheus-kube-prometheus-prometheus
service prometheus-kube-state-metrics
severity warning
uid 85f61574-2559-4f1a-8a14-f08ee4e34b8a
Container ubuntu in pod test-ubuntu-5579c5f49c-sjlrk is not ready for 30
seconds.
summary Container is not ready for too long.
alertname KubeContainerNotReady
container ubuntu
endpoint http
instance 10.233.74.200:8080
job kube-state-metrics
pod test-ubuntu-5579c5f49c-sjlrk
prometheus prometheus/prometheus-kube-prometheus-prometheus
service prometheus-kube-state-metrics
severity warning
uid aba2b39c-a4b1-4d02-a532-4ca39ef8c0da
Here's my config:

config:
  global:
    resolve_timeout: 5m
  route:
    group_by: ['alertname']
    group_interval: 30s
    repeat_interval: 24h
    group_wait: 30s
    receiver: 'prometheus-msteams'
  receivers:
    - name: 'prometheus-msteams'
      webhook_configs: # https://prometheus.io/docs/alerting/configuration/#webhook_config
        - send_resolved: true
          url: "http://prometheus-msteams:2000/prometheus-msteams"
Now, I know I can just group them by pod or some other label, or even turn
off grouping, but I want to figure out what exactly happens here. Also, I
can't figure out what happens to an alert that has no label that I am
grouping by. For example, if I group by pod, how are alerts without a pod
label treated?
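For illustration, grouping by pod would mean a route like this (a sketch;
only group_by changes relative to my config above):

```yaml
route:
  group_by: ['alertname', 'pod']  # alerts for different pods would land in separate groups
  group_interval: 30s
  repeat_interval: 24h
  group_wait: 30s
  receiver: 'prometheus-msteams'
```

This is the kind of grouping I mean when I ask what happens to alerts that
are missing the pod label.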
--
You received this message because you are subscribed to the Google Groups
"Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
To view this discussion on the web visit
https://groups.google.com/d/msgid/prometheus-users/037e1cbe-078e-458b-b306-b3933c871013n%40googlegroups.com.