I'm seeing strange behavior where resolved alerts are sent alongside firing
ones. I have this rule:
kube_pod_container_status_ready{namespace="default"} == 0. What happens:
when a pod is down, an alert is sent and everything is fine; then the pod
comes back up and the alert is resolved. But if the pod fails again within a
short period and gets recreated by the deployment under a different name,
the alert fires mentioning both the previous pod and the new one. I also
noticed that if I wait about 20 minutes after the alert is resolved and kill
a pod again, only one pod appears in the alert.
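For context, the alerting rule behind these notifications looks roughly like
this (a sketch reconstructed from the expression and the notification text;
the group name and the description annotation name are my assumptions):

```yaml
groups:
  - name: kube-container-ready  # assumed group name
    rules:
      - alert: KubeContainerNotReady
        expr: kube_pod_container_status_ready{namespace="default"} == 0
        for: 30s
        labels:
          severity: warning
        annotations:
          summary: Container is not ready for too long.
          description: >-
            Container {{ $labels.container }} in pod {{ $labels.pod }}
            is not ready for 30 seconds.
```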
This is expected:
first alert, 12:24:
Container ubuntu in pod test-ubuntu-5579c5f49c-rsb8v is not ready for 30
seconds.
Prometheus Alert (Firing)
summary Container is not ready for too long.
alertname KubeContainerNotReady
container ubuntu
endpoint http
instance 10.233.74.200:8080
job kube-state-metrics
pod test-ubuntu-5579c5f49c-rsb8v
prometheus prometheus/prometheus-kube-prometheus-prometheus
service prometheus-kube-state-metrics
severity warning
uid 85f61574-2559-4f1a-8a14-f08ee4e34b8a
second alert, 12:27:
Container ubuntu in pod test-ubuntu-5579c5f49c-rsb8v is not ready for 30
seconds.
Prometheus Alert (Resolved)
summary Container is not ready for too long.
alertname KubeContainerNotReady
container ubuntu
endpoint http
instance 10.233.74.200:8080
job kube-state-metrics
pod test-ubuntu-5579c5f49c-rsb8v
prometheus prometheus/prometheus-kube-prometheus-prometheus
service prometheus-kube-state-metrics
severity warning
uid 85f61574-2559-4f1a-8a14-f08ee4e34b8a
Then I kill the pod and this happens (it's a single alert):
third alert, 12:32:
Container ubuntu in pod test-ubuntu-5579c5f49c-rsb8v is not ready for 30
seconds.
Prometheus Alert (Firing)
summary Container is not ready for too long.
alertname KubeContainerNotReady
container ubuntu
endpoint http
instance 10.233.74.200:8080
job kube-state-metrics
pod test-ubuntu-5579c5f49c-rsb8v
prometheus prometheus/prometheus-kube-prometheus-prometheus
service prometheus-kube-state-metrics
severity warning
uid 85f61574-2559-4f1a-8a14-f08ee4e34b8a
Container ubuntu in pod test-ubuntu-5579c5f49c-sjlrk is not ready for 30
seconds.
summary Container is not ready for too long.
alertname KubeContainerNotReady
container ubuntu
endpoint http
instance 10.233.74.200:8080
job kube-state-metrics
pod test-ubuntu-5579c5f49c-sjlrk
prometheus prometheus/prometheus-kube-prometheus-prometheus
service prometheus-kube-state-metrics
severity warning
uid aba2b39c-a4b1-4d02-a532-4ca39ef8c0da
Here's my config:

config:
  global:
    resolve_timeout: 5m
  route:
    group_by: ['alertname']
    group_interval: 30s
    repeat_interval: 24h
    group_wait: 30s
    receiver: 'prometheus-msteams'
  receivers:
    - name: 'prometheus-msteams'
      webhook_configs: # https://prometheus.io/docs/alerting/configuration/#webhook_config
        - send_resolved: true
          url: "http://prometheus-msteams:2000/prometheus-msteams"
Now, I know I can just group them by pod or some other label, or even turn
off grouping, but I want to figure out what exactly happens here. Also, I
can't figure out what happens to an alert that has no label that I am
grouping by. For example, if I group by pod, how are alerts without a pod
label treated?
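For illustration, grouping by pod would mean a route like this (a sketch;
only group_by changes relative to my config above):

```yaml
route:
  group_by: ['alertname', 'pod']  # alerts for different pods would land in separate groups
  group_interval: 30s
  repeat_interval: 24h
  group_wait: 30s
  receiver: 'prometheus-msteams'
```

This is the kind of grouping I mean when I ask what happens to alerts that
are missing the pod label.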
--
You received this message because you are subscribed to the Google Groups
"Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
To view this discussion on the web visit
https://groups.google.com/d/msgid/prometheus-users/037e1cbe-078e-458b-b306-b3933c871013n%40googlegroups.com.