Hi all,

I am new to Prometheus. We started using it to monitor our kops-based 
Kubernetes cluster, but the server keeps running out of memory. So far we 
have only been raising the pod's memory limit, which does not feel like 
the right approach: we started with a limit of 200Mi and are now at almost 
6000Mi (about 6GB), which is far too much. I tried adding a few flags, but 
that did not help. Any help in making sure my Prometheus does not run out 
of memory and crash would be appreciated.
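In case it helps with diagnosis, this is the kind of check I can run to see how many active series the server is holding (as I understand it, Prometheus memory use grows mainly with the number of in-memory series; `prometheus_tsdb_head_series` is one of its standard self-monitoring metrics, and the namespace and pod name below are from my own setup):

```shell
# Forward the Prometheus web port from the pod (pod name taken from the
# describe output below)
kubectl -n monitoring port-forward prometheus-k8s-1 9090:9090 &

# Ask the server how many series are currently in the TSDB head block
curl -sG 'http://localhost:9090/api/v1/query' \
  --data-urlencode 'query=prometheus_tsdb_head_series'
```

If that number is large and steadily growing, I suspect the problem is label cardinality from the scrape targets rather than the 90d retention setting, but I am not sure how to confirm that.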


Prometheus server version: 2.4.2
Deployed in a kops Kubernetes cluster

*Current configuration:*

Name:         prometheus-k8s-1

Namespace:    monitoring

Priority:     0

Node:         ip-172-31-98-1.us-west-2.compute.internal/172.31.98.1

Start Time:   Mon, 13 Apr 2020 09:35:05 +0530

Labels:       app=prometheus

              controller-revision-hash=prometheus-k8s-dcbcc8f48

              prometheus=k8s

              statefulset.kubernetes.io/pod-name=prometheus-k8s-1

Annotations:  cni.projectcalico.org/podIP: 100.119.250.205/32

Status:       Running

IP:           100.119.250.205

IPs:

  IP:           100.119.250.205

Controlled By:  StatefulSet/prometheus-k8s

Containers:

  prometheus:

    Container ID:  docker://d5d29d96371596f99b04c9b8c673577612ba8ee0e55c7148efd9be9fbec88fca

    Image:         quay.io/prometheus/prometheus:v2.4.2

    Image ID:      docker-pullable://quay.io/prometheus/prometheus@sha256:8e4d8817b1eb40d793f7207fd064ef2a3d47e3dd6290738ca3c6d642489cea93

    Port:          9090/TCP

    Host Port:     0/TCP

    Args:

      --config.file=/etc/prometheus/config_out/prometheus.env.yaml

      --storage.tsdb.path=/prometheus

      --storage.tsdb.retention=90d

      --web.enable-lifecycle

      --storage.tsdb.no-lockfile

      --web.external-url=http://prometheus:9090

      --web.route-prefix=/

    State:          Running

      Started:      Mon, 13 Apr 2020 10:09:35 +0530

    Last State:     Terminated

      Reason:       OOMKilled

      Exit Code:    137

      Started:      Mon, 13 Apr 2020 10:02:33 +0530

      Finished:     Mon, 13 Apr 2020 10:08:07 +0530

    Ready:          False

    Restart Count:  5

    Limits:

      cpu:     300m

      memory:  6000Mi

    Requests:

      cpu:        200m

      memory:     5700Mi

    Liveness:     http-get http://:web/-/healthy delay=0s timeout=3s period=5s #success=1 #failure=6

    Readiness:    http-get http://:web/-/ready delay=0s timeout=3s period=5s #success=1 #failure=120

    Environment:  <none>

    Mounts:

      /etc/prometheus/config_out from config-out (ro)

      /prometheus from prometheus-k8s-db (rw,path="prometheus-db")

      /var/run/secrets/kubernetes.io/serviceaccount from prometheus-k8s-token-44tjh (ro)

  prometheus-config-reloader:

    Container ID:  docker://1dea8054b610bc82411f84454fba3b83089fc85b138a1b5ab49a57571daff822

    Image:         quay.io/coreos/prometheus-config-reloader:v0.0.4

    Image ID:      docker-pullable://quay.io/coreos/prometheus-config-reloader@sha256:b15f35af5c3e4bd75c7e74bd27b862f1c119fc51080a838e5b3399a134c862e5

    Port:          <none>

    Host Port:     <none>

    Args:

      --reload-url=http://localhost:9090/-/reload

      --config-file=/etc/prometheus/config/prometheus.yaml

      --rule-list-file=/etc/prometheus/config/configmaps.json

      --config-envsubst-file=/etc/prometheus/config_out/prometheus.env.yaml

      --rule-dir=/etc/prometheus/config_out/rules

    State:          Running

      Started:      Mon, 13 Apr 2020 09:35:31 +0530

    Ready:          True

    Restart Count:  0

    Limits:

      cpu:     10m

      memory:  50Mi

    Requests:

      cpu:     10m

      memory:  50Mi

    Environment:

      POD_NAME:  prometheus-k8s-1 (v1:metadata.name)

    Mounts:

      /etc/prometheus/config from config (rw)

      /etc/prometheus/config_out from config-out (rw)

      /var/run/secrets/kubernetes.io/serviceaccount from prometheus-k8s-token-44tjh (ro)

Conditions:

  Type              Status

  Initialized       True 

  Ready             False 

  ContainersReady   False 

  PodScheduled      True 

Volumes:

  prometheus-k8s-db:

    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)

    ClaimName:  prometheus-k8s-db-prometheus-k8s-1

    ReadOnly:   false

  config:

    Type:        Secret (a volume populated by a Secret)

    SecretName:  prometheus-k8s

    Optional:    false

  config-out:

    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)

    Medium:     

    SizeLimit:  <unset>

  prometheus-k8s-token-44tjh:

    Type:        Secret (a volume populated by a Secret)

    SecretName:  prometheus-k8s-token-44tjh

    Optional:    false

QoS Class:       Burstable

Node-Selectors:  <none>

Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s

                 node.kubernetes.io/unreachable:NoExecute for 300s

Events:

  Type     Reason                  Age                    From                                                Message
  ----     ------                  ----                   ----                                                -------
  Normal   Scheduled               <unknown>              default-scheduler                                   Successfully assigned monitoring/prometheus-k8s-1 to ip-172-31-98-1.us-west-2.compute.internal
  Normal   SuccessfulAttachVolume  39m                    attachdetach-controller                             AttachVolume.Attach succeeded for volume "pvc-ef266625-ca5e-11e9-8b8f-069b6823c94e"
  Normal   Pulled                  39m                    kubelet, ip-172-31-98-1.us-west-2.compute.internal  Container image "quay.io/prometheus/prometheus:v2.4.2" already present on machine
  Normal   Created                 39m                    kubelet, ip-172-31-98-1.us-west-2.compute.internal  Created container prometheus
  Normal   Started                 39m                    kubelet, ip-172-31-98-1.us-west-2.compute.internal  Started container prometheus
  Normal   Pulled                  39m                    kubelet, ip-172-31-98-1.us-west-2.compute.internal  Container image "quay.io/coreos/prometheus-config-reloader:v0.0.4" already present on machine
  Normal   Created                 39m                    kubelet, ip-172-31-98-1.us-west-2.compute.internal  Created container prometheus-config-reloader
  Normal   Started                 39m                    kubelet, ip-172-31-98-1.us-west-2.compute.internal  Started container prometheus-config-reloader
  Warning  Unhealthy               4m39s (x380 over 39m)  kubelet, ip-172-31-98-1.us-west-2.compute.internal  Readiness probe failed: HTTP probe failed with statuscode: 503

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/d994d091-7cb3-403f-ac0d-60b7932e717d%40googlegroups.com.