Hi Nishant,

I'm also new to Prometheus and faced a similar scenario recently.
What helped me was adding a job to monitor the Prometheus instance itself, then importing a Prometheus 2.0 Grafana dashboard and watching Prometheus memory consumption and samples appended per second while defining new ServiceMonitors. In the end this helped me stabilise memory usage and identify the services that generated far too many metrics and were responsible for the huge memory consumption.

HTH,
Martin

> On 13 Apr 2020, at 11:03, Nishant Ketu <nishant.k...@atmecs.com> wrote:
>
> We have deployed Prometheus through Helm, and after around 2 months of use we get an OOM error and the pods fail to restart. We have had to manually clean up /data to get the pod running again. I have used the retention flag, but it doesn't seem to apply to the wal folder in /data. Any help with this would be nice.
> Thanks

-- 
You received this message because you are subscribed to the Google Groups "Prometheus Users" group.
To view this discussion on the web visit https://groups.google.com/d/msgid/prometheus-users/C6306D48-0102-4647-B12C-EDD3EA49E251%40martinman.net.
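If you prefer to check the same signals without a dashboard, the sketch below queries them through the Prometheus HTTP API (`/api/v1/query`). It assumes Prometheus is reachable at `http://localhost:9090` and that its self-scrape job is labelled `job="prometheus"`; both are placeholders, so adjust them for your Helm deployment. `prometheus_tsdb_head_samples_appended_total`, `process_resident_memory_bytes` and `scrape_samples_scraped` are metrics Prometheus 2.x exposes, while the helper names (`build_query_url`, `parse_vector`, `rank_jobs`, `run_query`) are hypothetical, written just for this example.

```python
import json
import urllib.parse
import urllib.request

# Placeholder address of the Prometheus server; adjust for your deployment.
PROMETHEUS_URL = "http://localhost:9090"

# PromQL for the signals mentioned above. The job="prometheus" selector is an
# assumption about how the self-monitoring job is labelled.
QUERIES = {
    "samples_appended_per_second": "rate(prometheus_tsdb_head_samples_appended_total[5m])",
    "resident_memory_bytes": 'process_resident_memory_bytes{job="prometheus"}',
    "samples_per_job": "sort_desc(sum by (job) (scrape_samples_scraped))",
}


def build_query_url(base_url: str, promql: str) -> str:
    """Return the instant-query URL for the Prometheus HTTP API."""
    return f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": promql})


def parse_vector(payload: dict) -> list:
    """Turn an instant-vector API response into (labels, value) pairs."""
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error')}")
    # Each sample's "value" is [timestamp, "value-as-string"].
    return [
        (sample["metric"], float(sample["value"][1]))
        for sample in payload["data"]["result"]
    ]


def rank_jobs(payload: dict, limit: int = 10) -> list:
    """Rank jobs by scraped samples, largest first (for the samples_per_job query)."""
    pairs = [
        (labels.get("job", "<no job>"), value)
        for labels, value in parse_vector(payload)
    ]
    pairs.sort(key=lambda item: item[1], reverse=True)
    return pairs[:limit]


def run_query(promql: str, base_url: str = PROMETHEUS_URL, timeout: float = 10.0) -> dict:
    """Run one instant query against a live server and return the decoded JSON."""
    with urllib.request.urlopen(build_query_url(base_url, promql), timeout=timeout) as resp:
        return json.load(resp)
```

With a server running, `rank_jobs(run_query(QUERIES["samples_per_job"]))` should list the jobs whose targets return the most samples per scrape. That is a quick way to spot candidates for the "too many metrics" problem before they show up as memory growth.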