Re: [prometheus-users] OOM error for Prometheus

Martin Man Mon, 13 Apr 2020 02:53:45 -0700

Hi Nishant,

I’m also new to Prometheus and recently faced a similar scenario.

What helped me was adding a job to monitor the Prometheus instance itself, then 
importing a Prometheus 2.0 Grafana dashboard and watching Prometheus memory 
consumption and samples appended per second while defining new ServiceMonitors. 
In the end, this helped me stabilise the memory usage and identify the 
services that generated far too many metrics and were responsible for the huge 
memory consumption.
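As a complement to the dashboard approach (this is not what Martin described, just a minimal offline sketch), you can count how many time series each metric name contributes in a target's scrape output, which points at the metrics most likely to drive memory usage. It assumes you have saved the plain-text output of a target's /metrics endpoint (Prometheus text exposition format) into a string; the function name series_per_metric is hypothetical.

```python
from collections import Counter


def series_per_metric(exposition: str) -> Counter:
    """Count time series per metric name in Prometheus text exposition output.

    Each non-comment, non-blank line is one sample, i.e. one series, so the
    counts approximate how many series a target exposes per metric name.
    """
    counts: Counter = Counter()
    for raw in exposition.splitlines():
        line = raw.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # The metric name ends at the first '{' (labels) or whitespace (value).
        end = len(line)
        for sep in ("{", " ", "\t"):
            idx = line.find(sep)
            if idx != -1 and idx < end:
                end = idx
        counts[line[:end]] += 1
    return counts


if __name__ == "__main__":
    sample = (
        "# HELP http_requests_total Total requests.\n"
        "# TYPE http_requests_total counter\n"
        'http_requests_total{path="/a",code="200"} 10\n'
        'http_requests_total{path="/b",code="200"} 3\n'
        'http_requests_total{path="/b",code="500"} 1\n'
        "process_resident_memory_bytes 1.2e+08\n"
    )
    # Print metric names with the most series first.
    for name, n in series_per_metric(sample).most_common():
        print(name, n)
```

A metric whose count grows with something unbounded (user IDs, request paths, pod names in labels) is a typical culprit; dropping or relabelling those labels usually helps more than raising memory limits.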

HTH,
Martin

> On 13 Apr 2020, at 11:03, Nishant Ketu <nishant.k...@atmecs.com> wrote:
> 
> We deployed Prometheus through Helm, and after running it for around 2 months 
> we get an OOM error and the pods fail to restart. We have to manually clean up 
> /data to get the pod running again. I have set the retention flag, but it 
> doesn't seem to apply to the wal folder in /data. Any help with this would be 
> appreciated. Thanks
> 

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Users" group.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/C6306D48-0102-4647-B12C-EDD3EA49E251%40martinman.net.
