Re: [prometheus-users] OOM error for Prometheus

2020-07-16 Thread Adso Castro
@Martin Just a ping about this issue: how did you identify which services were causing you trouble with too many metrics? I'm asking because I'm facing a similar problem at the moment. Thank you. On Monday, April 13, 2020 at 06:53:13 UTC-3, Martin Man wrote: > Hi Nishant, > >

Re: [prometheus-users] OOM error for Prometheus

2020-04-13 Thread Martin Man
Hi Nishant, I’m also new to Prometheus and faced a similar scenario recently. What helped me was to add a job to monitor the Prometheus instance itself, then import a Prometheus 2.0 Grafana dashboard and watch Prometheus memory consumption and samples appended per second while defining new servicemon
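
The approach described above (watching Prometheus's own series count and ingestion rate) could also be checked without a dashboard, by querying the Prometheus HTTP API directly. The following is only a rough sketch, not what Martin used: the helper names (`query_url`, `parse_vector`, `run_query`) are hypothetical, the two PromQL expressions are one possible way to look for heavy jobs, and the server address in the usage note is an assumption.

```python
import json
import urllib.parse
import urllib.request

# PromQL: number of active series per scrape job, ten largest jobs.
SERIES_PER_JOB = 'topk(10, count by (job) ({__name__=~".+"}))'
# PromQL: samples appended to the head block per second, averaged over 5 minutes.
APPEND_RATE = 'rate(prometheus_tsdb_head_samples_appended_total[5m])'


def query_url(base_url, expr):
    """Build an instant-query URL for the Prometheus HTTP API (/api/v1/query)."""
    params = urllib.parse.urlencode({"query": expr})
    return base_url.rstrip("/") + "/api/v1/query?" + params


def parse_vector(payload):
    """Turn an instant-vector response into (labels, value) pairs, largest first."""
    if payload.get("status") != "success":
        raise ValueError("query failed: %s" % payload.get("error", "unknown error"))
    rows = [(r["metric"], float(r["value"][1])) for r in payload["data"]["result"]]
    return sorted(rows, key=lambda row: row[1], reverse=True)


def run_query(base_url, expr, timeout=10):
    """Run an instant query against a reachable Prometheus server."""
    with urllib.request.urlopen(query_url(base_url, expr), timeout=timeout) as resp:
        return parse_vector(json.load(resp))
```

Assuming a server reachable at http://localhost:9090, calling `run_query("http://localhost:9090", SERIES_PER_JOB)` would return the jobs holding the most series. Note that the series-count query touches every series and can itself be expensive on an instance that is already short of memory.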

Re: [prometheus-users] OOM error for Prometheus

2020-04-13 Thread Julius Volz
Hi, the WAL will always need to contain all data from the last few hours (see https://www.robustperception.io/how-much-space-does-the-wal-take-up for more about WAL space usage), so indeed setting a shorter retention time will not affect it, but you can set a flag to enable WAL compression (saving

[prometheus-users] OOM error for Prometheus

2020-04-13 Thread Nishant Ketu
We have deployed Prometheus through Helm, and after around 2 months of use we get an OOM error and the pods fail to restart. We had to manually clean up /data to get the pod running again. I have used the retention flag, but it doesn't seem to work on the wal folder of /data. Any help for this would be