Now would be a good time to do:

ls -l /var/lib/prometheus/data/chunks_head/
du -sck /var/lib/prometheus/data/chunks_head/*
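
If you would rather track this over time than run du by hand, here is a
rough sketch that prints the directory size in Prometheus text format, so
node_exporter's textfile collector could pick it up.  The metric name
chunks_head_size_bytes and the output path in the usage comment are my own
inventions, not anything Prometheus defines.

```shell
# Print the size of a directory in bytes in Prometheus text exposition format.
# The metric name "chunks_head_size_bytes" is made up for this sketch.
chunks_head_metric() {
    dir="$1"
    # du -sk reports kibibytes; multiply by 1024 to get bytes.
    kb=$(du -sk "$dir" | awk '{print $1}')
    printf 'chunks_head_size_bytes{path="%s"} %s\n' "$dir" "$((kb * 1024))"
}

# Hypothetical cron usage, assuming node_exporter runs with
# --collector.textfile.directory=/var/lib/node_exporter/textfile
# (write to a temporary file and rename it so the collector never
# reads a half-written file):
#   chunks_head_metric /var/lib/prometheus/data/chunks_head \
#     > /var/lib/node_exporter/textfile/chunks_head.prom.tmp &&
#   mv /var/lib/node_exporter/textfile/chunks_head.prom.tmp \
#      /var/lib/node_exporter/textfile/chunks_head.prom
```

An alert on that metric growing past a few GB would flag the problem well
before the disk fills.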

My suspicion is that the out-of-memory kills are interrupting prometheus 
while it writes chunks.  Are you using cgroups/containers?
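
One way to check for a cgroup memory limit is to read the limit file in the
service's cgroup directory.  This is only a sketch: the example path in the
comment is an assumption (look at /proc/<pid>/cgroup for the real one), and
the function name is mine.

```shell
# Print the memory limit configured in a cgroup directory, if any.
# Handles the cgroup v2 file name (memory.max) and the v1 file name
# (memory.limit_in_bytes).
cgroup_mem_limit() {
    cg_dir="$1"
    if [ -r "$cg_dir/memory.max" ]; then
        cat "$cg_dir/memory.max"
    elif [ -r "$cg_dir/memory.limit_in_bytes" ]; then
        cat "$cg_dir/memory.limit_in_bytes"
    else
        echo "no memory limit file found in $cg_dir" >&2
        return 1
    fi
}

# Example (assumed path for a systemd unit on cgroup v2; on cgroup v1 the
# directory would sit under /sys/fs/cgroup/memory/ instead):
#   cgroup_mem_limit /sys/fs/cgroup/system.slice/prometheus.service
```

On cgroup v2 the value "max" means no limit; on v1 an unlimited cgroup shows
a very large number instead.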

Also, is prometheus continually crashing and being restarted by systemd? 
Try looking in "journalctl -eu prometheus".  That might explain why you see 
lots of free memory most of the time (when prometheus is stopped).
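
To get a quick count of how often the kernel has killed it, something like
the following would do.  It is a sketch: the function name is mine, and the
pattern assumes the kernel message looks like "Out of memory: Kill process"
or the newer "Out of memory: Killed process" wording.

```shell
# Count kernel OOM kills of a given process name in log text read from stdin.
count_oom_kills() {
    proc_name="${1:-prometheus}"
    # grep -c prints the number of matching lines; it exits non-zero when
    # there are none, so "|| true" keeps the function from failing on 0.
    grep -Ec "Out of memory: Kill(ed)? process [0-9]+ \(${proc_name}\)" || true
}

# Usage on a live system, assuming kernel messages are in the journal:
#   journalctl -k --since today | count_oom_kills prometheus
```

If the count roughly matches the number of service starts shown by
journalctl, then every restart is an OOM kill rather than some other crash.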

On Thursday, 17 February 2022 at 14:57:25 UTC Senthil wrote:

> The issue started again. 
>
> 629G    chunks_head
> 0       lock
> 4.0K    queries.active
> 9.3G    wal
>
> There are numerous restarts of Prometheus:
> Feb 17 09:02:02 kernel: Out of memory: Kill process 36580 (prometheus) score 844 or sacrifice child
> Feb 17 09:08:36 kernel: Out of memory: Kill process 39001 (prometheus) score 846 or sacrifice child
> Feb 17 09:16:02 kernel: Out of memory: Kill process 41074 (prometheus) score 845 or sacrifice child
> Feb 17 09:22:17 kernel: Out of memory: Kill process 44665 (prometheus) score 844 or sacrifice child
> Feb 17 09:29:25 kernel: Out of memory: Kill process 47234 (prometheus) score 844 or sacrifice child
> Feb 17 09:36:06 kernel: Out of memory: Kill process 48970 (prometheus) score 846 or sacrifice child
> Feb 17 09:43:21 kernel: Out of memory: Kill process 50661 (prometheus) score 844 or sacrifice child
>
> But there is plenty of memory available on the servers:
>
>               total        used        free      shared  buff/cache   available
> Mem:             47           5          31           0          10          40
> Swap:             5           1           3
> Total:           52           7          35
>
> On Tuesday, February 1, 2022 at 5:21:32 PM UTC-5 Brian Candler wrote:
>
>> On Tuesday, 1 February 2022 at 21:52:30 UTC Senthil wrote:
>>
>>> I started on Jan 31, so it's a day.
>>>
>>> # du -sck chunks_head/*
>>> 54140   chunks_head/024326
>>> 4       chunks_head/024327
>>> 54144   total
>>>
>>
>> That's perfectly reasonable: it's only 54MB (which is a long way from 
>> 689GB!)
>>
>> Here's what I see on a moderately busy system:
>>
>> root@ldex-prometheus:~# du -sck /var/lib/prometheus/data/chunks_head/*
>> 81004        /var/lib/prometheus/data/chunks_head/006831
>> 77824        /var/lib/prometheus/data/chunks_head/006832
>> 158828        total
>>
>> That's comparable to yours.
>>
>> Therefore, I think you need to keep an eye on this periodically.  If only 
>> you had a monitoring system which could do this for you :-)
>>
>> If it does start to rise, that's when you'll need to check prometheus log 
>> output and find out what's happening.  But this is very strange, and it 
>> does seem to be something specific to your system.
>>
>
