[prometheus-users] Re: chunks_head space issue

Senthil Fri, 18 Feb 2022 06:42:15 -0800

Yes, it crashes continuously from OOM, once every 10 to 15 minutes.
Unfortunately, someone deleted those files again this time to recover
Prometheus, but it still crashes and restarts automatically.


Feb 18 05:49:18 kernel: Out of memory: Kill process 61845 (prometheus) score 844 or sacrifice child
Feb 18 05:52:26 kernel: Out of memory: Kill process 63185 (prometheus) score 844 or sacrifice child
Feb 18 05:55:47 kernel: Out of memory: Kill process 64500 (prometheus) score 844 or sacrifice child
Feb 18 05:58:51 kernel: Out of memory: Kill process 875 (prometheus) score 844 or sacrifice child
Feb 18 05:58:51 kernel: Out of memory: Kill process 1754 (prometheus) score 844 or sacrifice child
Feb 18 06:02:05 kernel: Out of memory: Kill process 2328 (prometheus) score 845 or sacrifice child
Feb 18 06:05:39 kernel: Out of memory: Kill process 3155 (prometheus) score 844 or sacrifice child
Feb 18 06:09:06 kernel: Out of memory: Kill process 5273 (prometheus) score 845 or sacrifice child
Feb 18 06:12:24 kernel: Out of memory: Kill process 6549 (prometheus) score 844 or sacrifice child
Feb 18 06:15:29 kernel: Out of memory: Kill process 6756 (prometheus) score 845 or sacrifice child
Feb 18 06:18:28 kernel: Out of memory: Kill process 8474 (prometheus) score 844 or sacrifice child
Feb 18 06:21:36 kernel: Out of memory: Kill process 8649 (prometheus) score 845 or sacrifice child
Feb 18 06:24:41 kernel: Out of memory: Kill process 9708 (prometheus) score 844 or sacrifice child
Feb 18 06:27:52 kernel: Out of memory: Kill process 11003 (prometheus) score 844 or sacrifice child
Feb 18 06:30:50 kernel: Out of memory: Kill process 11189 (prometheus) score 844 or sacrifice child
Feb 18 06:33:47 kernel: Out of memory: Kill process 12210 (prometheus) score 844 or sacrifice child
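
To put a number on the restart frequency, here is a minimal sketch that
parses kernel OOM-killer lines like the ones above and prints the gap in
seconds between consecutive kills. The function names and the regular
expression are my own assumptions, written against the line format shown
in this message. Syslog timestamps carry no year, so the year is passed
in as a parameter, and each log entry is expected on a single line.

```python
import re
from datetime import datetime

# Assumed pattern, based only on the kernel lines quoted in this message.
OOM_RE = re.compile(
    r"^(\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) kernel: Out of memory: "
    r"Kill process (\d+) \((\S+)\)"
)


def oom_kills(lines, year=2022):
    """Return a list of (timestamp, pid) for every OOM kill line found."""
    kills = []
    for line in lines:
        match = OOM_RE.match(line.strip())
        if match:
            # The log has no year, so prepend the one supplied by the caller.
            stamp = datetime.strptime(
                f"{year} {match.group(1)}", "%Y %b %d %H:%M:%S"
            )
            kills.append((stamp, int(match.group(2))))
    return kills


def gaps_in_seconds(kills):
    """Return the seconds elapsed between each pair of consecutive kills."""
    times = [stamp for stamp, _ in kills]
    return [(later - earlier).total_seconds()
            for earlier, later in zip(times, times[1:])]


sample = [
    "Feb 18 05:49:18 kernel: Out of memory: Kill process 61845 (prometheus) score 844 or sacrifice child",
    "Feb 18 05:52:26 kernel: Out of memory: Kill process 63185 (prometheus) score 844 or sacrifice child",
    "Feb 18 05:55:47 kernel: Out of memory: Kill process 64500 (prometheus) score 844 or sacrifice child",
]
kills = oom_kills(sample)
print(gaps_in_seconds(kills))  # prints [188.0, 201.0]
```

On the first three entries this gives gaps of 188 and 201 seconds, so the
kills in this log are roughly three minutes apart.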

On Thursday, February 17, 2022 at 5:09:54 PM UTC-5 Brian Candler wrote:

> Now would be a good time to do:
>
> ls -l /var/lib/prometheus/data/chunks_head/
> du -sck /var/lib/prometheus/data/chunks_head/*
>
> My suspicion is your out-of-memory condition is messing up the writing of 
> chunks.  Are you using cgroups/containers?
>
> Also, is prometheus continually crashing and being restarted by systemd? 
> Try looking in "journalctl -eu prometheus".  That might explain why you see 
> lots of free memory most of the time (when prometheus is stopped).
>
> On Thursday, 17 February 2022 at 14:57:25 UTC Senthil wrote:
>
>> The issue started again. 
>>
>> 629G    chunks_head
>> 0       lock
>> 4.0K    queries.active
>> 9.3G    wal
>>
>> There are numerous restarts of Prometheus:
>> Feb 17 09:02:02 kernel: Out of memory: Kill process 36580 (prometheus) score 844 or sacrifice child
>> Feb 17 09:08:36 kernel: Out of memory: Kill process 39001 (prometheus) score 846 or sacrifice child
>> Feb 17 09:16:02 kernel: Out of memory: Kill process 41074 (prometheus) score 845 or sacrifice child
>> Feb 17 09:22:17 kernel: Out of memory: Kill process 44665 (prometheus) score 844 or sacrifice child
>> Feb 17 09:29:25 kernel: Out of memory: Kill process 47234 (prometheus) score 844 or sacrifice child
>> Feb 17 09:36:06 kernel: Out of memory: Kill process 48970 (prometheus) score 846 or sacrifice child
>> Feb 17 09:43:21 kernel: Out of memory: Kill process 50661 (prometheus) score 844 or sacrifice child
>>
>> But there is plenty of memory available on the servers.
>>
>>               total        used        free      shared  buff/cache   available
>> Mem:             47           5          31           0          10          40
>> Swap:             5           1           3
>> Total:           52           7          35
>>
>> On Tuesday, February 1, 2022 at 5:21:32 PM UTC-5 Brian Candler wrote:
>>
>>> On Tuesday, 1 February 2022 at 21:52:30 UTC Senthil wrote:
>>>
>>>> I started on Jan 31, so it's a day.
>>>>
>>>> # du -sck chunks_head/*
>>>> 54140   chunks_head/024326
>>>> 4       chunks_head/024327
>>>> 54144   total
>>>>
>>>
>>> That's perfectly reasonable: it's only 54MB (which is a long way from 
>>> 689GB!)
>>>
>>> Here's what I see on a moderately busy system:
>>>
>>> root@ldex-prometheus:~# du -sck /var/lib/prometheus/data/chunks_head/*
>>> 81004        /var/lib/prometheus/data/chunks_head/006831
>>> 77824        /var/lib/prometheus/data/chunks_head/006832
>>> 158828        total
>>>
>>> That's comparable to yours.
>>>
>>> Therefore, I think you need to keep an eye on this periodically.  If 
>>> only you had a monitoring system which could do this for you :-)
>>>
>>> If it does start to rise, that's when you'll need to check prometheus 
>>> log output and find out what's happening.  But this is very strange, and it 
>>> does seem to be something specific to your system.
>>>
>>
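
For the periodic check suggested in the quoted messages, a minimal sketch
of a script that lists per-file sizes in a directory, similar in spirit to
"du -sck chunks_head/*". The function names are hypothetical. It sums the
apparent file size (st_size) rather than the allocated blocks that du
counts, so the numbers may differ from du for sparse or preallocated
files. The directory path is taken from the command line.

```python
import os
import sys


def file_sizes_kib(directory):
    """Map each regular file directly inside `directory` to its size in KiB.

    Uses the apparent size (st_size), rounded up to a whole KiB. This is an
    approximation of `du -k`, which reports allocated disk blocks instead.
    """
    sizes = {}
    for entry in os.scandir(directory):
        if entry.is_file(follow_symlinks=False):
            size_bytes = entry.stat(follow_symlinks=False).st_size
            sizes[entry.name] = -(-size_bytes // 1024)  # ceiling division
    return sizes


def total_kib(sizes):
    """Sum the per-file sizes returned by file_sizes_kib."""
    return sum(sizes.values())


if __name__ == "__main__" and len(sys.argv) == 2:
    result = file_sizes_kib(sys.argv[1])
    for name in sorted(result):
        print(f"{result[name]}\t{name}")
    print(f"{total_kib(result)}\ttotal")
```

Saved under a name of your choice and run from cron with
/var/lib/prometheus/data/chunks_head as its argument, it would show
whether the directory starts to grow again before the disk fills up.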

