Now would be a good time to do:

ls -l /var/lib/prometheus/data/chunks_head/
du -sck /var/lib/prometheus/data/chunks_head/*

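
If you want to see how the directory grows over time rather than sampling it once, you could wrap the du check in a small helper and run it periodically. This is only a rough sketch: the function name log_dir_size and the output format (UTC timestamp, size in KiB, path) are made up for illustration and are not part of Prometheus.

```shell
#!/bin/sh
# Hypothetical helper (not part of Prometheus): print one line containing
# a UTC timestamp, the total size of a directory in KiB, and its path.
log_dir_size() {
    dir="$1"
    # du -sk prints "<size-in-KiB><TAB><path>"; keep only the number.
    kib=$(du -sk "$dir" 2>/dev/null | awk '{print $1}')
    if [ -z "$kib" ]; then
        # du produced no output, e.g. the directory does not exist.
        echo "cannot read $dir" >&2
        return 1
    fi
    printf '%s %s %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$kib" "$dir"
}
```

For example, calling log_dir_size /var/lib/prometheus/data/chunks_head from cron every few minutes and appending the output to a file of your choosing would show roughly when the growth starts, which you can then line up against the prometheus log output.
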
My suspicion is your out-of-memory condition is messing up the writing of chunks. Are you using cgroups/containers?

Also, is prometheus continually crashing and being restarted by systemd? Try looking in "journalctl -eu prometheus". That might explain why you see lots of free memory most of the time (when prometheus is stopped).

On Thursday, 17 February 2022 at 14:57:25 UTC Senthil wrote:

> The issue started again.
>
> 629G chunks_head
> 0 lock
> 4.0K queries.active
> 9.3G wal
>
> There is numerous restart of Prometheus
> Feb 17 09:02:02 kernel: Out of memory: Kill process 36580 (prometheus) score 844 or sacrifice child
> Feb 17 09:08:36 kernel: Out of memory: Kill process 39001 (prometheus) score 846 or sacrifice child
> Feb 17 09:16:02 kernel: Out of memory: Kill process 41074 (prometheus) score 845 or sacrifice child
> Feb 17 09:22:17 kernel: Out of memory: Kill process 44665 (prometheus) score 844 or sacrifice child
> Feb 17 09:29:25 kernel: Out of memory: Kill process 47234 (prometheus) score 844 or sacrifice child
> Feb 17 09:36:06 kernel: Out of memory: Kill process 48970 (prometheus) score 846 or sacrifice child
> Feb 17 09:43:21 kernel: Out of memory: Kill process 50661 (prometheus) score 844 or sacrifice child
>
> but there is plenty of mem available in the servers.
>
>          total   used   free   shared   buff/cache   available
> Mem:        47      5     31        0           10          40
> Swap:        5      1      3
> Total:      52      7     35
>
> On Tuesday, February 1, 2022 at 5:21:32 PM UTC-5 Brian Candler wrote:
>
>> On Tuesday, 1 February 2022 at 21:52:30 UTC Senthil wrote:
>>
>>> I started on Jan 31, so it's a day.
>>>
>>> # du -sck chunks_head/*
>>> 54140 chunks_head/024326
>>> 4 chunks_head/024327
>>> 54144 total
>>>
>>
>> That's perfectly reasonable: it's only 54MB (which is a long way from 689GB!)
>>
>> Here's what I see on a moderately busy system:
>>
>> root@ldex-prometheus:~# du -sck /var/lib/prometheus/data/chunks_head/*
>> 81004 /var/lib/prometheus/data/chunks_head/006831
>> 77824 /var/lib/prometheus/data/chunks_head/006832
>> 158828 total
>>
>> That's comparable to yours.
>>
>> Therefore, I think you need to keep an eye on this periodically. If only you had a monitoring system which could do this for you :-)
>>
>> If it does start to rise, that's when you'll need to check prometheus log output and find out what's happening. But this is very strange, and it does seem to be something specific to your system.
>>

--
You received this message because you are subscribed to the Google Groups "Prometheus Users" group.
To view this discussion on the web visit https://groups.google.com/d/msgid/prometheus-users/25405bc6-d4e6-4152-8dde-87b89e18bdd9n%40googlegroups.com.

