Good morning, and please excuse the late response. The jattach hint was very useful, thank you very much! I have created some memory dumps before and after the REINDEXCOLLECTION jobs. Honestly, I don't think it would help much if I took a deep dive into the problem myself, but I could share the dumps with you via a download if you want.
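For reference, this is roughly what I ran inside one of the pods (a sketch, not copied verbatim: the pod name is a placeholder from our setup, and the pid file location may differ per image; jattach's "dumpheap" command writes a standard .hprof file):

    # Open a shell in one of the Solr pods (pod name is an example):
    kubectl exec -it example-solrcloud-0 -- bash

    # Find the Solr JVM's PID via its pid file ($SOLR_PID_DIR points to
    # /var/solr in the official image, if I'm not mistaken):
    SOLR_PID="$(cat "$SOLR_PID_DIR"/solr-*.pid)"

    # Write a heap dump into the logs dir, which sits on the persistent
    # volume and therefore survives a pod restart:
    jattach "$SOLR_PID" dumpheap /var/solr/logs/heap-after-reindex.hprof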
Florian

On 2025/04/30 12:28:19 Jan Høydahl wrote:
> We provide jattach for these purposes in the official docker image:
> https://solr.apache.org/guide/solr/latest/deployment-guide/solr-in-docker.html#debugging-with-jattach
>
> Jan
>
> > On 30 Apr 2025, at 13:11, Jason Gerlowski <ge...@gmail.com> wrote:
> >
> > Hi Florian,
> >
> > I haven't heard any reports of a memory leak triggered by the
> > REINDEXCOLLECTION codepath, but such a leak is always possible. I'd
> > love to see what you find if you're able to take a heap dump!
> >
> > The typical way (afaik) to create heap dumps "on demand" is with jcmd
> > or jmap. If those aren't available on the Solr pods, it might be
> > possible to "kubectl cp" one of those executables into your pods and
> > then run it to produce a dump. If that doesn't work, then the other
> > option I'd recommend is making sure that Solr is running with the
> > "-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=<file-or-dir-path>"
> > JVM flags so that a heap dump is produced when OOM is eventually hit.
> > (You'd want to make sure that the heap-dump path is somewhere
> > persistent, so that the heap dump won't be wiped out by a pod
> > restart.)
> >
> > Hopefully that helps a bit. Let us know if you're able to find anything!
> >
> > Best,
> >
> > Jason
> >
> > On Wed, Apr 30, 2025 at 6:55 AM Schieder Florian
> > <fl...@puzzleyou.de.invalid> wrote:
> >>
> >> Hello all,
> >>
> >> We are running Solr 9.8.1 on Kubernetes using the official Solr
> >> Operator 0.9.1.
> >>
> >> After many asynchronous REINDEXCOLLECTION requests we see a
> >> significant increase in RAM allocation, which by itself is neither
> >> surprising nor problematic. However, after those requests have
> >> completed successfully, the consumption stays constant and does not
> >> decrease. Manually restarting the SolrCloud deployment always
> >> resolves the issue, so it seems to me that either the allocated
> >> blocks are not being freed correctly afterwards, or they *are*
> >> deallocated correctly but the heap ends up too fragmented once the
> >> requests have completed.
> >>
> >> In the past, when the heap allocation kept growing over several
> >> REINDEXCOLLECTION requests and we did not restart the pods, this led
> >> to OOM crashes. The running async REINDEXCOLLECTION requests were
> >> then interrupted, which required manual intervention.
> >>
> >> It's easy for me to reproduce this RAM allocation behavior, but
> >> after looking into how to create JVM memory dumps with 'jcmd' and
> >> 'jmap', it seems it is not trivially possible to create such a heap
> >> dump with the tools currently installed in the Solr pods. However,
> >> I'm no expert in Solr or the Java ecosystem, so if there are any
> >> ways to create memory dumps that would help the Solr development
> >> team trace the RAM consumption issue explained above, any ideas are
> >> welcome.
> >>
> >> Many thanks!
> >> Florian Schieder
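P.S. In case it's useful for anyone reading along: the OOM heap-dump flags Jason mentioned can presumably be wired in through the Solr Operator via the SolrCloud resource's "solrOpts" field, roughly like this (an untested sketch; the resource name is a placeholder, and the dump path should point at a persistent volume):

    # Merge the heap-dump JVM flags into the running SolrCloud resource;
    # pointing HeapDumpPath at /var/solr/logs keeps any dump on the
    # persistent volume across pod restarts:
    kubectl patch solrcloud example --type merge \
      -p '{"spec":{"solrOpts":"-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/solr/logs"}}'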