Hi Florian,

Glad you were able to get the heap dumps as Jan suggested!
I wouldn't sell yourself short - looking at heap dumps can be intimidating if you haven't done it before, but there are some really great analysis tools out there that summarize things in a pretty clear way. I'd open the heap dump in one of those (I personally recommend the "Eclipse Memory Analyzer Tool", sometimes called "MAT") and look at the biggest sections of the "Allocations" pie chart. The "Dominator Report" is another good place to look, if you see that in your tool of choice. You may recognize more of the Solr class and object names than you're giving yourself credit for, and if you don't, you can always share them here and we can try to interpret them.

In general, folks (myself included) are a little reluctant to click file-download links shared on the mailing list, mostly for security reasons. Open source projects are common targets of phishing attacks, and while every sign points to your sincerity and trustworthiness, it's hard to argue against prudence.

Best,

Jason

On Mon, May 12, 2025 at 9:34 AM Schieder Florian
<florian.schie...@puzzleyou.de.invalid> wrote:
>
> Good morning,
>
> please excuse the late response. The jattach hint was very useful, thank you very much.
> I have created some memory dumps before and after the REINDEXCOLLECTION jobs.
> Honestly, I think it won't help much if I take a deep dive into understanding the problem myself. I could share the dumps with you via download if you want.
>
> Florian
>
> On 2025/04/30 12:28:19 Jan Høydahl wrote:
> > We provide jattach for these purposes in the official docker image:
> > https://solr.apache.org/guide/solr/latest/deployment-guide/solr-in-docker.html#debugging-with-jattach
> >
> > Jan
> >
> > > 30. apr. 2025 kl. 13:11 skrev Jason Gerlowski <ge...@gmail.com>:
> > >
> > > Hi Florian,
> > >
> > > I haven't heard any reports of a memory leak triggered by the REINDEXCOLLECTION codepath, but such a leak is always possible. I'd love to see what you find if you're able to take a heap dump!
> > >
> > > The typical way (afaik) to create heap dumps "on demand" is with jcmd or jmap. If those aren't available on the Solr pods, it might be possible to "kubectl cp" one of those executables into your pods and then run it to produce a dump. If that doesn't work, the other option I'd recommend is making sure that Solr is running with the "-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=<file-or-dir-path>" JVM flags, so that a heap dump is produced when OOM is eventually hit. (You'd want to make sure that the heap-dump path is somewhere persistent, so that the dump won't be wiped out by a pod restart.)
> > >
> > > Hopefully that helps a bit. Let us know if you're able to find anything!
> > >
> > > Best,
> > >
> > > Jason
> > >
> > > On Wed, Apr 30, 2025 at 6:55 AM Schieder Florian
> > > <fl...@puzzleyou.de.invalid> wrote:
> > >>
> > >> Hello all,
> > >>
> > >> we are running Solr 9.8.1 on Kubernetes using the official Solr Operator 0.9.1.
> > >>
> > >> After many asynchronous REINDEXCOLLECTION requests we are encountering a significant RAM allocation increase, which is itself neither surprising nor problematic; but after those requests have successfully completed, the consumption stays constant and does not decrease. Manually restarting the SolrCloud deployment always resolves the issue, so it seems to me like either the allocated blocks are not being freed correctly afterwards, or they *are* deallocated correctly but the heap is left too fragmented once the requests complete.
> > >>
> > >> When the heap allocation kept increasing over several REINDEXCOLLECTION requests and we did not restart the pods, this led to OOM crashes for us in the past. In those cases, the async REINDEXCOLLECTION requests were interrupted, which required manual intervention.
> > >>
> > >> It's easy for me to reproduce this RAM allocation behavior, but after diving into how to create JVM memory dumps with 'jcmd' and 'jmap', it seems that it's not trivially possible to create such a heap dump with the tools currently installed in the Solr pods. However, I'm not an expert on Solr and the Java ecosystem, so if there are any ways to create memory dumps that would help the Solr development team trace the RAM consumption issue explained above, any ideas are welcome.
> > >>
> > >> Many thanks!
> > >> Florian Schieder
> >
> >
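
P.S. In case it saves you a step: MAT also ships a headless script, so you can generate its standard reports from the command line without opening the GUI. A rough sketch of the whole round trip below - pod name, PID, and paths are placeholders for your setup, and it assumes the jattach bundled in the official docker image (per the page Jan linked):

```shell
# Trigger a heap dump inside the pod with jattach (bundled in the official
# Solr docker image). <pid> is Solr's Java process id; in containers it is
# often 1, or findable with something like "pgrep java".
kubectl exec solr-0 -- jattach <pid> dumpheap /var/solr/heap.hprof

# Copy the dump out of the pod before any restart can wipe it.
kubectl cp solr-0:/var/solr/heap.hprof ./heap.hprof

# Headless Eclipse MAT analysis: parses the dump and writes the
# "leak suspects" and "top components" HTML reports next to the .hprof file.
./ParseHeapDump.sh ./heap.hprof \
    org.eclipse.mat.api:suspects \
    org.eclipse.mat.api:top_components
```

The generated leak-suspects report includes the dominator-tree summary mentioned above, which is usually the quickest thing to share on-list (it's small HTML/text rather than a multi-GB dump).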