Re: Possible OOM risk after huge REINDEXCOLLECTION requests

Jan Høydahl Wed, 30 Apr 2025 05:29:33 -0700

We provide jattach for this purpose in the official Docker image:
https://solr.apache.org/guide/solr/latest/deployment-guide/solr-in-docker.html#debugging-with-jattach
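
As a rough sketch of how that could look on Kubernetes, the small Python helper below only assembles a "kubectl exec" command line that runs jattach's dumpheap command inside a pod. The pod name, the JVM process ID, and the dump path are placeholders and assumptions, not values taken from this thread: check the actual PID of the Solr JVM in your pod first, and pick a dump path on a persistent volume.

```python
import shlex


def jattach_heap_dump_command(pod, dump_path, pid, namespace=None):
    """Build (but do not run) a kubectl command that asks jattach
    to write a heap dump of the JVM with the given PID inside the pod."""
    cmd = ["kubectl", "exec"]
    if namespace:
        cmd += ["-n", namespace]
    # Everything after "--" is executed inside the container.
    cmd += [pod, "--", "jattach", str(pid), "dumpheap", dump_path]
    return cmd


if __name__ == "__main__":
    # Hypothetical values: pod name, PID and path are assumptions
    # that must be adapted to the actual deployment.
    cmd = jattach_heap_dump_command(
        pod="example-solrcloud-0",
        dump_path="/var/solr/data/heap.hprof",
        pid=10,
    )
    print(shlex.join(cmd))
```

The printed command can be reviewed and then pasted into a shell. The resulting file could afterwards be copied out of the pod with "kubectl cp" for analysis.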


Jan

> On 30 Apr 2025, at 13:11, Jason Gerlowski <gerlowsk...@gmail.com> wrote:
> 
> Hi Florian,
> 
> I haven't heard any reports of a memory leak triggered by the
> REINDEXCOLLECTION codepath, but such a leak is always possible.  I'd
> love to see what you find if you're able to take a heap dump!
> 
> The typical way (afaik) to create heap dumps "on demand" is with jcmd
> or jmap.  If that's not available on the Solr pods, it might be
> possible to "kubectl cp" one of those executables into your pods, and
> then run it to produce a dump.  If that doesn't work, then the other
> option I'd recommend is making sure that Solr is running with the
> "-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=<file-or-dir-path>"
> JVM flags so that a heap dump is produced when OOM is eventually hit.
> (You'd want to make sure that the heap-dump path is something
> persistent, so that the heap dump won't be wiped out by a
> pod-restart.)
> 
> Hopefully that helps a bit.  Let us know if you're able to find anything!
> 
> Best,
> 
> Jason
> 
> On Wed, Apr 30, 2025 at 6:55 AM Schieder Florian
> <florian.schie...@puzzleyou.de.invalid> wrote:
>> 
>> Hello all,
>> 
>> We are running Solr 9.8.1 on Kubernetes using the official Solr Operator
>> 0.9.1.
>> 
>> After many asynchronous REINDEXCOLLECTION requests, we see a significant
>> increase in RAM allocation. That in itself is neither surprising nor
>> problematic, but after the requests have completed successfully, memory
>> consumption stays constant and does not decrease. Manually restarting the
>> SolrCloud deployment always resolves the issue, so it seems to me that
>> either the allocated blocks are not being freed correctly afterwards, or
>> they *are* deallocated correctly but the heap itself is too fragmented
>> once the requests have completed.
>> 
>> In the past, when heap allocation kept growing over several
>> REINDEXCOLLECTION requests and we did not restart the pods, this led to OOM
>> crashes. The asynchronous REINDEXCOLLECTION requests were then interrupted,
>> which required manual intervention.
>> 
>> It's easy for me to reproduce this RAM allocation behavior, but after looking
>> into how to create JVM memory dumps with 'jcmd' and 'jmap', it seems to me
>> that creating such a heap dump is not trivial with the tools currently
>> installed in the Solr pods. However, I'm not an expert in Solr or the Java
>> ecosystem, so if there is a way to create memory dumps that would help the
>> Solr development team trace the RAM consumption issue described above, any
>> ideas are welcome.
>> 
>> Many thanks!
>> Florian Schieder
