Hello all,

We are running Solr 9.8.1 on Kubernetes using the official Solr Operator 0.9.1.

After many asynchronous REINDEXCOLLECTION requests we see a significant 
increase in RAM usage, which by itself is neither surprising nor problematic. 
However, after those requests have completed successfully, the consumption 
stays at that level and does not decrease. Manually restarting the SolrCloud 
deployment always resolves the issue, so it looks to me like either the 
allocated memory is not being freed correctly afterwards, or it *is* freed 
correctly but the heap ends up too fragmented once the requests have completed.
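
For reference, the requests we issue look roughly like this (collection name, 
async request id, and host are placeholders; the exact URL depends on how the 
service is exposed):

    # kick off an asynchronous reindex via the Collections API
    curl "http://localhost:8983/solr/admin/collections?action=REINDEXCOLLECTION&name=my_collection&async=reindex-1"

    # poll the async request until it reports completion
    curl "http://localhost:8983/solr/admin/collections?action=REQUESTSTATUS&requestid=reindex-1"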

In the past, when the heap allocation kept growing across several 
REINDEXCOLLECTION requests and we did not restart the pods, this led to OOM 
crashes. In those cases the running async REINDEXCOLLECTION requests were 
interrupted, which required manual intervention.

It's easy for me to reproduce this RAM allocation behavior, but after looking 
into how to create JVM memory dumps with 'jcmd' and 'jmap', it seems that 
producing such a heap dump is not trivially possible with the tools currently 
installed in the Solr pods. However, I'm no expert on Solr or the Java 
ecosystem, so if there are any ways to create memory dumps that would help the 
Solr development team trace the RAM consumption issue described above, any 
ideas are welcome.
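
What I would expect to work looks roughly like this (pod name, PID and target 
path are just examples), but it depends on jcmd or jmap actually being present 
in the Solr image:

    # list the JVMs inside the Solr pod to find the PID
    kubectl exec example-solrcloud-0 -- jcmd

    # trigger a heap dump with jcmd (or equivalently with jmap)
    kubectl exec example-solrcloud-0 -- jcmd <pid> GC.heap_dump /var/solr/heap.hprof
    kubectl exec example-solrcloud-0 -- jmap -dump:live,format=b,file=/var/solr/heap.hprof <pid>

    # copy the dump off the pod for analysis
    kubectl cp example-solrcloud-0:/var/solr/heap.hprof ./heap.hprof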

Many thanks!
Florian Schieder
