Good morning, and please excuse the late response. The jattach hint was very useful, thank you very much! I have created some memory dumps before and after the REINDEXCOLLECTION jobs. Honestly, I don't think it would help much if I took a deep dive into the problem myself, but I could share the dumps with you via a download if you want.
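For reference, this is roughly what I ran inside one of the pods (a sketch, not copied verbatim: the pod name is a placeholder from our setup, and the pid file location may differ per image; jattach's "dumpheap" command writes a standard .hprof file):

    # Open a shell in one of the Solr pods (pod name is an example):
    kubectl exec -it example-solrcloud-0 -- bash

    # Find the Solr JVM's PID via its pid file ($SOLR_PID_DIR points to
    # /var/solr in the official image, if I'm not mistaken):
    SOLR_PID="$(cat "$SOLR_PID_DIR"/solr-*.pid)"

    # Write a heap dump into the logs dir, which sits on the persistent
    # volume and therefore survives a pod restart:
    jattach "$SOLR_PID" dumpheap /var/solr/logs/heap-after-reindex.hprof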
Florian

On 2025/04/30 12:28:19 Jan Høydahl wrote:
> We provide jattach for these purposes in the official docker image:
> https://solr.apache.org/guide/solr/latest/deployment-guide/solr-in-docker.html#debugging-with-jattach
>
> Jan
>
> > On 30 Apr 2025, at 13:11, Jason Gerlowski <ge...@gmail.com> wrote:
> >
> > Hi Florian,
> >
> > I haven't heard any reports of a memory leak triggered by the
> > REINDEXCOLLECTION codepath, but such a leak is always possible. I'd
> > love to see what you find if you're able to take a heap dump!
> >
> > The typical way (afaik) to create heap dumps "on demand" is with jcmd
> > or jmap. If those aren't available on the Solr pods, it might be
> > possible to "kubectl cp" one of those executables into your pods and
> > then run it to produce a dump. If that doesn't work, then the other
> > option I'd recommend is making sure that Solr is running with the
> > "-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=<file-or-dir-path>"
> > JVM flags so that a heap dump is produced when OOM is eventually hit.
> > (You'd want to make sure that the heap-dump path is somewhere
> > persistent, so that the heap dump won't be wiped out by a pod
> > restart.)
> >
> > Hopefully that helps a bit. Let us know if you're able to find anything!
> >
> > Best,
> >
> > Jason
> >
> > On Wed, Apr 30, 2025 at 6:55 AM Schieder Florian
> > <fl...@puzzleyou.de.invalid> wrote:
> >>
> >> Hello all,
> >>
> >> We are running Solr 9.8.1 on Kubernetes using the official Solr
> >> Operator 0.9.1.
> >>
> >> After many asynchronous REINDEXCOLLECTION requests we see a
> >> significant increase in RAM allocation, which by itself is neither
> >> surprising nor problematic. However, after those requests have
> >> completed successfully, the consumption stays constant and does not
> >> decrease. Manually restarting the SolrCloud deployment always
> >> resolves the issue, so it seems to me that either the allocated
> >> blocks are not being freed correctly afterwards, or they *are*
> >> deallocated correctly but the heap ends up too fragmented once the
> >> requests have completed.
> >>
> >> In the past, when the heap allocation kept growing over several
> >> REINDEXCOLLECTION requests and we did not restart the pods, this led
> >> to OOM crashes. The running async REINDEXCOLLECTION requests were
> >> then interrupted, which required manual intervention.
> >>
> >> It's easy for me to reproduce this RAM allocation behavior, but
> >> after looking into how to create JVM memory dumps with 'jcmd' and
> >> 'jmap', it seems it is not trivially possible to create such a heap
> >> dump with the tools currently installed in the Solr pods. However,
> >> I'm no expert in Solr or the Java ecosystem, so if there are any
> >> ways to create memory dumps that would help the Solr development
> >> team trace the RAM consumption issue explained above, any ideas are
> >> welcome.
> >>
> >> Many thanks!
> >> Florian Schieder
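P.S. In case it's useful for anyone reading along: the OOM heap-dump flags Jason mentioned can presumably be wired in through the Solr Operator via the SolrCloud resource's "solrOpts" field, roughly like this (an untested sketch; the resource name is a placeholder, and the dump path should point at a persistent volume):

    # Merge the heap-dump JVM flags into the running SolrCloud resource;
    # pointing HeapDumpPath at /var/solr/logs keeps any dump on the
    # persistent volume across pod restarts:
    kubectl patch solrcloud example --type merge \
      -p '{"spec":{"solrOpts":"-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/solr/logs"}}'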