On 5/3/2022 5:01 PM, Vincenzo D'Amore wrote:
I'm tuning a solrcloud 5.4.1 deployment (3 nodes, 12 cores each, 18GB ram)
that is experiencing frequent OutOfMemoryError (20 a day in total)
exceptions during the execution of a group query.

The query uses group.limit=1, but rows ranges between 1000 and 10000.
I'm analyzing the Solr query, and I've added a few JVM parameters to dump
the active threads and the allocated memory so I can better analyze the OOM.
But I'm curious, in your experience, how worried I should be about
the OOM(s).
In other words, I'm working to remove them ASAP, but when an OOM happens,
is Solr's behaviour completely compromised, or does Solr seamlessly return
to working normally?

As others have said, Java program state when OOME occurs is completely unpredictable.  For Solr, anything could happen, including index corruption.

This is why, when Solr is started via the bin/solr shell script, it is started with a Java parameter that causes it to kill itself whenever OOME occurs.  This functionality has not yet been implemented on Windows.  Starting in 9.0, because the minimum Java version will be 11, I think we can alter the way that works so equivalent functionality will exist on Windows.
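
To check whether a given JVM was actually started with such a parameter, something like the following could be used.  This is only a sketch under assumptions: the class and method names are hypothetical and not part of Solr, and the list of flags is my guess at what is relevant.  -XX:OnOutOfMemoryError=<command> is, as far as I know, the kind of option the shell script relies on, while -XX:+ExitOnOutOfMemoryError and -XX:+CrashOnOutOfMemoryError are HotSpot options available in Java 11 that do not need an external script.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical helper for illustration; not part of Solr.
class OomFlagCheck {

    // Returns true if any JVM argument tells the JVM to act on OutOfMemoryError.
    // The flag list is an assumption about which options matter here.
    static boolean hasOomAction(List<String> jvmArgs) {
        for (String arg : jvmArgs) {
            if (arg.startsWith("-XX:OnOutOfMemoryError=")
                    || arg.equals("-XX:+ExitOnOutOfMemoryError")
                    || arg.equals("-XX:+CrashOnOutOfMemoryError")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Arguments the current JVM was started with (excluding main() arguments).
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("OOME action configured: " + hasOomAction(jvmArgs));
    }
}
```

This only inspects the JVM it runs in, so to check a Solr process the same test would have to be applied to that process's own startup arguments.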

Solr does NOT come with anything that will restart after OOME ... because chances are that if you encounter OOME once, it will continue to happen until you fix the problem.  Anything that anyone has which restarts Solr automatically is something they implemented -- Solr will not do this out of the box.  I don't recommend implementing anything like that.  Solr normally does NOT crash.  If it does crash, there is usually something VERY wrong that needs to be fixed.

There are precisely two ways to deal with OOME.  One is to increase the available amount of the resource that has been depleted, which might not actually be memory.  The other is to change things so less of that resource is required -- reduce the index size, modify queries, etc.
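
The message attached to the OOME usually hints at which resource ran out.  As a rough sketch (the class name, method name and category labels are made up for illustration, and the substrings are the common HotSpot messages as I remember them, not an exhaustive list), the mapping could look like this:

```java
// Hypothetical helper for illustration; not part of Solr or the JDK.
class OomeResource {

    // Maps an OutOfMemoryError message to the resource that was likely depleted.
    static String depletedResource(String message) {
        if (message == null) {
            return "unknown";
        }
        if (message.contains("Java heap space")
                || message.contains("GC overhead limit exceeded")) {
            return "heap";
        }
        if (message.contains("Metaspace")) {
            return "metaspace";
        }
        if (message.contains("Direct buffer memory")) {
            return "direct memory";
        }
        if (message.contains("native thread")) {
            // Typically an OS limit on processes/threads, not heap.
            return "threads or process limits";
        }
        return "unknown";
    }

    public static void main(String[] args) {
        System.out.println(depletedResource("Java heap space"));
        System.out.println(depletedResource("unable to create new native thread"));
    }
}
```

Only the "heap" case is helped by a larger max heap; the "native thread" case, for example, normally calls for raising OS limits or creating fewer threads.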

Thanks,
Shawn
