On 5/3/2022 5:01 PM, Vincenzo D'Amore wrote:
I'm tuning a SolrCloud 5.4.1 deployment (3 nodes, 12 cores each, 18 GB RAM)
that is hitting frequent OutOfMemoryError exceptions (20 a day in total)
during the execution of a group query.
The query uses group.limit=1, but rows ranges between 1000 and 10000.
I'm analyzing the solr query, and I've added a few JVM parameters to dump
the active threads and the allocated memory to better analyze the OOM.
But I was curious to ask, in your experience, how worried I should be about
the OOM(s).
In other words, I'm working to remove them ASAP, but when an OOM happens,
is Solr's behaviour completely compromised, or does Solr seamlessly return
to working normally?
As others have said, Java program state when OOME occurs is completely
unpredictable. For Solr, anything could happen, including index corruption.
This is why when Solr is started via the bin/solr shell script, it is
started with a Java parameter (-XX:OnOutOfMemoryError, pointing at a kill
script) that will cause it to commit suicide whenever OOME occurs. This
functionality has not yet been implemented
on Windows. Starting in 9.0, because the minimum Java version will be
11, I think we can alter the way that works so equivalent functionality
will exist on Windows.
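
If you want to check whether a particular Solr JVM was started with one of these OOME handlers (for example on Windows, or when Solr is not launched through bin/solr), something like the rough Python sketch below could help. The three flags are standard HotSpot options. The function name is made up for illustration, and how you collect the JVM arguments (jcmd, ps, the admin UI) is left out and up to you.

```python
# HotSpot options that make the JVM stop, or run a command, when an
# OutOfMemoryError is thrown. The Exit/Crash variants exist since JDK 8u92.
OOM_HANDLER_FLAGS = (
    "-XX:OnOutOfMemoryError=",       # run an external command, e.g. a kill script
    "-XX:+ExitOnOutOfMemoryError",   # JVM exits on the first OOME
    "-XX:+CrashOnOutOfMemoryError",  # JVM aborts and writes a crash report
)


def find_oom_handlers(jvm_args):
    """Return the JVM arguments that configure an OOME handler.

    jvm_args is a list of argument strings; an empty result means the
    JVM will keep running in an unpredictable state after an OOME.
    """
    # str.startswith accepts a tuple and matches any of its prefixes.
    return [arg for arg in jvm_args if arg.startswith(OOM_HANDLER_FLAGS)]
```

An empty list from find_oom_handlers() means nothing will stop the JVM after an OOME, so it would keep serving requests in whatever state it is in.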
Solr does NOT come with anything that will restart it after OOME ...
because chances are that if you encounter OOME once, it will continue to
happen until you fix the problem. Anything that anyone has which
restarts Solr automatically is something they implemented -- Solr will
not do this out of the box. I don't recommend implementing anything
like that. Solr normally does NOT crash. If it does crash, there is
usually something VERY wrong that needs to be fixed.
There are precisely two ways to deal with OOME. One is to increase the
available amount of the resource that has been depleted, which might not
actually be memory. The other is to change things so less of that
resource is required -- reduce the index size, modify queries, etc.
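
To work out which resource was actually depleted, look at the text that follows "java.lang.OutOfMemoryError" in the log. Here is a minimal Python sketch of that idea. The message substrings are the common HotSpot ones; the resource labels, function names, and the assumption that each error sits on a single log line are mine, so treat it as a starting point rather than a complete classifier.

```python
import re
from collections import Counter

# Substrings of common HotSpot OutOfMemoryError messages, mapped to the
# resource that ran out. Checked in order; the first match wins.
OOME_RESOURCES = [
    ("Java heap space", "heap"),
    ("GC overhead limit exceeded", "heap"),
    ("Requested array size exceeds VM limit", "heap (single huge allocation)"),
    ("Compressed class space", "class metadata"),
    ("Metaspace", "class metadata"),
    ("Direct buffer memory", "direct (off-heap) memory"),
    ("native thread", "threads / OS process limits"),
]

# Captures whatever follows the exception class name on the same line.
OOME_LINE = re.compile(r"java\.lang\.OutOfMemoryError:?\s*(.*)")


def classify_oome(line):
    """Return the depleted resource for a log line, or None if no OOME."""
    match = OOME_LINE.search(line)
    if match is None:
        return None
    detail = match.group(1)
    for needle, resource in OOME_RESOURCES:
        if needle in detail:
            return resource
    return "unknown"


def summarize(log_lines):
    """Count OOMEs per depleted resource across an iterable of log lines."""
    resources = (classify_oome(line) for line in log_lines)
    return Counter(r for r in resources if r is not None)
```

Feeding it the lines of solr.log, for instance summarize(open(path)) with path being wherever your log lives, gives a count per resource. If most hits are not "heap", raising the heap size will not help.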
Thanks,
Shawn