Thanks! Some good ideas here. Yes, the OOM killer is the Linux one (CentOS Linux 7 (Core)); it is not Solr killing itself.
Not every OOM results in a restart, though. Also, to reproduce the issue in non-prod, we are replaying the same prod queries and overloading the system to almost 4-5 times the load, but we still cannot reproduce it. I am quite stumped on that one.

We have increased the heap size a bit since then, and that has definitely reduced the frequency of restarts, so we will keep throwing more resources at the problem (heap, memory, etc.). But it feels like we are not getting to the root of the problem, and it might return at some point.

On Tue, Nov 22, 2022 at 8:04 AM matthew sporleder <msporle...@gmail.com> wrote:

> On Mon, Nov 21, 2022 at 7:02 PM Shawn Heisey <apa...@elyograg.org> wrote:
> >
> > On 11/21/22 15:01, gnandre wrote:
> > > I am using Solr 8.5.2 in legacy mode (non-cloud).
> > >
> > > Some of the Solr nodes are automatically getting restarted after a few
> > > days. There is no clear pattern to the rebooting time. Also, no pattern
> > > in number of incoming queries or nature of those queries. No
> > > particular pattern in errors found in Solr logs.
> > >
> > > I am going to turn on the debug logs to see what is happening in Solr
> > > when it goes down. I am not able to reproduce the issue in one of our
> > > non-prod performance testing environments. I am recreating the same
> > > traffic as prod using access logs.
> > >
> > > Any other ideas about how I should go about debugging or reproducing
> > > this issue? TIA.
> >
> > As shipped, if Solr dies, it will NOT restart automatically. So that
> > must have been something you added.
> >
> > What OS do you have it running on?
> >
> > If everything is sized correctly, Solr will never crash. Java programs
> > are VERY stable if written correctly and run with plenty of system
> > resources.
> >
> > On non-windows systems, Solr starts with an option that will cause it to
> > commit suicide if Java's OutOfMemoryError exception is thrown. There
> > are several resource depletions that can cause OOME, and some of them
> > are NOT related to memory. This capability will not exist on Windows
> > until Solr 9.2.0 is released.
> >
> > https://issues.apache.org/jira/browse/SOLR-8803
> >
> > Most operating systems have a process that is called an "out of memory
> > killer" ... if available memory gets too low, this will find a program
> > on the system that is using a lot of memory and terminate it. On most
> > installs, the process using the most memory will be Solr.
> >
> > I strongly recommend NOT restarting Solr automatically if it ever dies.
> > Chances are that the reason it died is because the system needs some
> > attention, and restarting it is simply going to result in it dying
> > again, over and over.
> >
> > Thanks,
> > Shawn
> >
>
> I strongly agree here -- check your system logs for OOM and restarts
> from the operating system + systemd (or whatever).
> On a lark -- also check the uptime of the entire box.
>
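
To follow up on the suggestion to check the system logs for OOM events, here is a minimal Python sketch, not a definitive tool. It assumes the kernel log is available as text (on CentOS 7 that is usually /var/log/messages, or the output of `journalctl -k`) and that kill lines contain "Killed process <pid> (<name>)". The exact wording differs between kernel versions, so the pattern may need adjusting. The function name find_oom_kills is made up for illustration.

```python
import re

# Assumed message shape; the exact wording varies between kernel versions.
KILLED_RE = re.compile(r"Killed process (?P<pid>\d+) \((?P<name>[^)]+)\)")


def find_oom_kills(lines):
    """Return (pid, process_name, raw_line) for each OOM-killer kill line."""
    kills = []
    for line in lines:
        match = KILLED_RE.search(line)
        if match:
            kills.append(
                (int(match.group("pid")), match.group("name"), line.rstrip("\n"))
            )
    return kills


# Hypothetical usage, assuming the log lives at the usual CentOS 7 path:
#
#     with open("/var/log/messages", errors="replace") as log:
#         for pid, name, raw in find_oom_kills(log):
#             print(pid, name, raw)
```

Comparing the timestamps on the matched lines with the Solr restart times should show which restarts came from the kernel OOM killer and which came from Solr's own OutOfMemoryError handling.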