On Mon, Nov 21, 2022 at 7:02 PM Shawn Heisey <apa...@elyograg.org> wrote:
>
> On 11/21/22 15:01, gnandre wrote:
> > I am using Solr 8.5.2 in legacy mode (non-cloud).
> >
> > Some of the Solr nodes are automatically getting restarted after a few
> > days. There is no clear pattern to the rebooting time. Also, no pattern in
> > number of incoming queries or nature of those queries. No
> > particular pattern in errors found in Solr logs.
> >
> > I am going to turn on the debug logs to see what is happening in Solr when
> > it goes down. I am not able to reproduce the issue in one of our non-prod
> > performance testing environments. I am recreating the same traffic as prod
> > using access logs.
> >
> > Any other ideas about how I should go about debugging or reproducing this
> > issue? TIA.
>
> As shipped, if Solr dies, it will NOT restart automatically.  So that
> must have been something you added.
>
> What OS do you have it running on?
>
> If everything is sized correctly, Solr will never crash.  Java programs
> are VERY stable if written correctly and run with plenty of system
> resources.
>
> On non-windows systems, Solr starts with an option that will cause it to
> commit suicide if Java's OutOfMemoryError exception is thrown.  There
> are several resource depletions that can cause OOME, and some of them
> are NOT related to memory.  This capability will not exist on Windows
> until Solr 9.2.0 is released.
>
> https://issues.apache.org/jira/browse/SOLR-8803
>
> Most operating systems have a process that is called an "out of memory
> killer" ... if available memory gets too low, this will find a program
> on the system that is using a lot of memory and terminate it.  On most
> installs, the process using the most memory will be Solr.
>
> I strongly recommend NOT restarting Solr automatically if it ever dies.
> Chances are that the reason it died is because the system needs some
> attention, and restarting it is simply going to result in it dying
> again, over and over.
>
> Thanks,
> Shawn
>

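I strongly agree here -- check your system logs for OOM kills and for
restarts triggered by the operating system or systemd (or whatever
service manager you use).
On a lark -- also check the uptime of the entire box, in case the whole
machine is rebooting rather than just Solr.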
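
A rough sketch of that log check, assuming a Linux host and kernel log text saved from something like `dmesg` or `journalctl -k`. The helper names `find_oom_kills` and `uptime_seconds` are made up for illustration, and the exact wording of the kernel's OOM message varies between kernel versions, so treat the pattern as a starting point rather than a complete detector:

```python
import re

# The kernel usually reports its victim with a line containing, e.g.:
#   "Out of memory: Killed process 1234 (java) total-vm:..."
# Assumption: wording differs across kernel versions; adjust as needed.
KILLED_RE = re.compile(r"Killed process (\d+) \(([^)]+)\)")


def find_oom_kills(log_lines):
    """Return (pid, process_name) for each OOM-killer victim in log_lines."""
    kills = []
    for line in log_lines:
        match = KILLED_RE.search(line)
        if match:
            kills.append((int(match.group(1)), match.group(2)))
    return kills


def uptime_seconds(proc_uptime_text):
    """Parse the contents of /proc/uptime (Linux) into seconds of uptime."""
    # /proc/uptime holds two numbers: uptime and cumulative idle time.
    return float(proc_uptime_text.split()[0])
```

If a `java` process shows up among the victims around the time Solr went away, the kernel killed it for lack of memory. If the box uptime is shorter than expected, the whole machine restarted and Solr itself may not be the culprit.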
