[ 
https://issues.apache.org/jira/browse/SOLR-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17742036#comment-17742036
 ] 

Ovidiu Mihalcea commented on SOLR-8803:
---------------------------------------

In my experience running a dockerized SolrCloud cluster with 64+ nodes in 
production, OOMs can occur without any identifiable or fixable root cause 
after weeks or months of uptime. When this happens under high load, creating 
log files and core dumps does not help in any way.

Also, simply restarting the container solves the problem in 99% of cases, 
without any further issues. But creating the log files and core dumps delays 
the container restart, which leaves fewer nodes to handle the high load, 
which in turn leads to a cluster-wide crash.

I think we should re-implement the OOM=(script|exit|crash|none) environment 
variable, since this should be a user choice, not a hard-coded parameter.
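
To illustrate what such a switch could look like, here is a rough sketch in 
Python, not the actual Solr start script. The mode names are the ones proposed 
above. The JVM options are standard HotSpot flags. The function name, the 
script path, and the default mode are assumptions made up for this example.

```python
# Hypothetical sketch: map an OOM=(script|exit|crash|none) environment
# variable to JVM options. Not taken from Solr's start scripts.

# Placeholder path, assumed for illustration only.
OOM_SCRIPT = "/path/to/oom_script.sh"

OOM_MODES = {
    # Run a user-supplied script when the JVM hits an OutOfMemoryError.
    "script": ["-XX:OnOutOfMemoryError=" + OOM_SCRIPT],
    # Exit the JVM immediately, without writing a crash log or core dump.
    "exit": ["-XX:+ExitOnOutOfMemoryError"],
    # Crash the JVM, producing a crash log and possibly a core dump.
    "crash": ["-XX:+CrashOnOutOfMemoryError"],
    # Add no OOM-related options at all.
    "none": [],
}


def oom_jvm_flags(env, default="crash"):
    """Return the JVM options for the OOM mode found in `env`.

    `default` is an arbitrary choice for this sketch.
    """
    mode = env.get("OOM", default).strip().lower()
    if mode not in OOM_MODES:
        raise ValueError(
            "OOM must be one of %s, got %r" % (sorted(OOM_MODES), mode)
        )
    return list(OOM_MODES[mode])


if __name__ == "__main__":
    import os

    # Print the options selected by the current environment.
    print(" ".join(oom_jvm_flags(os.environ)))
```

With OOM=exit the container would go down right away and the orchestrator 
could restart it without waiting for dump files to be written.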

Should I make a PR for this?

> Generalize OOME handling to work for any OS
> -------------------------------------------
>
>                 Key: SOLR-8803
>                 URL: https://issues.apache.org/jira/browse/SOLR-8803
>             Project: Solr
>          Issue Type: Improvement
>    Affects Versions: 9.0
>            Reporter: Binoy Dalal
>            Assignee: Shawn Heisey
>            Priority: Minor
>              Labels: OOM, oom
>             Fix For: main (10.0), 9.2
>
>         Attachments: SOLR-8803-1.patch, SOLR-8803-10.patch, 
> SOLR-8803-2.patch, SOLR-8803-3.patch, SOLR-8803-4.patch, SOLR-8803-5.patch, 
> SOLR-8803-6.patch, SOLR-8803-7.patch, SOLR-8803-8.patch, SOLR-8803-9.patch, 
> SOLR-8803.patch, oom_win.cmd, solr-8803-build-transcript.txt
>
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> Solr on Windows does not currently have a script to kill the process on OOM 
> errors.
> The idea is to write a batch script that works like the OOM kill script for 
> Linux and kills the Solr process on OOM errors while creating an OOM log file 
> like the one on Linux systems.



