Hi,

For SolrCloud, is there a setting that allows a core to proactively take
itself down when it detects temporary issues such as high GC activity, a
high thread count, or a temporary network slowdown?
Currently we see that a node gets into a distributed deadlock because it is
not able to detect such situations.

I am exploring the Solr code to see if it's possible to take some proactive
action in such cases.
One way could be to have configurable limits for GC time, thread count,
response time, 5-minute rate, etc., and make a core shut down if it senses
problems.
Once that happens, a background thread would monitor the parameters that
caused the trouble and recover the downed core when the situation improves.
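
To make the idea of configurable limits concrete, here is a rough sketch in
plain Java. It is not code from my patch and not an existing Solr API: the
names CoreHealthCheck, Limits and Sample are hypothetical, and the only real
calls are the standard JDK management beans (ThreadMXBean and
GarbageCollectorMXBean). How the p95 request time would be obtained from
Solr is left out on purpose.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

/** Hypothetical sketch: compares one metrics sample against configured limits. */
class CoreHealthCheck {

    /** Hypothetical configurable limits (for example, read from config). */
    static final class Limits {
        final int maxThreads;
        final long maxGcMillisPerInterval;
        final double maxP95RequestMillis;

        Limits(int maxThreads, long maxGcMillisPerInterval, double maxP95RequestMillis) {
            this.maxThreads = maxThreads;
            this.maxGcMillisPerInterval = maxGcMillisPerInterval;
            this.maxP95RequestMillis = maxP95RequestMillis;
        }
    }

    /** One observation taken by the monitoring thread. */
    static final class Sample {
        final int threadCount;
        final long gcMillisInInterval;
        final double p95RequestMillis;

        Sample(int threadCount, long gcMillisInInterval, double p95RequestMillis) {
            this.threadCount = threadCount;
            this.gcMillisInInterval = gcMillisInInterval;
            this.p95RequestMillis = p95RequestMillis;
        }
    }

    /** True if any single limit is exceeded. */
    static boolean isUnhealthy(Sample s, Limits l) {
        return s.threadCount > l.maxThreads
                || s.gcMillisInInterval > l.maxGcMillisPerInterval
                || s.p95RequestMillis > l.maxP95RequestMillis;
    }

    /** Current number of live JVM threads, from the standard ThreadMXBean. */
    static int liveThreadCount() {
        return ManagementFactory.getThreadMXBean().getThreadCount();
    }

    /**
     * Cumulative GC time in milliseconds across all collectors since JVM start.
     * A caller would subtract the previous reading to get time per interval.
     */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 when the collector does not report it
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }
}
```

A monitoring thread could build a Sample on each tick, e.g.
new Sample(liveThreadCount(), gcNow - gcPrevious, p95), and pass it to
isUnhealthy together with the configured Limits.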


My current patch can bring down a core for:
1) High thread counts,
2) High 95thPcRequestTime,
3) A large number of heavy queries in a given time window.

The patch also recovers the core when its health improves.
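
One design question with down/recover logic is flapping: a core that goes
down on a single bad reading and comes back on a single good one may
oscillate. Below is a rough sketch of one way to guard against that, by
requiring several consecutive readings before changing state. This is an
illustration only, not necessarily what my patch does; CoreHealthGate and
its parameters are hypothetical names, and the actual Solr calls to take a
core down or recover it are not shown.

```java
/** Hypothetical sketch: flips state only after N consecutive confirming readings. */
class CoreHealthGate {

    enum State { ACTIVE, DOWN }

    private final int unhealthyToTrip;   // consecutive bad readings before going DOWN
    private final int healthyToRecover;  // consecutive good readings before going ACTIVE
    private State state = State.ACTIVE;
    private int streak = 0;

    CoreHealthGate(int unhealthyToTrip, int healthyToRecover) {
        if (unhealthyToTrip < 1 || healthyToRecover < 1) {
            throw new IllegalArgumentException("thresholds must be >= 1");
        }
        this.unhealthyToTrip = unhealthyToTrip;
        this.healthyToRecover = healthyToRecover;
    }

    /** Feed one health reading; returns the state after applying it. */
    synchronized State observe(boolean unhealthy) {
        // A reading counts toward a flip if it disagrees with the current state:
        // bad readings while ACTIVE, good readings while DOWN.
        boolean countsTowardFlip = (state == State.ACTIVE) == unhealthy;
        streak = countsTowardFlip ? streak + 1 : 0;

        int needed = (state == State.ACTIVE) ? unhealthyToTrip : healthyToRecover;
        if (streak >= needed) {
            state = (state == State.ACTIVE) ? State.DOWN : State.ACTIVE;
            streak = 0;
            // A real implementation would trigger the core down/recovery action here.
        }
        return state;
    }

    synchronized State state() {
        return state;
    }
}
```

The background monitoring thread would call observe(...) once per check
interval and act only when the returned state changes.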


If the above seems doable, then I can create a JIRA issue for more
discussion and implementation.


Thanks
Sachin
