Hi,

For SolrCloud, is there a setting that allows a core to proactively go down if it is able to detect temporary issues such as high GC, high thread counts, or a temporary network slowdown? Currently we see that a node gets into a distributed deadlock because it is not able to detect such situations.
I am exploring the Solr code to see if it is possible to take some proactive action in such cases. One way could be to have configurable limits for GC time, thread-count, response-time, 5-minute-rate, etc., and make a core shut down if it senses problems. Once that happens, a background thread would monitor the parameters that caused the trouble and recover the downed core when the situation improves.

My current patch can bring down a core for: 1) high thread counts, 2) high 95thPcRequestTime, 3) a large number of heavy queries in a given time. The patch also recovers the core when its health improves.

If the above seems doable, I can create a JIRA for more discussion and implementation.

Thanks,
Sachin
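
P.S. To make the idea above concrete, here is a minimal sketch of the decision logic only. It is not the patch itself and does not touch any Solr API: the class, field, and method names (CoreHealthMonitor, Limits, Sample, onSample) are hypothetical, the limit values are illustrative, and requiring several consecutive healthy samples before recovery is my own assumption to avoid flapping.

```java
// Hypothetical sketch: none of these names exist in Solr.
final class CoreHealthMonitor {

    /** Configurable limits for the three triggers mentioned in the post. */
    static final class Limits {
        final int maxThreadCount;
        final double maxP95RequestMillis;
        final int maxHeavyQueriesPerWindow;

        Limits(int maxThreadCount, double maxP95RequestMillis, int maxHeavyQueriesPerWindow) {
            this.maxThreadCount = maxThreadCount;
            this.maxP95RequestMillis = maxP95RequestMillis;
            this.maxHeavyQueriesPerWindow = maxHeavyQueriesPerWindow;
        }
    }

    /** One snapshot of the monitored parameters, taken by a background thread. */
    static final class Sample {
        final int threadCount;
        final double p95RequestMillis;
        final int heavyQueriesInWindow;

        Sample(int threadCount, double p95RequestMillis, int heavyQueriesInWindow) {
            this.threadCount = threadCount;
            this.p95RequestMillis = p95RequestMillis;
            this.heavyQueriesInWindow = heavyQueriesInWindow;
        }
    }

    private final Limits limits;
    private final int healthyChecksToRecover; // assumption: recovery needs N clean samples in a row
    private boolean down = false;
    private int consecutiveHealthy = 0;

    CoreHealthMonitor(Limits limits, int healthyChecksToRecover) {
        if (healthyChecksToRecover < 1) {
            throw new IllegalArgumentException("healthyChecksToRecover must be >= 1");
        }
        this.limits = limits;
        this.healthyChecksToRecover = healthyChecksToRecover;
    }

    /** True if the sample exceeds any configured limit. */
    boolean breaches(Sample s) {
        return s.threadCount > limits.maxThreadCount
                || s.p95RequestMillis > limits.maxP95RequestMillis
                || s.heavyQueriesInWindow > limits.maxHeavyQueriesPerWindow;
    }

    /**
     * Feeds one sample into the monitor.
     * Returns true if the core should be serving after this sample, false if it should be down.
     */
    synchronized boolean onSample(Sample s) {
        if (breaches(s)) {
            down = true;
            consecutiveHealthy = 0;
        } else if (down) {
            consecutiveHealthy++;
            if (consecutiveHealthy >= healthyChecksToRecover) {
                down = false;
                consecutiveHealthy = 0;
            }
        }
        return !down;
    }

    synchronized boolean isDown() {
        return down;
    }
}
```

In a real patch, the caller would take the core out of rotation when onSample returns false and trigger recovery when it returns true again. How that is wired into Solr is exactly what I would like to discuss in the JIRA.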
