wrunderwood commented on PR #96: URL: https://github.com/apache/solr/pull/96#issuecomment-1699585220
@janhoy Thanks so much for picking this up. I'll take a look at the compromise.

This is a tricky decision, changing behavior to match the documentation. Is it a breaking change or not? How many people read the code and figured out it wasn't CPU? Anybody? Our clusters with 300+ shards were configured assuming it was CPU usage (max 100%) until I explained otherwise.

I'd like to add a note to the docs about using circuit breakers in a sharded system, because they multiply failures. For example, with 4 shards, if 10% of search requests are short-circuited on all nodes, the end user will see about a 1/3 failure rate. In a sharded system, it is probably worth enabling partial results to avoid that.

A future feature would be an option to only check the circuit breakers on the initial external request, not on the distributed requests to shards. That has advantages (no partial failures) and disadvantages (can't reject the portion of the load which is intra-cluster).
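The "about 1/3" figure for 4 shards can be checked with a short calculation. This is only a sketch under two assumptions that are not stated in the comment: each shard sub-request is short-circuited independently with the same probability, and the whole search fails if any one shard rejects it (that is, partial results are not enabled). The function name `distributed_failure_rate` is made up for illustration and is not part of Solr.

```python
def distributed_failure_rate(per_shard_reject_rate: float, num_shards: int) -> float:
    """Probability that at least one of num_shards sub-requests is rejected.

    Assumes rejections are independent across shards and that a single
    rejected sub-request fails the whole search (no partial results).
    """
    if not 0.0 <= per_shard_reject_rate <= 1.0:
        raise ValueError("per_shard_reject_rate must be between 0 and 1")
    if num_shards < 1:
        raise ValueError("num_shards must be at least 1")
    # The search succeeds only if every shard accepts its sub-request.
    all_shards_accept = (1.0 - per_shard_reject_rate) ** num_shards
    return 1.0 - all_shards_accept


if __name__ == "__main__":
    # Show how a 10% per-shard rejection rate compounds with shard count.
    for shards in (1, 4, 16, 300):
        rate = distributed_failure_rate(0.10, shards)
        print(f"{shards:>3} shards: {rate:.1%} of searches fail")
```

With a 10% rejection rate and 4 shards this gives 1 - 0.9^4 = 0.3439, which matches the roughly 1/3 failure rate described above. Under the same assumptions the rate climbs quickly as the shard count grows, which is the reason to consider partial results in a sharded system.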