psalagnac opened a new pull request, #3350:
URL: https://github.com/apache/solr/pull/3350
https://issues.apache.org/jira/browse/SOLR-17754

# Description

The overseer sometimes gets stuck under high load, when it has at least 100 running tasks. See [Jira](https://issues.apache.org/jira/browse/SOLR-17754) for the full scenario, since the description is fairly long.

# Solution

This fixes the overseer main loop so that it never submits more than 100 concurrent tasks to the thread pool. Instead of manually tracking when a task completes, we check its status with a standard Java `Future`.

The changes also make sure we don't write the result to the ZK response node when we should not (see [comment](https://issues.apache.org/jira/browse/SOLR-17754?focusedCommentId=17951216&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17951216)). This removes the erroneous occurrences of the log message `"Response ZK path: <node> doesn't exist. Requestor may have disconnected from ZooKeeper"`.

# Tests

Adds a new test to make sure the overseer no longer fails when there are a large number of tasks.
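The Solution section describes capping in-flight tasks at 100 and reading task completion from a standard `Future` instead of tracking it by hand. Below is a minimal Java sketch of that pattern. It assumes a plain `ExecutorService`. `BoundedDispatcher`, `runAll` and `MAX_PARALLEL_TASKS` are hypothetical names, and this is not the actual Solr overseer code.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch of a dispatch loop that caps in-flight tasks; not Solr's overseer code. */
public class BoundedDispatcher {
  // Hypothetical constant mirroring the 100-task limit mentioned in the PR.
  static final int MAX_PARALLEL_TASKS = 100;

  /**
   * Runs every task in {@code pending}, keeping at most {@code maxParallel} submitted but
   * unfinished tasks at any time. Returns the largest in-flight count the loop observed.
   */
  static int runAll(Queue<Runnable> pending, int maxParallel) {
    ExecutorService pool = Executors.newFixedThreadPool(maxParallel);
    List<Future<?>> running = new ArrayList<>();
    int peak = 0;
    try {
      while (!pending.isEmpty() || !running.isEmpty()) {
        // Completion is read from each Future instead of being tracked by hand.
        // (A real loop would also call get() on finished futures to surface failures.)
        running.removeIf(Future::isDone);
        // Submit new work only while we are below the cap.
        while (running.size() < maxParallel && !pending.isEmpty()) {
          running.add(pool.submit(pending.poll()));
        }
        peak = Math.max(peak, running.size());
        // Simplistic back-off to avoid a hot spin; a real loop would block on a work queue.
        Thread.sleep(1);
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve the interrupt status for the caller
    } finally {
      pool.shutdown();
    }
    return peak;
  }

  public static void main(String[] args) {
    AtomicInteger completed = new AtomicInteger();
    Queue<Runnable> tasks = new ArrayDeque<>();
    for (int i = 0; i < 250; i++) {
      tasks.add(completed::incrementAndGet);
    }
    int peak = runAll(tasks, MAX_PARALLEL_TASKS);
    System.out.println("completed=" + completed.get() + ", peak in-flight=" + peak);
  }
}
```

In this sketch, the cap is enforced on the submitting side rather than left to the pool's internal queue. The loop therefore always knows how many tasks are outstanding, and polling `Future.isDone()` keeps that count accurate without a separate completion callback.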