gus-asf commented on code in PR #2666: URL: https://github.com/apache/solr/pull/2666#discussion_r1777385925
########## solr/core/src/java/org/apache/solr/handler/component/ParallelHttpShardHandler.java: ########## @@ -41,57 +39,78 @@ @NotThreadSafe public class ParallelHttpShardHandler extends HttpShardHandler { + @SuppressWarnings("unused") private static final Logger log = LoggerFactory.getLogger(MethodHandles.lookup().lookupClass()); private final ExecutorService commExecutor; + /* + * Unlike the basic HttpShardHandler, this class allows us to exit submit before + * pending is incremented and the responseFutureMap is updated. If the runnables that Review Comment: The concurrency fixes here and changes in HttpShardHandler come from me needing to implement this and being worried about the effects of limits interrupting/canceling requests. I felt compelled to ensure that canceling futures from within the call back (to facilitate a faster short circuit once we know the request is hitting a limit) wouldn't cause concurrency issues. ... so I put effort into understanding exactly what's going on and what the controls are. I found that (before this ticket) there was reliance on the orderly update of pending, a map of futures and the request queue object. Converting the map to a synchronized object didn't seem sufficient since the logic relies on coordination among objects... so essentially I had to make sure everything was going to handle a new short circuit point initiated by the callback and concurrency became "on-topic"... At a high level this ticket works by letting the callbacks [signal to stop all other callbacks](https://github.com/apache/solr/pull/2666/files#diff-ce2580113ae9a9d50f308985229130f094868e83d35a8727c5683e6a2f3f44daR543) if one of them is failing and partial results are not desired. There's another opportunity to quit early in the take() method but that caused too many problems (rare test failures likely due to further introduced concurrency issues). I do suspect some of this could be simplified, particularly what I note here: https://github.com/apache/solr/pull/2666/files#diff-ce2580113ae9a9d50f308985229130f094868e83d35a8727c5683e6a2f3f44daR98 But I decided to only guard and de-duplicate the existing logic. I'm leaving revising the logic to a later effort in order to control scope. In the case of your class (which came in as a merge), this was the simplest thing I could think of (as in simple to understand, not simple/elegant). Basically we need to know that everything we scheduled has been attempted (fail or not) before we let the loop that takes responses exit or we might miss one, but we also have to ensure that a failure won't leave us looping forever. So the quick solution was to track what's started, and track what's been attempted, and make sure we don't quit waiting for responses unless they are equal. Note that attempts may not result in a a future if `this.lbClient.requestAsync(lbReq);` fails, so the count of futures is not good enough. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org For additional commands, e-mail: issues-h...@solr.apache.org