[ https://issues.apache.org/jira/browse/SOLR-17348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17860248#comment-17860248 ]
Michael Gibney commented on SOLR-17348:
---------------------------------------

Interesting -- I gather from what you're saying that this is currently a problem, and that introducing a change that further re-orders or delays callback processing could exacerbate it.

I wonder if there's any way around this. Perhaps only _certain_ callbacks really _need_ to be executed synchronously; if we can guarantee that those execute quickly, we could run them directly in-thread. Other callbacks simply need to kick off a fully asynchronous process, and these could safely be farmed out to the zkCallback executor (and we could ensure that any potentially long-running callbacks are of this latter, asynchronous type).

I think I stumbled into exactly the trap you're describing: I saw that there's no guarantee of ordered processing atm and assumed that this was ok, when in fact the unordered processing is just an artifact of the unorthodox way that zk callbacks are currently employed. Does this make sense, based on what you're saying?

> Mitigate extreme parallelism of zkCallback executor
> ---------------------------------------------------
>
>                 Key: SOLR-17348
>                 URL: https://issues.apache.org/jira/browse/SOLR-17348
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Michael Gibney
>            Priority: Minor
>
> zkCallback executor is [currently an unbounded thread pool of core size
> 0|https://github.com/apache/solr/blob/709a1ee27df23b419d09fe8f67c3276409131a4a/solr/solrj-zookeeper/src/java/org/apache/solr/common/cloud/SolrZkClient.java#L91-L92],
> using a SynchronousQueue. Thus, a flood of zkCallback events (as might be
> triggered by a cluster restart, e.g.) can result in spinning up a very large
> number of threads. In practice we have encountered as many as 35k threads
> created in some such cases, even after the impact of this situation was
> reduced by the fix for SOLR-11535.
>
> Inspired by [~cpoerschke]'s recent [closer look at thread pool
> behavior|https://issues.apache.org/jira/browse/SOLR-13350?focusedCommentId=17853178#comment-17853178],
> I wondered if we might be able to employ a bounded queue to alleviate some
> of the pressure from bursty zk callbacks.
>
> The new config might look something like: {{corePoolSize=1024,
> maximumPoolSize=Integer.MAX_VALUE, allowCoreThreadTimeOut=true, workQueue=new
> LinkedBlockingQueue<>(1024)}}. This would allow the pool to grow up to (and
> shrink from) corePoolSize in the same manner it currently does, but once the
> pool reaches corePoolSize (e.g. during a cluster restart or other callback
> flood event), tasks would be queued (up to some fixed limit). If the queue
> limit is exceeded, new threads would still be created, but we would have
> avoided the current “always create a thread” behavior and, in so doing,
> hopefully reduce task execution time and improve overall throughput.
>
> From the ThreadPoolExecutor javadocs:
> {quote}Direct handoffs. A good default choice for a work queue is a
> SynchronousQueue that hands off tasks to threads without otherwise holding
> them. Here, an attempt to queue a task will fail if no threads are
> immediately available to run it, so a new thread will be constructed. This
> policy avoids lockups when handling sets of requests that might have internal
> dependencies. Direct handoffs generally require unbounded maximumPoolSizes to
> avoid rejection of new submitted tasks. This in turn admits the possibility
> of unbounded thread growth when commands continue to arrive on average faster
> than they can be processed.{quote}
>
> So afaict SynchronousQueue mainly makes sense if there exists the possibility
> of deadlock due to dependencies among tasks, and I think this should ideally
> _not_ be the case with zk callbacks (though I'm not sure whether that holds
> in practice).
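As a rough illustration of how the two queue strategies behave under a burst, here is a scaled-down sketch using only java.util.concurrent; it is not Solr's actual executor setup. The class and method names are hypothetical, the 60-second keep-alive is an assumption, and a core size of 16 with a queue capacity of 64 stands in for the proposed 1024/1024 so the demo stays small. Every submitted task blocks until the burst is released, modelling callbacks that arrive faster than they complete.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

final class ZkCallbackPoolSketch {

  /** Roughly the current shape: no core threads, unbounded max, direct handoff. */
  static ThreadPoolExecutor directHandoffPool() {
    return new ThreadPoolExecutor(
        0, Integer.MAX_VALUE, 60L, TimeUnit.SECONDS, new SynchronousQueue<>());
  }

  /** Proposed shape, parameterized so the demo can use small numbers. */
  static ThreadPoolExecutor boundedQueuePool(int coreSize, int queueCapacity) {
    ThreadPoolExecutor pool = new ThreadPoolExecutor(
        coreSize, Integer.MAX_VALUE, 60L, TimeUnit.SECONDS,
        new LinkedBlockingQueue<>(queueCapacity));
    // Let idle core threads time out so the pool can shrink back toward zero.
    pool.allowCoreThreadTimeOut(true);
    return pool;
  }

  /** Submits `burst` tasks that all block until released; returns the thread count reached. */
  static int threadsForBurst(ThreadPoolExecutor pool, int burst) {
    CountDownLatch release = new CountDownLatch(1);
    for (int i = 0; i < burst; i++) {
      pool.execute(() -> {
        try {
          release.await(); // simulate a callback that is still running
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
      });
    }
    int threads = pool.getPoolSize(); // threads alive while the whole burst is outstanding
    release.countDown();
    pool.shutdown();
    try {
      pool.awaitTermination(10, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    return threads;
  }

  public static void main(String[] args) {
    int burst = 100;
    System.out.println("direct handoff: " + threadsForBurst(directHandoffPool(), burst));
    System.out.println("bounded queue:  " + threadsForBurst(boundedQueuePool(16, 64), burst));
  }
}
```

With a burst of 100 blocked tasks, the direct-handoff pool ends up with one thread per task (100), while the bounded-queue pool holds 16 core threads, parks 64 tasks in the queue, and only then adds 20 overflow threads (36 in total). The flip side is the one the javadoc quote warns about: a queued task waits behind running ones, so a callback that blocks on another still-queued callback could stall.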
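The split floated in the comment (run only ordering-sensitive, cheap callbacks in-thread and farm everything else out to the executor) could be sketched roughly as below. This is plain Java under stated assumptions, not Solr or ZooKeeper code: `Callback`, `mustRunInline`, and `dispatch` are hypothetical names, a fixed pool of 4 threads stands in for the zkCallback executor, and the calling thread plays the role of the single thread delivering events in order.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class CallbackDispatchSketch {

  /** Hypothetical callback wrapper; not an existing Solr or ZooKeeper type. */
  static final class Callback {
    final boolean mustRunInline; // true only for ordering-sensitive callbacks known to be cheap
    final Runnable body;

    Callback(boolean mustRunInline, Runnable body) {
      this.mustRunInline = mustRunInline;
      this.body = body;
    }
  }

  private final ExecutorService asyncPool;

  CallbackDispatchSketch(ExecutorService asyncPool) {
    this.asyncPool = asyncPool;
  }

  /**
   * Meant to be called on a single event-delivery thread, in arrival order. Inline callbacks
   * run right here and so keep that order; all others are handed to the pool and may run in
   * any order relative to each other.
   */
  void dispatch(Callback cb) {
    if (cb.mustRunInline) {
      cb.body.run();
    } else {
      asyncPool.execute(cb.body);
    }
  }

  /** Dispatches a small mixed sequence and returns the order in which inline callbacks ran. */
  static List<String> demo() {
    ExecutorService pool = Executors.newFixedThreadPool(4);
    CallbackDispatchSketch dispatcher = new CallbackDispatchSketch(pool);
    List<String> inlineOrder = new ArrayList<>(); // only touched by the calling thread
    List<String> asyncSeen = Collections.synchronizedList(new ArrayList<>()); // pool threads

    String[] names = {"a", "b", "c", "d", "e"};
    for (int i = 0; i < names.length; i++) {
      String name = names[i];
      boolean inline = i % 2 == 0; // a, c, e run inline; b, d go to the pool
      Runnable body = inline ? () -> inlineOrder.add(name) : () -> asyncSeen.add(name);
      dispatcher.dispatch(new Callback(inline, body));
    }

    pool.shutdown();
    try {
      pool.awaitTermination(10, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    return inlineOrder;
  }

  public static void main(String[] args) {
    System.out.println(demo()); // prints [a, c, e]
  }
}
```

This only pays off if the inline set really is cheap: anything slow on the inline path stalls every event queued behind it, and the pooled callbacks (b and d here) get no ordering guarantee among themselves, which is why long-running work would have to stay on the asynchronous side.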
--
This message was sent by Atlassian Jira
(v8.20.10#820010)