[ https://issues.apache.org/jira/browse/SOLR-17419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17878128#comment-17878128 ]
Jason Gerlowski commented on SOLR-17419: ---------------------------------------- I opened a PR for this ticket but accidentally swapped two digits in the JIRA number, so it hasn't been linked here yet. Would love reviews on the code [here|https://github.com/apache/solr/pull/2681] if anyone has a few moments. Rather than trying to improve HttpShardHandler, the linked PR introduces an alternate implementation: ParallelHttpShardHandler[Factory]. The new implementation extends HttpShardHandler[Factory] and reuses most of its code. The only difference: ParallelHttpShardHandler.submit uses the executor to both send the request and await the response. This allows the "submit" call to return sooner, and for SearchHandler (or other callers) to iterate through a series of "submit" calls more quickly. The tradeoff of course is potentially higher CPU utilization, depending on the number of shards in play, etc. ---- I've run some perf tests (details below), and the results look pretty promising in certain scenarios! !shardhandler-perf-graph.png! To summarize: # With auth disabled HttpShardHandler outperforms the new implementation even in collections with many shards. # With auth enabled though, ParallelShardHandler outperforms HttpShardHandler by a factor of nearly "3x" regardless of the number of shards. At first glance, it's surprising that "auth" has such a huge impact. The cause, according to some profiling, is PKI auth. Generating and signing the "PKI" header consumes a massive amount of CPU and drives the cost of HttpShardHandler.submit. Without PKI, request-sending is cheap enough that doing it serially outperforms the thread-creation overhead incurred by the "Parallel" implementation. As soon as PKI is needed though, it dwarfs any thread-creation overhead and ParallelHttpShardHandler wins out. P.S: With PKI being so expensive, should we stress the security.json "forwardCredentials=true" option more strongly in the docs? Or make it a default in the AuthPlugins that support it? *Appendix: Perf Test Details* The results discussed above come from a series of perf experiments I ran on a local 4-node Solr cluster created via "bin/solr start -e cloud -m 2g". "Avg QTime" measurements were taken by (1) creating a collection with a particular 'numShard' value and using a particular 'ShardHandlerFactory', (2) loading in some data, (3) warming the JVM and collection with an initial round of queries, and (4) running 'numQueries' from each of 'numThreads' QueryRunner threads. As a data set, I used emails from our "dev@" list exported from "lists.apache.org". (I have a little script to export and clean up this data that I'm considering saving somewhere - it's been an interesting dataset to work with and might be useful in our benchmark module or elsewhere?) For queries, the perf-test code uses the "/terms" handler to fetch common terms and then creates single-term queries from those. Other details: * *Solr Version:* 10.0.0-SNAPSHOT (a branch based off of 'main', with the commit: 1818841868d70adebde364165d60114f164952de) * *JVM:* openjdk version "17.0.8" 2023-07-18 * *OS:* MacOS Sonoma 14.6.1 * *Hardware:* 2017 iMac Pro, 2.5GHz 14-core Intel Xeon, 128gb ram > Improve HttpShardHandler performance in many-shard collections > -------------------------------------------------------------- > > Key: SOLR-17419 > URL: https://issues.apache.org/jira/browse/SOLR-17419 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: SolrCloud > Affects Versions: 9.0, 9.6.1 > Reporter: Jason Gerlowski > Priority: Major > Attachments: shardhandler-perf-graph.png > > > In Solr 8, HttpShardHandler sends shard-requests by submitting Callables to > an ExecutorService. As a result, both the "request-sending" and > "response-awaiting" happened asynchronous to the original request-thread. > {code:java} > @Override > public void submit(final ShardRequest sreq, final String shard, final > ModifiableSolrParams params) { > ShardRequestor shardRequestor = new ShardRequestor(sreq, shard, params, > this); // Callable > try { > shardRequestor.init(); > pending.add(completionService.submit(shardRequestor)); > } finally { > shardRequestor.end(); > } > } > {code} > However, in Solr 9.x HttpShardHandler ditched the > ExecutorService/per-request-thread approach in favor of [sending all requests > serially using > "SolrClient.requestAsync"|https://github.com/apache/solr/blob/main/solr/core/src/java/org/apache/solr/handler/component/HttpShardHandler.java#L163]. > SOLR-14354, which made this change, did this in an effort to avoid > unnecessary thread and CPU context-switching. As Dat described in SOLR-14354: > {quote}after sending a request that thread basically do nothing just waiting > for response from other side. That thread will be swapped out and CPU will > try to handle another thread (this is called context switch, CPU will save > the context of the current thread and switch to another one). When some data > (not all) come back, that thread will be called to parsing these data, then > it will wait until more data come back. So there will be lots of context > switching in CPU. That is quite inefficient > {quote} > This approach comes with a downside though - all the shard requests are sent > serially. If sending each request takes ~1ms, then a user is unlikely to > notice this in their collection with 5 or 10 shards. But the cost here > scales linearly, so in *a collection with 50 shards - this approach would > bake a ~50ms delay into the critical path of every single query!* > This issue is intended to reevaluate whether there's a better way to balance > these concerns. Ideally we can come up with an approach that improves all > scenarios. Lacking that, maybe Solr could choose between one of several > approaches semi-intelligently based on the number of shards or other factors? -- This message was sent by Atlassian Jira (v8.20.10#820010) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org For additional commands, e-mail: issues-h...@solr.apache.org