[
https://issues.apache.org/jira/browse/SOLR-17916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18019994#comment-18019994
]
David Smiley commented on SOLR-17916:
-------------------------------------
I'm so impressed with this analysis, Sanjay! This gives me déjà vu about a similar
issue, maybe in the benchmark module, that I've forgotten about.
I think we *do* want to abort the stream to inform the server that any work it
does is for naught. I'm uncertain the server can realistically know what
happened.
Can we configure the rate control for this, or disable it? If not, maybe this
test should use HTTP/1.1 to avoid the problem. It doesn't seem like this test is
really trying to test HTTP/2 or Jetty. WDYT [~houston]? You recently updated
this test.
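For context on where the aborts come from, here is a minimal, self-contained sketch of the fan-out pattern described in the issue, using plain {{CompletableFuture}} instead of SolrJ or Jetty. All names ({{TolerantFanOutSketch}}, {{deadShard}}, {{healthyShard}}, {{Outcome}}) are hypothetical. Per the issue, the cancellation step is where Solr's {{HttpShardHandler}} calls {{future.cancel(true)}} / {{request.abort()}} and an RST_STREAM frame is sent; in this sketch {{cancel(true)}} only marks the future cancelled, and {{shutdownNow()}} interrupts the simulated requests.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CancellationException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

// Hypothetical model of a coordinator fanning one query out to several shards.
public class TolerantFanOutSketch {

    // results: shard responses collected; cancelled: in-flight requests cancelled;
    // failed: whether the overall request failed.
    record Outcome(List<String> results, int cancelled, boolean failed) {}

    // A shard that fails immediately, like a dead host refusing connections.
    static Supplier<String> deadShard(String name) {
        return () -> {
            throw new IllegalStateException("Connection refused: " + name);
        };
    }

    // A healthy shard that answers with its name after a simulated delay.
    static Supplier<String> healthyShard(String name, long delayMillis) {
        return () -> {
            try {
                Thread.sleep(delayMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new CompletionException(e);
            }
            return name;
        };
    }

    static Outcome query(List<Supplier<String>> shards, boolean tolerant) {
        ExecutorService pool = Executors.newFixedThreadPool(shards.size());
        try {
            List<CompletableFuture<String>> futures = new ArrayList<>();
            for (Supplier<String> shard : shards) {
                futures.add(CompletableFuture.supplyAsync(shard, pool));
            }
            List<String> results = new ArrayList<>();
            for (CompletableFuture<String> future : futures) {
                try {
                    results.add(future.join());
                } catch (CompletionException | CancellationException e) {
                    if (tolerant) {
                        continue; // shards.tolerant=true: skip the failed shard
                    }
                    // shards.tolerant=false: fail fast and cancel every request
                    // still in flight (the step that maps to abort/RST_STREAM).
                    int cancelled = 0;
                    for (CompletableFuture<String> other : futures) {
                        if (other.cancel(true)) {
                            cancelled++;
                        }
                    }
                    return new Outcome(results, cancelled, true);
                }
            }
            return new Outcome(results, 0, false);
        } finally {
            pool.shutdownNow(); // interrupts any simulated request still running
        }
    }
}
```

With the shard order from the test (dead shard first, then two healthy ones), the non-tolerant call fails and cancels the two healthy requests that are still running, while the tolerant call returns both healthy results. Repeating the non-tolerant call hundreds of times, as the test loop does, multiplies those cancellations.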
> Jetty 12.0.25 upgrade exposes RST_STREAM burst issue
> ----------------------------------------------------
>
> Key: SOLR-17916
> URL: https://issues.apache.org/jira/browse/SOLR-17916
> Project: Solr
> Issue Type: Bug
> Reporter: Sanjay Dutt
> Priority: Major
>
> After upgrading Jetty from {*}12.0.19 → 12.0.25{*}, the test
> {{DistributedDebugComponentTest.testTolerantSearch}} starts failing.
> The test sets up a query with a deliberately bad shard:
> {code:java}
> String badShard = DEAD_HOST_1 + "/solr/collection1";
> query.set("shards", badShard + "," + shard2 + "," + shard1);
> for (int i = 0; i < (TEST_NIGHTLY ? 500 : 200); i++) {
> // verify that the request would fail if shards.tolerant=false
> query.set(ShardParams.SHARDS_TOLERANT, "false");
> ignoreException("Connection refused");
> expectThrows(SolrException.class, () -> collection1.query(query));
> // verify that the request would succeed if shards.tolerant=true
> query.set(ShardParams.SHARDS_TOLERANT, "true");
> QueryResponse response = collection1.query(query); // fail here!
> ....
> {code}
> For each iteration, it issues:
> * *shards.tolerant = false* → as expected, the coordinator fails fast
> because one shard is dead.
> * *shards.tolerant = true* → expected to succeed using results from the good
> shard(s), but {*}fails after the Jetty upgrade{*}.
> *Observed behavior*
> * In the non-tolerant branch, {{SearchHandler}} throws early on the shard
> exception.
> * At this point {{HttpShardHandler}} cancels the outstanding async requests
> to the other shards, calling {{future.cancel(true)}} / {{request.abort()}}.
> * That abort translates into *RST_STREAM* frames sent to Jetty.
> * With the loop running hundreds of iterations, these cancels accumulate on
> a single HTTP/2 session.
> * Jetty 12.0.25 enforces stricter HTTP/2 rate control:
> GoAwayFrame\{... enhance_your_calm_error/invalid_rst_stream_frame_rate}
> * Once the rate limit is tripped, the server responds with GOAWAY and closes
> the connection.
> * The subsequent tolerant request then fails, even though at least one shard
> is healthy.
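To see why the loop, rather than any single request, trips the limit, here is a simplified sliding-window model of an "at most N events per window" rate control. It is an assumption-based illustration, not Jetty's implementation, and the class name {{RstStreamRateModel}}, the limit, and the timings are hypothetical rather than Jetty 12.0.25's actual defaults. Each non-tolerant iteration contributes up to two RST_STREAM frames, one per healthy shard still in flight.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical, simplified sliding-window rate control: at most maxEvents
// events within any windowNanos span. Not Jetty's actual implementation.
public class RstStreamRateModel {
    private final int maxEvents;
    private final long windowNanos;
    // Timestamps (nanoseconds) of events still inside the window.
    private final Deque<Long> events = new ArrayDeque<>();

    RstStreamRateModel(int maxEvents, long windowNanos) {
        this.maxEvents = maxEvents;
        this.windowNanos = windowNanos;
    }

    // Records one event (e.g. a received RST_STREAM frame) at nowNanos.
    // Returns false once the number of events in the window exceeds maxEvents.
    boolean onEvent(long nowNanos) {
        while (!events.isEmpty() && nowNanos - events.peekFirst() >= windowNanos) {
            events.pollFirst(); // this event has aged out of the window
        }
        events.addLast(nowNanos);
        return events.size() <= maxEvents;
    }

    // Simulates a test loop: each iteration sends resetsPerIteration resets at
    // the same instant, and iterations are iterationNanos apart. Returns the
    // 0-based iteration at which the limit is first exceeded, or -1 if never.
    static int firstTrippedIteration(int maxEvents, long windowNanos, int iterations,
                                     int resetsPerIteration, long iterationNanos) {
        RstStreamRateModel rateControl = new RstStreamRateModel(maxEvents, windowNanos);
        for (int i = 0; i < iterations; i++) {
            long now = i * iterationNanos;
            for (int r = 0; r < resetsPerIteration; r++) {
                if (!rateControl.onEvent(now)) {
                    return i; // a server would respond with GOAWAY at this point
                }
            }
        }
        return -1;
    }
}
```

With a hypothetical limit of 100 resets per second, 200 iterations spaced 5 ms apart exceed the limit at iteration 50, while the same loop spaced 50 ms apart never holds more than 40 resets in the window and never trips. Raising or disabling the limit, or avoiding HTTP/2 stream resets altogether (HTTP/1.1 has no RST_STREAM), each removes the trip in this model.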
--
This message was sent by Atlassian Jira
(v8.20.10#820010)