[
https://issues.apache.org/jira/browse/SOLR-17916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18019810#comment-18019810
]
Sanjay Dutt commented on SOLR-17916:
------------------------------------
This is where the client aborts the request, inside
{{Http2SolrClient#requestAsync}}. If I comment it out, the client never sends
RST_STREAM and the test runs smoothly.
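{code:java}
future.exceptionally(
    (error) -> {
      // Any exceptional completion of the future, including cancellation, aborts the in-flight request.
      mrrv.request.abort(error);
      return null;
    });
{code}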
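A minimal, self-contained sketch of why cancellation ends up in {{abort}}. {{CompletableFuture.cancel(true)}} completes the future exceptionally with a {{CancellationException}}, so an {{exceptionally}} handler registered beforehand runs and aborts the request. {{FakeRequest}} and {{cancelAndObserve}} are hypothetical stand-ins, not Solr or Jetty classes. They model the request held in {{mrrv.request}} and the cancel issued by {{HttpShardHandler}}.
{code:java}
import java.util.concurrent.CompletableFuture;

final class AbortOnCancelSketch {

    // Hypothetical stand-in for the in-flight HTTP/2 request (mrrv.request in Solr).
    static final class FakeRequest {
        Throwable abortCause; // records why abort() was called, if it was

        boolean abort(Throwable cause) {
            // In the real client, aborting an in-flight request resets its stream (RST_STREAM).
            this.abortCause = cause;
            return true;
        }
    }

    // Same pattern as Http2SolrClient#requestAsync: any exceptional completion,
    // including cancellation, triggers abort() on the request.
    static FakeRequest cancelAndObserve() {
        FakeRequest request = new FakeRequest();
        CompletableFuture<String> future = new CompletableFuture<>();
        future.exceptionally(
            (error) -> {
                request.abort(error);
                return null;
            });
        // Models HttpShardHandler cancelling an outstanding shard request.
        // The exceptionally handler has already run by the time cancel() returns.
        future.cancel(true);
        return request;
    }

    public static void main(String[] args) {
        FakeRequest request = cancelAndObserve();
        System.out.println("aborted with: " + request.abortCause.getClass().getSimpleName());
    }
}
{code}
Running it prints {{aborted with: CancellationException}}. In the real client, that abort is what produces the RST_STREAM frame.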
> Jetty 12.0.25 upgrade exposes RST_STREAM burst issue
> ----------------------------------------------------
>
> Key: SOLR-17916
> URL: https://issues.apache.org/jira/browse/SOLR-17916
> Project: Solr
> Issue Type: Bug
> Reporter: Sanjay Dutt
> Priority: Major
>
> After upgrading Jetty from {*}12.0.19 → 12.0.25{*}, the test
> {{DistributedDebugComponentTest.testTolerantSearch}} starts failing.
> The test sets up a query with a deliberately bad shard:
> {code:java}
> String badShard = DEAD_HOST_1 + "/solr/collection1";
> query.set("shards", badShard + "," + shard2 + "," + shard1);
> for (int i = 0; i < (TEST_NIGHTLY ? 500 : 200); i++) {
>   // verify that the request would fail if shards.tolerant=false
>   query.set(ShardParams.SHARDS_TOLERANT, "false");
>   ignoreException("Connection refused");
>   expectThrows(SolrException.class, () -> collection1.query(query));
>   // verify that the request would succeed if shards.tolerant=true
>   query.set(ShardParams.SHARDS_TOLERANT, "true");
>   QueryResponse response = collection1.query(query); // fails here after the upgrade
>   ....
> {code}
> Each iteration issues two requests:
> * *shards.tolerant = false* → as expected, the coordinator fails fast
> because one shard is dead.
> * *shards.tolerant = true* → expected to succeed using results from the healthy
> shards, but {*}fails after the Jetty upgrade{*}.
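The intended semantics of the two branches can be modelled with a deliberately simplified merge step. This hypothetical sketch assumes that each shard yields either a result or an error. {{ShardResponse}}, {{merge}} and {{sampleResponses}} are illustrative names and do not represent the logic of Solr's {{SearchHandler}} or {{HttpShardHandler}}.
{code:java}
import java.net.ConnectException;
import java.util.ArrayList;
import java.util.List;

final class TolerantMergeSketch {

    // Hypothetical per-shard outcome: exactly one of result/error is non-null.
    record ShardResponse(String shard, String result, Exception error) {}

    // Simplified model of shards.tolerant; not Solr's actual implementation.
    static List<String> merge(List<ShardResponse> responses, boolean tolerant) {
        List<String> results = new ArrayList<>();
        for (ShardResponse r : responses) {
            if (r.error() != null) {
                if (!tolerant) {
                    // shards.tolerant=false: fail fast on the first shard error.
                    throw new IllegalStateException("shard " + r.shard() + " failed", r.error());
                }
                continue; // shards.tolerant=true: drop the failed shard and keep going
            }
            results.add(r.result());
        }
        return results;
    }

    // Mirrors the test's shard list: one dead host followed by two healthy shards.
    static List<ShardResponse> sampleResponses() {
        return List.of(
            new ShardResponse("DEAD_HOST_1", null, new ConnectException("Connection refused")),
            new ShardResponse("shard2", "docs-from-shard2", null),
            new ShardResponse("shard1", "docs-from-shard1", null));
    }

    public static void main(String[] args) {
        System.out.println(merge(sampleResponses(), true));
        try {
            merge(sampleResponses(), false);
        } catch (IllegalStateException e) {
            System.out.println("non-tolerant: " + e.getMessage());
        }
    }
}
{code}
With the sample responses, the tolerant call returns {{[docs-from-shard2, docs-from-shard1]}} and the non-tolerant call throws on the dead host. After the upgrade, the tolerant branch is the one that breaks.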
> *Observed behavior*
> * In the non-tolerant branch, {{SearchHandler}} throws early on the shard
> exception.
> * At this point {{HttpShardHandler}} cancels the outstanding async requests
> to the other shards, calling {{future.cancel(true)}} /
> {{request.abort()}}.
> * Each abort translates into an *RST_STREAM* frame sent to the Jetty server.
> * With the loop running hundreds of iterations, these cancels accumulate on
> a single HTTP/2 session.
> * Jetty 12.0.25 enforces stricter HTTP/2 rate control on RST_STREAM frames, which surfaces as:
> GoAwayFrame\{... enhance_your_calm_error/invalid_rst_stream_frame_rate}
> * Once the rate limit is tripped, the server responds with GOAWAY and closes
> the connection.
> * The subsequent tolerant request then fails, even though at least one shard
> is healthy.
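The burst effect can be illustrated with a generic sliding-window counter. This sketch uses a limit of 100 resets per second purely for illustration. Jetty's actual threshold and algorithm are not given in this issue, and {{ResetRateLimiter}} and {{firstViolation}} are hypothetical names rather than Jetty API.
{code:java}
import java.util.ArrayDeque;
import java.util.Deque;

final class ResetBurstSketch {

    // Hypothetical sliding-window limiter: at most maxEvents RST_STREAM frames per window.
    static final class ResetRateLimiter {
        private final int maxEvents;
        private final long windowNanos;
        private final Deque<Long> timestamps = new ArrayDeque<>();

        ResetRateLimiter(int maxEvents, long windowNanos) {
            this.maxEvents = maxEvents;
            this.windowNanos = windowNanos;
        }

        // Records one reset at nowNanos; returns false once the rate is exceeded.
        boolean onReset(long nowNanos) {
            // Evict resets that have slid out of the window.
            while (!timestamps.isEmpty() && nowNanos - timestamps.peekFirst() >= windowNanos) {
                timestamps.pollFirst();
            }
            timestamps.addLast(nowNanos);
            return timestamps.size() <= maxEvents;
        }
    }

    // Feeds one session a reset every intervalNanos; returns the 1-based index of the
    // reset that trips the limit, or -1 if the limit is never exceeded.
    static int firstViolation(ResetRateLimiter limiter, int resets, long intervalNanos) {
        for (int i = 0; i < resets; i++) {
            if (!limiter.onReset(i * intervalNanos)) {
                return i + 1; // at this point the server would send GOAWAY (ENHANCE_YOUR_CALM)
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        long second = 1_000_000_000L;
        // Illustrative limit of 100 resets per second on a single session.
        System.out.println(firstViolation(new ResetRateLimiter(100, second), 500, 1_000_000L));  // 1 ms apart
        System.out.println(firstViolation(new ResetRateLimiter(100, second), 500, 20_000_000L)); // 20 ms apart
    }
}
{code}
It prints {{101}} and then {{-1}}. With 1 ms spacing, the 500 resets trip the limit on a single session. With 20 ms spacing, the same 500 resets stay under it. This is the kind of accumulation that the loop's repeated cancels can produce.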
--
This message was sent by Atlassian Jira
(v8.20.10#820010)