[
https://issues.apache.org/jira/browse/SOLR-17916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18020803#comment-18020803
]
Houston Putman commented on SOLR-17916:
---------------------------------------
[~sanjaydutt] I did the exact same thing when fixing this test before: I
commented out the request abort, because it would bleed over to other
(unrelated) requests in the Solr 9 branch. When I tested, that didn't happen in
the main branch, so I left the abort in place there.
I guess we were using a more "up-to-date" Jetty in 9.x at the time. That
ticket was SOLR-17819.
I say we comment it out, because it can only cause issues for us going forward.
It's much better to leave the request running than to have one bad request
start cancelling other requests that are in flight.
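As a rough illustration of that trade-off (this is not the actual
HttpShardHandler code), here is a minimal Java sketch that fails the overall
request as soon as one shard fails but never cancels or aborts the sibling
requests. The class and method names are hypothetical, and plain
CompletableFuture objects stand in for the real async shard requests.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;

/** Hypothetical helper: fail fast on the first shard error, leave the rest running. */
final class FailFastWithoutAbort {

    static <T> CompletableFuture<List<T>> failFast(List<CompletableFuture<T>> shardFutures) {
        CompletableFuture<List<T>> result = new CompletableFuture<>();

        // Propagate the first failure to the caller. Note that nothing here
        // calls cancel() or abort() on the other shard futures.
        for (CompletableFuture<T> shard : shardFutures) {
            shard.whenComplete((value, error) -> {
                if (error != null) {
                    result.completeExceptionally(error);
                }
            });
        }

        // If every shard succeeds, collect the responses in order.
        CompletableFuture.allOf(shardFutures.toArray(new CompletableFuture<?>[0]))
            .thenRun(() -> result.complete(
                shardFutures.stream()
                    .map(CompletableFuture::join)
                    .collect(Collectors.toList())));

        return result;
    }
}
```

The cost of this approach is that the still-running shard requests finish on
their own and their responses are discarded, which is the "leave the request
running" behaviour described above.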
> Jetty 12.0.25 upgrade exposes RST_STREAM burst issue
> ----------------------------------------------------
>
> Key: SOLR-17916
> URL: https://issues.apache.org/jira/browse/SOLR-17916
> Project: Solr
> Issue Type: Bug
> Reporter: Sanjay Dutt
> Priority: Major
>
> After upgrading Jetty from *12.0.19 → 12.0.25*, the test
> {{DistributedDebugComponentTest.testTolerantSearch}} starts failing.
> The test sets up a query with a deliberately bad shard:
> {code:java}
> String badShard = DEAD_HOST_1 + "/solr/collection1";
> query.set("shards", badShard + "," + shard2 + "," + shard1);
> for (int i = 0; i < (TEST_NIGHTLY ? 500 : 200); i++) {
>   // verify that the request would fail if shards.tolerant=false
>   query.set(ShardParams.SHARDS_TOLERANT, "false");
>   ignoreException("Connection refused");
>   expectThrows(SolrException.class, () -> collection1.query(query));
>   // verify that the request would succeed if shards.tolerant=true
>   query.set(ShardParams.SHARDS_TOLERANT, "true");
>   QueryResponse response = collection1.query(query); // fail here!
>   ....
> {code}
> For each iteration, it issues:
> * *shards.tolerant = false* → as expected, the coordinator fails fast
> because one shard is dead.
> * *shards.tolerant = true* → expected to succeed using results from the good
> shard(s), but {*}fails after the Jetty upgrade{*}.
> *Observed behavior*
> * In the non-tolerant branch, {{SearchHandler}} throws early on the shard
> exception.
> * At this point {{HttpShardHandler}} cancels the outstanding async requests
> to the other shards, calling {{future.cancel(true)}} /
> {{request.abort()}}.
> * That abort translates into *RST_STREAM* frames sent to Jetty.
> * With the loop running hundreds of iterations, these cancels accumulate on
> a single HTTP/2 session.
> * Jetty 12.0.25 enforces stricter HTTP/2 rate control:
> GoAwayFrame{... enhance_your_calm_error/invalid_rst_stream_frame_rate}
> * Once the rate limit is tripped, the server responds with GOAWAY and closes
> the connection.
> * The subsequent tolerant request then fails, even though at least one shard
> is healthy.
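To make the failure sequence above concrete, here is a toy Java model of a
per-session RST_STREAM rate limit. It is only a sketch: the class name, the
threshold of 100 resets, and the one-second window are made-up values for
illustration, not Jetty's actual implementation or defaults.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy sliding-window model of a per-session reset limit (hypothetical, not Jetty code). */
final class RstStreamRateSketch {
    private final Deque<Long> resetTimes = new ArrayDeque<>();
    private final int maxResetsPerWindow; // assumed threshold
    private final long windowNanos;       // assumed window length
    private boolean closed;               // models "GOAWAY sent, session closed"

    RstStreamRateSketch(int maxResetsPerWindow, long windowNanos) {
        this.maxResetsPerWindow = maxResetsPerWindow;
        this.windowNanos = windowNanos;
    }

    /** Records one RST_STREAM at the given time; returns false once the session is closed. */
    boolean onReset(long nowNanos) {
        if (closed) {
            return false;
        }
        // Forget resets that have fallen out of the window.
        while (!resetTimes.isEmpty() && nowNanos - resetTimes.peekFirst() >= windowNanos) {
            resetTimes.pollFirst();
        }
        resetTimes.addLast(nowNanos);
        if (resetTimes.size() > maxResetsPerWindow) {
            closed = true; // the real server would answer with GOAWAY here
        }
        return !closed;
    }

    boolean isClosed() {
        return closed;
    }
}
```

With these made-up numbers, 200 cancels arriving 1 ms apart trip the limit on
the 101st reset, while the same 200 cancels spaced 20 ms apart never do. Once
the session is closed, every later request on it fails, including the tolerant
one.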