Sanjay Dutt created SOLR-17916:
----------------------------------

             Summary: Jetty 12.0.25 upgrade exposes RST_STREAM burst issue
                 Key: SOLR-17916
                 URL: https://issues.apache.org/jira/browse/SOLR-17916
             Project: Solr
          Issue Type: Bug
            Reporter: Sanjay Dutt


After upgrading Jetty from {*}12.0.19 → 12.0.25{*}, the test 
{{DistributedDebugComponentTest.testTolerantSearch}} starts failing.

The test sets up a query with a deliberately bad shard:
{code:java}
String badShard = DEAD_HOST_1 + "/solr/collection1";
query.set("shards", badShard + "," + shard2 + "," + shard1);
for (int i = 0; i < (TEST_NIGHTLY ? 500 : 200); i++) {
  // verify that the request would fail if shards.tolerant=false
  query.set(ShardParams.SHARDS_TOLERANT, "false");
  ignoreException("Connection refused");
  expectThrows(SolrException.class, () -> collection1.query(query));
  // verify that the request would succeed if shards.tolerant=true
  query.set(ShardParams.SHARDS_TOLERANT, "true");
  QueryResponse response = collection1.query(query); // fails here after the Jetty upgrade
  ....
{code}
Each iteration issues two requests:
 * *shards.tolerant = false* → as expected, the coordinator fails fast because 
one shard is dead.

 * *shards.tolerant = true* → expected to succeed using results from the good 
shard(s), but {*}fails after the Jetty upgrade{*}.
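
The difference between the two requests can be illustrated with a toy model. This is only a sketch, not Solr's {{SearchHandler}} logic: the class {{TolerantSketch}} and its {{query}} method are hypothetical names, and shards are modeled as plain {{Callable}}s that are queried one after another rather than asynchronously.

```java
import java.net.ConnectException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;

/** Toy model of shards.tolerant semantics; NOT Solr's SearchHandler code. */
public class TolerantSketch {

  /**
   * Queries each (hypothetical) shard in turn. With tolerant=false the first
   * shard failure aborts the whole request; with tolerant=true failed shards
   * are skipped and the partial results are returned.
   */
  public static List<String> query(List<Callable<String>> shards, boolean tolerant) {
    List<String> results = new ArrayList<>();
    for (Callable<String> shard : shards) {
      try {
        results.add(shard.call());
      } catch (Exception e) {
        if (!tolerant) {
          throw new IllegalStateException("shard failed", e);
        }
        // tolerant: ignore the failed shard and keep going
      }
    }
    return results;
  }

  public static void main(String[] args) {
    Callable<String> dead = () -> {
      throw new ConnectException("Connection refused");
    };
    Callable<String> shard2 = () -> "shard2";
    Callable<String> shard1 = () -> "shard1";
    List<Callable<String>> shards = List.of(dead, shard2, shard1);

    try {
      query(shards, false);
    } catch (IllegalStateException e) {
      System.out.println("non-tolerant request failed: " + e.getCause().getMessage());
    }
    System.out.println("tolerant request returned: " + query(shards, true));
  }
}
```

In this model the tolerant request depends only on the healthy shards, which is the behavior the test expects from the second query in each iteration.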

*Observed behavior*
 * In the non-tolerant branch, {{SearchHandler}} throws early on the shard 
exception.

 * At this point, {{HttpShardHandler}} cancels the outstanding async requests to 
the other shards by calling {{future.cancel(true)}} / {{request.abort()}}.

 * That abort translates into *RST_STREAM* frames sent to Jetty.

 * With the loop running hundreds of iterations, these cancels accumulate on a 
single HTTP/2 session.

 * Jetty 12.0.25 enforces stricter HTTP/2 rate control and reports:
GoAwayFrame\{... enhance_your_calm_error/invalid_rst_stream_frame_rate}
 * Once the rate limit is tripped, the server responds with GOAWAY and closes 
the connection.
 * The subsequent tolerant request then fails, even though at least one shard 
is healthy.
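
The accumulation effect described above can be sketched with a toy sliding-window limiter. This is an illustration only and makes several assumptions: it is not Jetty's rate-control implementation, the names {{RstBurstSketch}}, {{WindowRateLimiter}} and {{firstTrippedIteration}} are hypothetical, and the limit (128 events per second), the two cancels per iteration, and the iteration timings are made-up numbers rather than Jetty defaults or measured values.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy model of an RST_STREAM burst; NOT Jetty's rate-control code. */
public class RstBurstSketch {

  /** Hypothetical limiter: allows at most maxEvents within any windowNanos span. */
  public static final class WindowRateLimiter {
    private final int maxEvents;
    private final long windowNanos;
    private final Deque<Long> timestamps = new ArrayDeque<>();

    public WindowRateLimiter(int maxEvents, long windowNanos) {
      this.maxEvents = maxEvents;
      this.windowNanos = windowNanos;
    }

    /** Records one event; returns false once the window holds too many events. */
    public boolean onEvent(long nowNanos) {
      // Drop events that have aged out of the window.
      while (!timestamps.isEmpty() && nowNanos - timestamps.peekFirst() >= windowNanos) {
        timestamps.pollFirst();
      }
      timestamps.addLast(nowNanos);
      return timestamps.size() <= maxEvents;
    }
  }

  /**
   * Simulates a test loop in which every iteration sends cancelsPerIteration
   * RST_STREAM frames on the same session. Returns the 1-based iteration at
   * which the limiter first rejects a frame, or -1 if it never does.
   */
  public static int firstTrippedIteration(
      int iterations, int cancelsPerIteration, long nanosPerIteration, WindowRateLimiter limiter) {
    long now = 0;
    for (int i = 1; i <= iterations; i++) {
      for (int c = 0; c < cancelsPerIteration; c++) {
        if (!limiter.onEvent(now)) {
          return i; // in the real scenario the server would answer with GOAWAY here
        }
      }
      now += nanosPerIteration;
    }
    return -1;
  }

  public static void main(String[] args) {
    long second = 1_000_000_000L;
    // Fast loop: 2 cancels every 5 ms against an assumed 128-per-second limit.
    int fast = firstTrippedIteration(200, 2, 5_000_000L, new WindowRateLimiter(128, second));
    // Slow loop: the same cancels, but only one iteration every 100 ms.
    int slow = firstTrippedIteration(200, 2, 100_000_000L, new WindowRateLimiter(128, second));
    System.out.println("fast loop trips at iteration " + fast);
    System.out.println("slow loop trips at iteration " + slow + " (-1 means never)");
  }
}
```

With these illustrative numbers the fast loop exceeds the limit partway through the run while the slow loop never does, which is consistent with the failure depending on cancels piling up on a single HTTP/2 session rather than on any individual cancel.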


