[ 
https://issues.apache.org/jira/browse/SOLR-17916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18019994#comment-18019994
 ] 

David Smiley commented on SOLR-17916:
-------------------------------------

I'm so impressed with this analysis, Sanjay!  This feels like deja vu: I recall 
a similar issue, maybe in the benchmark module, whose details I've forgotten.

I think we *do* want to abort the stream to inform the server that any work it 
does is for naught.  I'm uncertain the server can realistically know what 
happened.

Can we configure this rate control, or disable it?  If not, maybe this test 
should use HTTP/1.1 to avoid the problem.  It doesn't seem like this test is 
really trying to test HTTP/2 or Jetty.  WDYT [~houston]?  You recently updated 
this test.

> Jetty 12.0.25 upgrade exposes RST_STREAM burst issue
> ----------------------------------------------------
>
>                 Key: SOLR-17916
>                 URL: https://issues.apache.org/jira/browse/SOLR-17916
>             Project: Solr
>          Issue Type: Bug
>            Reporter: Sanjay Dutt
>            Priority: Major
>
> After upgrading Jetty from {*}12.0.19 → 12.0.25{*}, the test 
> {{DistributedDebugComponentTest.testTolerantSearch}} starts failing.
> The test sets up a query with a deliberately bad shard:
> {code:java}
> String badShard = DEAD_HOST_1 + "/solr/collection1";
> query.set("shards", badShard + "," + shard2 + "," + shard1);
> for (int i = 0; i < (TEST_NIGHTLY ? 500 : 200); i++) {
>       // verify that the request would fail if shards.tolerant=false
>       query.set(ShardParams.SHARDS_TOLERANT, "false");
>       ignoreException("Connection refused");
>       expectThrows(SolrException.class, () -> collection1.query(query));
>       // verify that the request would succeed if shards.tolerant=true
>       query.set(ShardParams.SHARDS_TOLERANT, "true");
>       QueryResponse response = collection1.query(query); // fail here!
> ....
> {code}
> For each iteration, it issues:
>  * *shards.tolerant = false* → as expected, the coordinator fails fast 
> because one shard is dead.
>  * *shards.tolerant = true* → expected to succeed using results from the good 
> shard(s), but {*}fails after the Jetty upgrade{*}.
>
> *Observed behavior*
>  * In the non-tolerant branch, {{SearchHandler}} throws early on the shard 
> exception.
>  * At this point {{HttpShardHandler}} cancels the outstanding async requests 
> to the other shards, calling {{future.cancel(true)}} / {{request.abort()}}.
>  * That abort translates into *RST_STREAM* frames sent to Jetty.
>  * With the loop running hundreds of iterations, these cancels accumulate on 
> a single HTTP/2 session.
>  * Jetty 12.0.25 enforces stricter HTTP/2 rate control:
> GoAwayFrame{... enhance_your_calm_error/invalid_rst_stream_frame_rate}
>  * Once the rate limit is tripped, the server responds with GOAWAY and closes 
> the connection.
>  * The subsequent tolerant request then fails, even though at least one shard 
> is healthy.
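
To make the accumulation described above concrete, here is a minimal, 
self-contained sketch of a sliding-window event limiter fed by a test-like 
loop. It is an illustration only, not Jetty's implementation: the class and 
method names (RstRateSketch, WindowLimiter, firstTrippedIteration) and the 
numbers used (limit, window size, cancels per iteration, iteration time) are 
all hypothetical and were chosen just to show the shape of the problem.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class RstRateSketch {

    /** Hypothetical sliding-window limiter: at most maxEvents per windowNanos. */
    public static final class WindowLimiter {
        private final int maxEvents;
        private final long windowNanos;
        private final Deque<Long> events = new ArrayDeque<>();

        public WindowLimiter(int maxEvents, long windowNanos) {
            this.maxEvents = maxEvents;
            this.windowNanos = windowNanos;
        }

        /** Records one event; returns false once the window holds more than maxEvents. */
        public boolean onEvent(long nowNanos) {
            // Drop events that have aged out of the window.
            while (!events.isEmpty() && nowNanos - events.peekFirst() >= windowNanos) {
                events.pollFirst();
            }
            events.addLast(nowNanos);
            return events.size() <= maxEvents;
        }
    }

    /**
     * Simulates a loop in which every iteration aborts cancelsPerIteration
     * in-flight requests on one shared session (one RST_STREAM-like event each).
     * Returns the 0-based iteration where the limiter trips, or -1 if it never does.
     */
    public static int firstTrippedIteration(
            int iterations, int cancelsPerIteration, long nanosPerIteration, WindowLimiter limiter) {
        long now = 0;
        for (int i = 0; i < iterations; i++) {
            for (int c = 0; c < cancelsPerIteration; c++) {
                if (!limiter.onEvent(now)) {
                    return i; // a real server would answer with GOAWAY at this point
                }
            }
            now += nanosPerIteration;
        }
        return -1;
    }

    public static void main(String[] args) {
        long second = 1_000_000_000L;
        // Fast loop: 2 cancels every millisecond against a limit of 10 per second.
        int fast = firstTrippedIteration(200, 2, 1_000_000L, new WindowLimiter(10, second));
        // Slow loop: same cancels, but one iteration per second, so old events expire.
        int slow = firstTrippedIteration(200, 2, second, new WindowLimiter(10, second));
        System.out.println("fast loop tripped at iteration: " + fast);
        System.out.println("slow loop tripped at iteration: " + slow);
    }
}
```

The point of the sketch is that the per-iteration cancel count is small, but a 
tight loop packs many iterations into one window on the same session, so the 
limit is reached after only a few iterations. Spreading the same cancels over 
time, or over separate connections, would keep the count below the threshold.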


