iamsanjay commented on PR #3616:
URL: https://github.com/apache/solr/pull/3616#issuecomment-3280617027
```
String badShard = DEAD_HOST_1 + "/solr/collection1";
query.set("shards", badShard + "," + shard2 + "," + shard1);
for (int i = 0; i < (TEST_NIGHTLY ? 500 : 200); i++) {
  // verify that the request would fail if shards.tolerant=false
  query.set(ShardParams.SHARDS_TOLERANT, "false");
  ignoreException("Connection refused");
  expectThrows(SolrException.class, () -> collection1.query(query));
  // verify that the request would succeed if shards.tolerant=true
  query.set(ShardParams.SHARDS_TOLERANT, "true");
  QueryResponse response = collection1.query(query); // fail here!
  ....
```
Inside DistributedDebugComponentTest, the first request is sent with shards.tolerant=false. Because one of the shards is already in a bad state, the isTolerant check in SearchHandler throws an exception. I believe that at this moment (or just before it), HttpShardHandler immediately cancels all the other requests that have already been sent to the other shards, which ends up sending RST_STREAM frames to the server:
```
2942 DEBUG (qtp1356540870-40) [n: c: s: r: x: t:] o.e.j.h.HTTP2Session Received ResetFrame@57150ba5#359{cancel_stream_error} for HTTP2Stream#359@542c9e88{sendWindow=8388608,recvWindow=524006,queue=1,demand=false,reset=false/false,NOT_CLOSED,age=0,request=POST{u=http://127.0.0.1:62712/solr/collection2/select,HTTP/2.0,h=4,cl=282,p=null},attachment=org.eclipse.jetty.http2.server.internal.HttpStreamOverHTTP2@97edce5} on HTTP2ServerSession@542c9e88{local:/127.0.0.1:62712,remote:/127.0.0.1:62720,sendWindow=16716804,recvWindow=998801,state=[streams=1,NOT_CLOSED,goAwayRecv=null,goAwaySent=null,failure=null]}
```
This eventually hits the ceiling enforced by rateControl, and the server responds by sending a GOAWAY with the following error:
```
2969 DEBUG (qtp1356540870-63-null-134) [n: c: s: r: x:collection2 t:null-134] o.e.j.h.BufferingFlowControlStrategy Data consumed, 282 bytes, session recv window level 53103/524288 for HTTP2ServerSession@542c9e88{local:/127.0.0.1:62712,remote:/127.0.0.1:62720,sendWindow=16712776,recvWindow=995473,state=[streams=1,CLOSING,goAwayRecv=null,goAwaySent=GoAwayFrame@10b29f3b{383/enhance_your_calm_error/invalid_rst_stream_frame_rate},failure=java.io.IOException: enhance_your_calm_error/invalid_rst_stream_frame_rate]}
```
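To make this failure mode concrete, here is a minimal, self-contained Java sketch of a windowed limit on RST_STREAM frames. It only models the idea and is not Jetty's code: Jetty has its own implementation (for example `WindowRateControl` in its HTTP/2 parser package) with its own defaults, and the class name `ResetRateLimiter`, the limit of 5 resets and the 1-second window below are hypothetical values chosen for illustration. Once more resets arrive inside one window than the limit allows, the session is treated as abusive and closed, which matches the `goAwaySent=...enhance_your_calm_error/invalid_rst_stream_frame_rate` state in the log above.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Hypothetical model of a per-session RST_STREAM rate limit.
 * Not Jetty's implementation: it just counts events inside a sliding time window.
 */
public class ResetRateLimiter {
    private final int maxResetsPerWindow;
    private final long windowNanos;
    private final Deque<Long> timestamps = new ArrayDeque<>();
    private boolean closed; // true once the modeled session would have been sent GOAWAY

    public ResetRateLimiter(int maxResetsPerWindow, long windowNanos) {
        this.maxResetsPerWindow = maxResetsPerWindow;
        this.windowNanos = windowNanos;
    }

    /** Records one RST_STREAM at {@code nowNanos}; returns false once the limit is exceeded. */
    public boolean onReset(long nowNanos) {
        if (closed) {
            return false;
        }
        // Forget resets that have slid out of the current window.
        while (!timestamps.isEmpty() && nowNanos - timestamps.peekFirst() >= windowNanos) {
            timestamps.pollFirst();
        }
        timestamps.addLast(nowNanos);
        if (timestamps.size() > maxResetsPerWindow) {
            closed = true; // a real server would send GOAWAY(ENHANCE_YOUR_CALM) at this point
            return false;
        }
        return true;
    }

    public boolean isClosed() {
        return closed;
    }

    public static void main(String[] args) {
        // Hypothetical limit: at most 5 resets per 1-second window.
        ResetRateLimiter limiter = new ResetRateLimiter(5, 1_000_000_000L);
        int accepted = 0;
        // Simulate a tight test loop that cancels one in-flight stream every millisecond.
        for (int i = 0; i < 200 && !limiter.isClosed(); i++) {
            if (limiter.onReset(i * 1_000_000L)) {
                accepted++;
            }
        }
        System.out.println("resets accepted before GOAWAY: " + accepted);
    }
}
```

Running `main` prints `resets accepted before GOAWAY: 5`: the sixth reset inside the same window trips the limit and the modeled session stays closed afterwards. Since a GOAWAY takes down the whole connection, this would also explain why the following shards.tolerant=true query fails even though it targets healthy shards.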
As a result, the subsequent request with shards.tolerant=true fails as well, even though we still have a shard in a good state. In my opinion, the upgrade is not the problem here. Either the test has to be modified so that it goes easier on the server, or we need to find a better way to cancel the AsyncRequest than aborting it right away, for example by quietly consuming the stream.
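The trade-off behind that last suggestion can be sketched in plain Java. In this minimal model, which is not Solr's actual HttpShardHandler code, each `CompletableFuture` stands in for one in-flight HTTP/2 shard request, and `CancelPolicySketch`, its `Policy` enum and `onFailFast` are hypothetical names. Under `ABORT`, every request cancelled after a non-tolerant failure would correspond to one RST_STREAM sent to the server, so a loop of 200 to 500 iterations multiplies them; under `DRAIN`, nothing is reset and late responses are simply ignored, at the cost of keeping those streams open until they finish.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;

/**
 * Hypothetical model (not Solr's HttpShardHandler): when a non-tolerant request
 * fails fast, decide what to do with the shard requests still in flight.
 * Each CompletableFuture stands in for one HTTP/2 stream.
 */
public class CancelPolicySketch {

    enum Policy {
        ABORT, // cancel immediately; over HTTP/2 each cancel would reset the stream
        DRAIN  // leave the stream alone, let the response arrive, and ignore it
    }

    /** Applies the policy and returns how many streams would have been reset. */
    static int onFailFast(List<CompletableFuture<String>> inFlight, Policy policy) {
        int resets = 0;
        for (CompletableFuture<String> stream : inFlight) {
            if (stream.isDone()) {
                continue; // already answered: nothing to cancel or drain
            }
            if (policy == Policy.ABORT) {
                // cancel() returns true only if this call actually cancelled the future
                if (stream.cancel(true)) {
                    resets++;
                }
            } else {
                // Model of quiet consumption: no cancel, the eventual response is just dropped.
                stream.thenAccept(body -> { });
            }
        }
        return resets;
    }

    public static void main(String[] args) {
        for (Policy policy : Policy.values()) {
            // Two healthy shards still have requests in flight when the dead host fails.
            List<CompletableFuture<String>> inFlight =
                    List.of(new CompletableFuture<>(), new CompletableFuture<>());
            int resets = onFailFast(inFlight, policy);
            // Late responses: ignored under DRAIN, no effect on futures cancelled by ABORT.
            inFlight.forEach(f -> f.complete("shard response"));
            System.out.println(policy + ": streams reset = " + resets);
        }
    }
}
```

Running `main` prints `ABORT: streams reset = 2` followed by `DRAIN: streams reset = 0`. A real fix would still need some bound on how long drained streams may stay open, which this sketch does not model.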
--
This is an automated message from the Apache Git Service.