On 21/09/2020 08:18, Martin Grigorov wrote:
> On Fri, Sep 18, 2020 at 6:16 PM Mark Thomas <ma...@apache.org> wrote:
>
>> On 18/09/2020 14:07, Martin Grigorov wrote:
>>
>> <snip/>
>>
>>> What is the difference
>>> between org.apache.coyote.http2.StreamStateMachine.State#CLOSED_RX
>>> and org.apache.coyote.http2.StreamStateMachine.State#CLOSED_TX ?
>>
>> Compare the parameters used to construct the enums.
>>
>>> I read some parts of https://tools.ietf.org/html/rfc7540 but I don't
>>> see anything related to two types of CLOSED state.
>>
>> Section 5.1. Definition of the closed state (page 20) explains the
>> difference between the two.
>>
>
> Still I do not understand what RX and TX stand for. But this is not
> important.
TX - transmit
RX - receive

(A rough sketch of how the two closed states differ in behaviour
follows further down.)

> The following patch fixes the issue for me/Vegeta:
>
> @@ -1570,12 +1571,15 @@ class Http2UpgradeHandler extends AbstractStream implements InternalHttpUpgradeH
>
>      @Override
>      public void reset(int streamId, long errorCode) throws Http2Exception {
> -        Stream stream = getStream(streamId, true);
> -        boolean active = stream.isActive();
> -        stream.checkState(FrameType.RST);
> -        stream.receiveReset(errorCode);
> -        if (active) {
> -            activeRemoteStreamCount.decrementAndGet();
> +        boolean unknownIsError = Http2Error.CANCEL.getCode() != errorCode;
> +        Stream stream = getStream(streamId, unknownIsError);
> +        if (stream != null) {
> +            boolean active = stream.isActive();
> +            stream.checkState(FrameType.RST);
> +            stream.receiveReset(errorCode);
> +            if (active) {
> +                activeRemoteStreamCount.decrementAndGet();
> +            }
>          }
>      }
>
> I.e. do not throw an error if the remote peer is trying to cancel an
> already closed stream.

RFC 7540 allows the connection to be closed with a protocol error if
the reset is received "a significant time after sending END_STREAM".

Tomcat retains state for maxConcurrentStreams + 10%, with closed
streams being pruned (oldest first) every 10 new streams.

It isn't clear at this point why the client is sending the RST_STREAM.
This normally indicates an error condition. From the description of
the load test I don't see why the client would be sending a
RST_STREAM.

If the RST_STREAM is valid and it isn't received a significant time
after the END_STREAM, then Tomcat needs to handle this.

If the RST_STREAM is valid but received a significant amount of time
after the END_STREAM, that would be a client error. Of course,
"significant" isn't defined in the specification.

Any special handling for RST_STREAM also needs to be applied to
WINDOW_UPDATE. It also needs to differentiate between a frame received
for a past closed stream and a frame received for a stream that never
existed (a sketch of one way to tell these apart follows further
down).

I think the key question here is why the client is sending the
RST_STREAM in the first place. Is Tomcat doing something it shouldn't
/ not doing something it should to cause this? If so, we need to fix
the root cause rather than tackle the symptom.

There are a couple of options for debugging this to see why the client
is sending the RST_STREAM:

- Enable debug logging for HTTP/2 in Tomcat. This is very verbose and,
  if the root cause is threading related, typically changes the timing
  enough to mask the issue. (A sketch of the relevant
  logging.properties entries follows further down.)
- Client side logging (I don't know what the capabilities are here).
- JVM level TLS debug logging (even more verbose than Tomcat's HTTP/2
  debug and harder to work with).
- Wireshark (or similar).

I'd recommend Wireshark. I don't think JSSE supports SSLKEYLOGFILE, so
I'd use the APR/Native connector with SSLKEYLOGFILE to get the
cleartext in Wireshark.

> With this change and Vegeta's -max-workers=100 I can get 12 K
> reqs/sec.

That is odd. If the client is sending RST_STREAM messages then I'd
expect to see requests failing, as RST_STREAM indicates something has
gone wrong or the client has aborted the request.
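To make the CLOSED_RX / CLOSED_TX distinction above concrete: per RFC
7540 section 5.1, a stream closed because we *received* the closing
frame treats almost any further frame from the peer as an error, while
a stream closed because we *sent* RST_STREAM has to tolerate frames
that were already in flight. A minimal illustrative sketch (this is
not Tomcat's actual StreamStateMachine.State enum, which is
constructed with more parameters):

    // Illustrative only -- not Tomcat's actual enum.
    enum SketchState {

        // Closed because we received END_STREAM / RST_STREAM: further
        // frames from the peer (other than PRIORITY) are an error
        // (RFC 7540, section 5.1).
        CLOSED_RX(true),

        // Closed because we sent RST_STREAM: frames the peer sent
        // before seeing our reset may still arrive and must be
        // ignored rather than treated as an error.
        CLOSED_TX(false);

        private final boolean frameReceivedIsError;

        SketchState(boolean frameReceivedIsError) {
            this.frameReceivedIsError = frameReceivedIsError;
        }

        boolean isFrameReceivedAnError() {
            return frameReceivedIsError;
        }
    }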
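On telling a frame for a past closed stream apart from one for a
stream that never existed: client-initiated stream IDs are odd and
strictly increasing within a connection (RFC 7540 section 5.1.1), so
an untracked ID can be classified against the highest client stream ID
seen so far. A hypothetical helper (not Tomcat code) to illustrate the
idea:

    // Hypothetical helper, for illustration only.
    final class ClientStreamIdClassifier {

        private int maxClientStreamIdSeen = 0;

        void recordNewClientStream(int streamId) {
            if (streamId > maxClientStreamIdSeen) {
                maxClientStreamIdSeen = streamId;
            }
        }

        // The stream existed but has since been pruned from the
        // connection's retained state.
        boolean isPastClosedStream(int streamId, boolean currentlyTracked) {
            return !currentlyTracked && streamId <= maxClientStreamIdSeen;
        }

        // The client never opened a stream with this ID; for most
        // frame types this is a connection error (PROTOCOL_ERROR).
        boolean neverExisted(int streamId) {
            return streamId > maxClientStreamIdSeen;
        }
    }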
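For the first debugging option above (Tomcat's HTTP/2 debug logging),
a minimal sketch of the lines to add to conf/logging.properties,
assuming the standard JULI setup:

    # Log HTTP/2 processing at debug level (very verbose).
    org.apache.coyote.http2.level = FINE
    # Make sure at least one handler passes FINE messages through.
    java.util.logging.ConsoleHandler.level = FINE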
> With more workers it starts failing with:
>
>   "status_codes": {
>     "0": 1000,
>     "200": 1
>   },
>   "errors": [
>     "Get \"https://localhost:18080/testbed/plaintext\": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)",
>     "Get \"https://localhost:18080/testbed/plaintext\": context deadline exceeded",
>     "Get \"https://localhost:18080/testbed/plaintext\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"
>   ]
>
> i.e. out of 1001 requests only one succeeds and the others fail with
> a timeout. I will try to debug this one and run a profiler too.

I'd probably start by enabling Tomcat's HTTP/2 debug logging for this
one. I'm wondering if something is triggering a connection level error
early in the processing.

Mark