On Mon, Sep 21, 2020 at 11:23 AM Mark Thomas <ma...@apache.org> wrote:
> On 21/09/2020 08:18, Martin Grigorov wrote:
> > On Fri, Sep 18, 2020 at 6:16 PM Mark Thomas <ma...@apache.org> wrote:
> >
> >> On 18/09/2020 14:07, Martin Grigorov wrote:
> >>
> >> <snip/>
> >>
> >>> What is the difference
> >>> between org.apache.coyote.http2.StreamStateMachine.State#CLOSED_RX
> >>> and org.apache.coyote.http2.StreamStateMachine.State#CLOSED_TX ?
> >>
> >> Compare the parameters used to construct the enums.
> >>
> >>> I read some parts of https://tools.ietf.org/html/rfc7540 but I don't
> >>> see anything related to two types of CLOSED state.
> >>
> >> Section 5.1. Definition of the closed state (page 20) explains the
> >> difference between the two.
> >>
> >
> > Still I do not understand what RX and TX stand for. But this is not
> > important.
>
> TX - transmit
> RX - receive
>

Thanks!

>
> > The following patch fixes the issue for me/Vegeta:
> >
> > @@ -1570,12 +1571,15 @@ class Http2UpgradeHandler extends AbstractStream implements InternalHttpUpgradeH
> >
> >      @Override
> >      public void reset(int streamId, long errorCode) throws Http2Exception {
> > -        Stream stream = getStream(streamId, true);
> > -        boolean active = stream.isActive();
> > -        stream.checkState(FrameType.RST);
> > -        stream.receiveReset(errorCode);
> > -        if (active) {
> > -            activeRemoteStreamCount.decrementAndGet();
> > +        boolean unknownIsError = Http2Error.CANCEL.getCode() != errorCode;
> > +        Stream stream = getStream(streamId, unknownIsError);
> > +        if (stream != null) {
> > +            boolean active = stream.isActive();
> > +            stream.checkState(FrameType.RST);
> > +            stream.receiveReset(errorCode);
> > +            if (active) {
> > +                activeRemoteStreamCount.decrementAndGet();
> > +            }
> >          }
> >      }
> >
> > I.e. do not throw an error if the remote peer is trying to cancel an
> > already closed stream.
>
> RFC 7540 allows the connection to be closed with a protocol error if the
> reset is received "a significant time after sending END_STREAM".
>
> Tomcat retains state for maxConcurrentStreams + 10% with closed streams
> being pruned (oldest first) every 10 new streams.
>

The spec talks about time but Tomcat uses the number of streams. I understand
that under normal usage this could be enough "time", but under heavy load this
number is reached quickly and the time is short.
I've tried with these values for maxConcurrentStreams/maxConcurrentStreamExecution:
8 (the number of CPU cores), 16, 100 (the default), 1024 and 8196. The
throughput stays around 10K reqs/sec only up to 100; above that it drops
(as you said earlier).

>
> It isn't clear at this point why the client is sending the RST_STREAM.
>

I don't know either. But Vegeta uses the Golang standard libraries, so any
Golang application should behave the same way.
Also, as I investigated last week, the Golang/Node.js/Rust/Netty HTTP/2
servers do not complain about it.

> This normally indicates an error condition. From the description of the
> load test I don't see why the client would be sending a RST_STREAM.
>
> If the RST_STREAM is valid and it isn't received a significant time
> after the END_STREAM then Tomcat needs to handle this.
>
> If the RST_STREAM is valid but received a significant amount of time
> after the END_STREAM then that would be a client error.
>
> Of course, significant isn't defined in the specification.
>
> Any special handling for RST_STREAM also needs to be applied to
> WINDOW_UPDATE. It also needs to differentiate between a frame received
> for a past closed stream and a frame received for a stream that never
> existed.
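Agreed that WINDOW_UPDATE would need the same treatment.

To narrow down where the RST_STREAM comes from, my plan is to take Vegeta out
of the picture and hit the same endpoint with a minimal Go client, since
Vegeta is built on Go's net/http. As far as I can tell from the Go http2
transport, cancelling a request context (or letting it time out) makes the
client reset the stream with error code CANCEL, which is exactly the code my
patch special-cases. Something along these lines should do (the URL/port are
from my local setup, and the very short timeout is just a starting point - it
may need tuning so that the cancel lands after the request has been written):

package main

import (
	"context"
	"crypto/tls"
	"fmt"
	"net/http"
	"time"
)

func main() {
	client := &http.Client{
		Transport: &http.Transport{
			// Self-signed certificate on my local Tomcat, hence InsecureSkipVerify.
			TLSClientConfig: &tls.Config{InsecureSkipVerify: true},
			// Negotiate h2 via ALPN even though a custom TLS config is set.
			ForceAttemptHTTP2: true,
		},
	}

	// Cancel the request very soon after it is sent; the HTTP/2 transport
	// should then reset the stream with RST_STREAM(CANCEL).
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Millisecond)
	defer cancel()

	req, err := http.NewRequestWithContext(ctx, http.MethodGet,
		"https://localhost:18080/testbed/plaintext", nil)
	if err != nil {
		panic(err)
	}

	resp, err := client.Do(req)
	if err != nil {
		fmt.Println("request failed (expected when the timeout fires):", err)
		return
	}
	defer resp.Body.Close()
	fmt.Println("status:", resp.Status)
}

If Tomcat sees the same RST_STREAM frames from this client as from Vegeta,
that would point at client-side cancellation/timeouts rather than anything
Tomcat does wrong on the wire.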
> I think the key question here is why is the client sending the
> RST_STREAM in the first place. Is Tomcat doing something it shouldn't /
> not doing something it should to cause this? If so, we need to fix the
> root cause rather than tackle the symptom.
>
> There are a couple of options for debugging this to see why the client
> is sending the RST_STREAM:
> - Enable debug logging for HTTP/2 in Tomcat. This is very verbose and
> if the root cause is threading related typically changes the timing
> enough to mask the issue.
>

Here are the logs of a 20 second load run:
https://gist.githubusercontent.com/martin-g/37dacc12149d5cecbfd4e16dc2248cfd/raw/76693cd1dd1f37ba126edb84ebd7631802aa91a6/tomcat-http2.log
This is with my patch in #reset(). Without it there are also errors saying
that stream XYZ is already closed.

> - Client side logging (I don't know what the capabilities are here).
> - JVM level TLS debug logging (even more verbose than Tomcat's HTTP/2
> debug and harder to work with)
> - Wireshark (or similar)
>
> I'd recommend Wireshark. I don't think JSSE supports SSLKEYLOGFILE so
> I'd use the APR/Native connector with SSLKEYLOGFILE to get the cleartext
> in Wireshark.
>
> > With this change and Vegeta's -max-workers=100 I can get 12K reqs/sec.
>
> That is odd. If the client is sending RST_STREAM messages then I'd
> expect to see requests failing as RST_STREAM indicates something has
> gone wrong or the client has aborted the request.
>

According to https://tools.ietf.org/html/rfc7540#section-5.1, when the stream
is in the HALF_CLOSED_REMOTE state the endpoint should send either a frame
with the END_STREAM flag or a RST_STREAM to the remote peer.
In the logs I see the following sequence of events:

...
o.a.coyote.http2.StreamStateMachine : Connection [124], Stream [5,279], State changed from [OPEN] to [HALF_CLOSED_REMOTE]
o.a.coyote.http2.Http2UpgradeHandler : Connection [124], Stream [5,279], Writing the headers
org.apache.coyote.http2.Stream : Connection [124], Stream [5,279], flushing output with buffer at position [12], writeInProgress [false] and closed [true]
org.apache.coyote.http2.AbstractStream : Connection [124], Stream [5,279], reduce flow control window by [12] to [4194292]
o.a.coyote.http2.Http2UpgradeHandler : Connection [124], Stream [5,279], Data length [12]
o.a.coyote.http2.StreamStateMachine : Connection [124], Stream [5,279], State changed from [HALF_CLOSED_REMOTE] to [CLOSED_TX]
org.apache.coyote.http2.Stream : Connection [124], Stream [5,279] has been recycled

From the logs it is not clear whether either of these is actually being sent
to the client. I will investigate!
Note: ", Writing the headers" is added by me. The current log entry is
"upgradeHandler.writeHeaders=Connection [{0}], Stream [{1}]"
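For reference, the HTTP/2 debug output in the gist above is enabled via
conf/logging.properties, roughly like this (standard JULI setup; the
org.apache.coyote.http2 line is the one that matters, the handler level may
already be FINE in the stock file):

# Route debug (FINE) messages from the HTTP/2 code to the existing handlers
org.apache.coyote.http2.level = FINE
# Make sure at least one handler actually passes FINE through
java.util.logging.ConsoleHandler.level = FINE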
> > With more workers it starts failing with:
> >
> >   "status_codes": {
> >     "0": 1000,
> >     "200": 1
> >   },
> >   "errors": [
> >     "Get \"https://localhost:18080/testbed/plaintext\": net/http: request
> > canceled while waiting for connection (Client.Timeout exceeded while
> > awaiting headers)",
> >     "Get \"https://localhost:18080/testbed/plaintext\": context deadline
> > exceeded",
> >     "Get \"https://localhost:18080/testbed/plaintext\": context deadline
> > exceeded (Client.Timeout exceeded while awaiting headers)"
> >   ]
> > i.e. out of 1001 requests only one succeeds and the others fail with a
> > timeout. I will try to debug this one too and run a profiler as well.
>
> I'd probably start by enabling Tomcat's HTTP/2 debug logging for this
> one. I'm wondering if something is triggering a connection level error
> early in the processing.
>
> Mark
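P.S. In case someone wants to reproduce the numbers above: the load is
generated with Vegeta roughly like this (the target URL, duration and worker
count are the values from my runs; if I remember the flags correctly,
-rate=0 together with -max-workers tells Vegeta to attack as fast as the
workers allow):

echo "GET https://localhost:18080/testbed/plaintext" | \
  vegeta attack -insecure -http2 -duration=20s -rate=0 -max-workers=100 | \
  vegeta report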