On Mon, Sep 21, 2020 at 12:08 PM Martin Grigorov <mgrigo...@apache.org> wrote:
>
> On Mon, Sep 21, 2020 at 11:23 AM Mark Thomas <ma...@apache.org> wrote:
>
>> On 21/09/2020 08:18, Martin Grigorov wrote:
>>> On Fri, Sep 18, 2020 at 6:16 PM Mark Thomas <ma...@apache.org> wrote:
>>>
>>>> On 18/09/2020 14:07, Martin Grigorov wrote:
>>>>
>>>> <snip/>
>>>>
>>>>> What is the difference between
>>>>> org.apache.coyote.http2.StreamStateMachine.State#CLOSED_RX
>>>>> and org.apache.coyote.http2.StreamStateMachine.State#CLOSED_TX ?
>>>>
>>>> Compare the parameters used to construct the enums.
>>>>
>>>>> I read some parts of https://tools.ietf.org/html/rfc7540 but I don't
>>>>> see anything related to two types of CLOSED state.
>>>>
>>>> Section 5.1. Definition of the closed state (page 20) explains the
>>>> difference between the two.
>>>
>>> Still I do not understand what RX and TX stand for. But this is not
>>> important.
>>
>> TX - transmit
>> RX - receive
>
> Thanks!
>
>>> The following patch fixes the issue for me/Vegeta:
>>>
>>> @@ -1570,12 +1571,15 @@ class Http2UpgradeHandler extends AbstractStream implements InternalHttpUpgradeH
>>>
>>>      @Override
>>>      public void reset(int streamId, long errorCode) throws Http2Exception {
>>> -        Stream stream = getStream(streamId, true);
>>> -        boolean active = stream.isActive();
>>> -        stream.checkState(FrameType.RST);
>>> -        stream.receiveReset(errorCode);
>>> -        if (active) {
>>> -            activeRemoteStreamCount.decrementAndGet();
>>> +        boolean unknownIsError = Http2Error.CANCEL.getCode() != errorCode;
>>> +        Stream stream = getStream(streamId, unknownIsError);
>>> +        if (stream != null) {
>>> +            boolean active = stream.isActive();
>>> +            stream.checkState(FrameType.RST);
>>> +            stream.receiveReset(errorCode);
>>> +            if (active) {
>>> +                activeRemoteStreamCount.decrementAndGet();
>>> +            }
>>>          }
>>>      }
>>>
>>> I.e. do not throw an error if the remote peer is trying to cancel an
>>> already closed stream.
>>
>> RFC 7540 allows the connection to be closed with a protocol error if the
>> reset is received "a significant time after sending END_STREAM".
>>
>> Tomcat retains state for maxConcurrentStreams + 10%, with closed streams
>> being pruned (oldest first) every 10 new streams.
>
> The spec talks about time but Tomcat uses the number of streams.
> I understand that under normal usage this could be enough "time", but under
> heavy load that number is reached quickly and the time is short (see the
> back-of-the-envelope below).
>
> I've tried with these values for maxConcurrentStreams/maxConcurrentStreamExecution:
> 8 (the number of CPU cores), 16, 100 (the default), 1024 and 8196. The
> throughput stays around 10K req/s only up to 100. Above that it drops (as you
> said earlier).
>
>> It isn't clear at this point why the client is sending the RST_STREAM.
>
> I don't know either. But Vegeta uses the Golang standard libraries, so any
> Golang application should behave the same way.
> Also, as I found while investigating last week, the Golang/Node.js/Rust/Netty
> HTTP/2 servers do not complain about it.
>
>> This normally indicates an error condition. From the description of the
>> load test I don't see why the client would be sending a RST_STREAM.
>>
>> If the RST_STREAM is valid and it isn't received a significant time
>> after the END_STREAM then Tomcat needs to handle this.
>>
>> If the RST_STREAM is valid but received a significant amount of time
>> after the END_STREAM, that would be a client error.
>>
>> Of course, significant isn't defined in the specification.
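>
> Just to put a rough number on "significant" here (my own back-of-the-envelope
> for the defaults and the load above, not anything Tomcat computes):
>
>     // Rough estimate, assuming maxConcurrentStreams=100 (the default) and the
>     // ~10K requests/second that Vegeta reaches in this test.
>     public class RetentionWindow {
>         public static void main(String[] args) {
>             int maxConcurrentStreams = 100;
>             double requestsPerSecond = 10_000;
>             // closed streams are retained up to maxConcurrentStreams + 10%
>             int retainedStreams = maxConcurrentStreams + maxConcurrentStreams / 10;
>             double windowMillis = retainedStreams / requestsPerSecond * 1000;
>             System.out.printf(java.util.Locale.ROOT,
>                     "closed-stream history ~= %d streams ~= %.1f ms%n",
>                     retainedStreams, windowMillis);
>             // prints: closed-stream history ~= 110 streams ~= 11.0 ms
>         }
>     }
>
> So with the defaults the closed-stream history covers only about 11 ms of
> traffic at this rate; a RST_STREAM that arrives a few tens of milliseconds
> after END_STREAM already refers to a stream Tomcat has pruned.
>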
>> Any special handling for RST_STREAM also needs to be applied to
>> WINDOW_UPDATE. It also needs to differentiate between a frame received
>> for a past closed stream and a frame received for a stream that never
>> existed.
>>
>> I think the key question here is why the client is sending the
>> RST_STREAM in the first place. Is Tomcat doing something it shouldn't /
>> not doing something it should to cause this? If so, we need to fix the
>> root cause rather than tackle the symptom.
>>
>> There are a couple of options for debugging this to see why the client
>> is sending the RST_STREAM:
>> - Enable debug logging for HTTP/2 in Tomcat. This is very verbose and,
>> if the root cause is threading related, typically changes the timing
>> enough to mask the issue.
>
> Here are the logs of a 20-second load run:
>
> https://gist.githubusercontent.com/martin-g/37dacc12149d5cecbfd4e16dc2248cfd/raw/76693cd1dd1f37ba126edb84ebd7631802aa91a6/tomcat-http2.log

Actually, the above was just the last part of the logs. The previous logs were
in .gz files, which I had missed.
https://gist.githubusercontent.com/martin-g/37dacc12149d5cecbfd4e16dc2248cfd/raw/00596b7dc31d30d275384334ff17c5416f5b363c/tomcat-http2.log
- this file contains the first 7 seconds of the load run. I have three times as
much information for the last 13 seconds, but it is more or less the same.
There are many entries like this:

2020-09-21 14:25:04.850 DEBUG 232086 --- [https-jsse-nio-18080-exec-8] o.a.coyote.http11.Http11NioProtocol      : Found processor [null] for socket [org.apache.tomcat.util.net.SecureNioChannel@2b435926:java.nio.channels.SocketChannel[connected local=/127.0.0.1:18080 remote=/127.0.0.1:42229]]
2020-09-21 14:25:04.850 DEBUG 232086 --- [https-jsse-nio-18080-exec-6] o.a.coyote.http2.Http2UpgradeHandler     : Connection [2]
java.io.IOException: Unable to unwrap data, invalid status [CLOSED]
    at org.apache.tomcat.util.net.SecureNioChannel.read(SecureNioChannel.java:766)
    at org.apache.tomcat.util.net.NioEndpoint$NioSocketWrapper$NioOperationState.run(NioEndpoint.java:1468)
    at org.apache.tomcat.util.net.SocketWrapperBase$OperationState.start(SocketWrapperBase.java:1039)
    at org.apache.tomcat.util.net.SocketWrapperBase.vectoredOperation(SocketWrapperBase.java:1450)
    at org.apache.tomcat.util.net.SocketWrapperBase.read(SocketWrapperBase.java:1293)
    at org.apache.tomcat.util.net.SocketWrapperBase.read(SocketWrapperBase.java:1265)
    at org.apache.coyote.http2.Http2AsyncParser.readConnectionPreface(Http2AsyncParser.java:55)
    at org.apache.coyote.http2.Http2UpgradeHandler.init(Http2UpgradeHandler.java:238)
    at org.apache.coyote.http2.Http2AsyncUpgradeHandler.init(Http2AsyncUpgradeHandler.java:39)
    at org.apache.coyote.http2.Http2UpgradeHandler.upgradeDispatch(Http2UpgradeHandler.java:313)
    at org.apache.coyote.http2.Http2AsyncUpgradeHandler.upgradeDispatch(Http2AsyncUpgradeHandler.java:39)
    at org.apache.coyote.http11.upgrade.UpgradeProcessorInternal.dispatch(UpgradeProcessorInternal.java:54)
    at org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:59)
    at org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:868)
    at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1590)
    at org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:49)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630)
    at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
    at java.base/java.lang.Thread.run(Thread.java:832)
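
For reference, the DEBUG entries above come from raising the
org.apache.coyote.http2 logger to debug level. Roughly (the exact file depends
on the setup: conf/logging.properties for a standalone Tomcat, or
application.properties for a Spring Boot style setup, which is what the log
format above looks like):

    # conf/logging.properties (standalone Tomcat, JULI)
    org.apache.coyote.http2.level = FINE
    # make sure the handler passes FINE messages through
    java.util.logging.ConsoleHandler.level = FINE

    # application.properties (Spring Boot / embedded Tomcat)
    logging.level.org.apache.coyote.http2=DEBUG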

> This is with my patch in #reset(). Without it there are also errors saying
> that stream XYZ is already closed.
>
>> - Client side logging (I don't know what the capabilities are here).
>> - JVM level TLS debug logging (even more verbose than Tomcat's HTTP/2
>> debug and harder to work with)
>> - Wireshark (or similar)
>>
>> I'd recommend Wireshark. I don't think JSSE supports SSLKEYLOGFILE so
>> I'd use the APR/Native connector with SSLKEYLOGFILE to get the cleartext
>> in Wireshark.
>>
>>> With this change and Vegeta's -max-workers=100 I can get 12K reqs/sec.
>>
>> That is odd. If the client is sending RST_STREAM messages then I'd
>> expect to see requests failing, as RST_STREAM indicates something has
>> gone wrong or the client has aborted the request.
>
> According to https://tools.ietf.org/html/rfc7540#section-5.1, when the stream
> is in the HALF_CLOSED_REMOTE state the endpoint should send either END_STREAM
> or RST_STREAM to the remote peer.
> In the logs I see a sequence of events like this:
>
> ...
> o.a.coyote.http2.StreamStateMachine    : Connection [124], Stream [5,279], State changed from [OPEN] to [HALF_CLOSED_REMOTE]
> o.a.coyote.http2.Http2UpgradeHandler   : Connection [124], Stream [5,279], Writing the headers
> org.apache.coyote.http2.Stream         : Connection [124], Stream [5,279], flushing output with buffer at position [12], writeInProgress [false] and closed [true]
> org.apache.coyote.http2.AbstractStream : Connection [124], Stream [5,279], reduce flow control window by [12] to [4194292]
> o.a.coyote.http2.Http2UpgradeHandler   : Connection [124], Stream [5,279], Data length [12]
> o.a.coyote.http2.StreamStateMachine    : Connection [124], Stream [5,279], State changed from [HALF_CLOSED_REMOTE] to [CLOSED_TX]
> org.apache.coyote.http2.Stream         : Connection [124], Stream [5,279] has been recycled
>
> From the logs it is not clear whether any of these flags is being sent to
> the client. I will investigate!
>
> Note: ", Writing the headers" is added by me. The current log entry is
> "upgradeHandler.writeHeaders=Connection [{0}], Stream [{1}]"
>
>>> With more workers it starts failing with:
>>>
>>>   "status_codes": {
>>>     "0": 1000,
>>>     "200": 1
>>>   },
>>>   "errors": [
>>>     "Get \"https://localhost:18080/testbed/plaintext\": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)",
>>>     "Get \"https://localhost:18080/testbed/plaintext\": context deadline exceeded",
>>>     "Get \"https://localhost:18080/testbed/plaintext\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"
>>>   ]
>>>
>>> i.e. out of 1001 requests only one succeeds and the others fail with a
>>> timeout. I will try to debug this one too and run a profiler as well.
>>
>> I'd probably start by enabling Tomcat's HTTP/2 debug logging for this
>> one. I'm wondering if something is triggering a connection level error
>> early in the processing.
>>
>> Mark
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
>> For additional commands, e-mail: users-h...@tomcat.apache.org
>>