On Thu, Feb 13, 2025 at 11:05 AM Rémy Maucherat <r...@apache.org> wrote:
> On Thu, Feb 13, 2025 at 9:41 AM Cenk Pekyaman <cenkpekya...@gmail.com> > wrote: > > > > We run tomcat on java17 with the embedded tomcat setup. > > We have http and https connectors and we have http2 upgradeProtocol for > > both. > > > > We recently upgraded from 9.0.88 to 10.1.24 to work on javax to jakarta > > migration, and after a while, upgraded to 10.1.33. > > After the upgrade, we started to see random and rare 500 errors for some > of > > the http2 GET requests over https. > > When reproducing the error on our development machines, we could see the > > following trace on the server: > > ``` > > java.io.IOException: null > > Tomcat now sets an IO exception to trigger ReadListener.onError with > an appropriate error in that case. So things seem normal so far. > > Now the "null" is not normal, the corresponding string exists in the > resource bundle: > stream.clientResetRequest=Client reset the stream before the request > was fully read > > Rémy > > > at > > > org.apache.coyote.http2.Stream$StandardStreamInputBuffer.receiveReset(Stream.java:1516) > > ~[tomcat-coyote-10.1.33.jar:10.1.33] > > at org.apache.coyote.http2.Stream.receiveReset(Stream.java:224) > > ~[tomcat-coyote-10.1.33.jar:10.1.33] > > at > > > org.apache.coyote.http2.Http2UpgradeHandler.close(Http2UpgradeHandler.java:1305) > > ~[tomcat-coyote-10.1.33.jar:10.1.33] > > at > > > org.apache.coyote.http2.Http2UpgradeHandler.upgradeDispatch(Http2UpgradeHandler.java:437) > > ~[tomcat-coyote-10.1.33.jar:10.1.33] > > at > > > org.apache.coyote.http2.Http2AsyncUpgradeHandler.upgradeDispatch(Http2AsyncUpgradeHandler.java:43) > > ~[tomcat-coyote-10.1.33.jar:10.1.33] > > at > > > org.apache.coyote.http2.Http2AsyncParser$FrameCompletionHandler.failed(Http2AsyncParser.java:337) > > ~[tomcat-coyote-10.1.33.jar:10.1.33] > > at > > > org.apache.coyote.http2.Http2AsyncParser$FrameCompletionHandler.failed(Http2AsyncParser.java:167) > > ~[tomcat-coyote-10.1.33.jar:10.1.33] > > at > > org.apache.tomcat.util.net > .SocketWrapperBase$VectoredIOCompletionHandler.failed(SocketWrapperBase.java:1148) > > ~[tomcat-coyote-10.1.33.jar:10.1.33] > > at > > org.apache.tomcat.util.net > .NioEndpoint$NioSocketWrapper$NioOperationState.run(NioEndpoint.java:1660) > > ~[tomcat-coyote-10.1.33.jar:10.1.33] > > at > > > org.apache.tomcat.util.threads.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1190) > > [tomcat-util-10.1.33.jar:10.1.33] > > at > > > org.apache.tomcat.util.threads.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:659) > > [tomcat-util-10.1.33.jar:10.1.33] > > at > > > org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:63) > > [tomcat-util-10.1.33.jar:10.1.33] > > at java.lang.Thread.run(Thread.java:840) [?:?] > > ``` > > > > after further testing, we think the change > > > https://github.com/apache/tomcat/commit/f902edf085c0c73139a66d1bfc4d5790a416b130 > > introduced in 10.1.29 is the reason we get 500 status. > > but the change seems to be doing what is intended, so we tested with > > multiple tomcat versions to see if there were already unexposed failures > > prior. > > and in versions like 10.1.24 we see the below error which does not result > > in 500 status: > > ``` > > [org.apache.coyote.http2.Http2Parser] {https-jsse-nio-8443-exec-6} > > Connection [92], Stream [0], Frame type [null], Error > > java.io.IOException: Unable to unwrap data, invalid status [CLOSED] > > at > > org.apache.tomcat.util.net > .SecureNioChannel.read(SecureNioChannel.java:772) > > ~[tomcat-coyote-10.1.24.jar:10.1.24] > > at > > org.apache.tomcat.util.net > .NioEndpoint$NioSocketWrapper$NioOperationState.run(NioEndpoint.java:1609) > > ~[tomcat-coyote-10.1.24.jar:10.1.24] > > at > > > org.apache.tomcat.util.threads.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1190) > > [tomcat-util-10.1.24.jar:10.1.24] > > at > > > org.apache.tomcat.util.threads.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:659) > > [tomcat-util-10.1.24.jar:10.1.24] > > at > > > org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:63) > > [tomcat-util-10.1.24.jar:10.1.24] > > at java.base/java.lang.Thread.run(Thread.java:840) [?:?] > > ``` > > it is not the same error, but since it is happening around the same code > > context NioOperationState.run, it gave us the impression that the traces > > might be related. > > > > We have run our reproducer with multiple versions: 10.1.24, 10.1.28, > > 10.1.29, 10.1.33, 10.1.35, 11.0.2. > > versions 10.1.29 and up show the rarer "receiveReset" issue (500 status), > > versions 10.1.28 and down showcase only "Unable to unwrap data" issue (no > > 500). > > > > We can reproduce the issue somewhat reliably with the reproducer setup we > > have in https://github.com/cpekyaman/java-server-http2-test. > > The odd part is we could only reproduce it with certain clients reliably, > > with go and curl (we include go client as the failing requests in our > > production systems come from a monitoring system written in go). > > Some others, like the JDK client itself, either can't trigger the issue, > or > > require much longer test time (couldn't verify this). > > The linked reproducer repo contains information to try it if you need. > and > > it also contains some trace level logs from some of the runs we already > did > > in the trace_logs_history folder. > > > > It seems something is off, but we are not sure if tomcat is doing > something > > unexpected or not. > > Thanks. > > --------------------------------------------------------------------- > Sorry, I just realized that I might have misread your last message and want to double check. When you said things look normal, did you mean you couldn't reproduce the issue at all ? or, you could see the issue occurring, but it is normal due to what the client is doing ? Thanks.