Hello.

We are having a random issue in a setup with Tomcat deployed behind Apache httpd. Some HTTP/2 requests proxied to Tomcat by httpd end with a 503 (service unavailable) response to the web browser client.

Tomcat version is 9.0.58 running with Java 1.8.0 in a Docker container.
Docker version is 18.09.2 and the host machine runs Ubuntu 18.04.

I only have access to httpd logs, because enabling Tomcat logs (org.apache.coyote.http2.level = FINE) cause the problem to go away.

The following is an excerpt of a failing HTTP/2 stream, requested by httpd to Tomcat.

[Thu Dec 15 21:21:25.202719 2022] [proxy_http2:debug] [pid 44:tid 139828875544320] h2_proxy_session.c(931): [remote 192.168.65.2:18095] AH03363: h2_proxy_session(219-1289): submit localhost/app/143.js -> 1 [Thu Dec 15 21:21:25.202784 2022] [proxy_http2:debug] [pid 44:tid 139828875544320] h2_proxy_session.c(345): [remote 192.168.65.2:18095] AH03343: h2_proxy_session(219-1289): sent FRAME[HEADERS[length=549, hend=1, stream=1, eos=0]] [Thu Dec 15 21:21:25.277438 2022] [proxy_http2:debug] [pid 44:tid 139828875544320] h2_proxy_session.c(267): [remote 192.168.65.2:18095] AH03341: h2_proxy_session(219-1289): recv FRAME[HEADERS[length=120, hend=1, stream=1, eos=0]] [Thu Dec 15 21:21:25.277441 2022] [proxy_http2:debug] [pid 44:tid 139828875544320] h2_proxy_session.c(267): [remote 192.168.65.2:18095] AH03341: h2_proxy_session(219-1289): recv FRAME[DATA[length=0, flags=1, stream=1, padlen=0]] [Thu Dec 15 21:21:25.277445 2022] [proxy_http2:debug] [pid 44:tid 139828875544320] h2_proxy_session.c(267): [remote 192.168.65.2:18095] AH03341: h2_proxy_session(219-1289): recv FRAME[RST_STREAM[length=4, flags=0, stream=1]] [Thu Dec 15 21:21:25.277448 2022] [proxy_http2:debug] [pid 44:tid 139828875544320] h2_proxy_session.c(560): [remote 192.168.65.2:18095] AH03360: h2_proxy_session(219-1289): stream=1, closed, err=8 [Thu Dec 15 21:21:25.277449 2022] [proxy_http2:debug] [pid 44:tid 139828875544320] h2_proxy_session.c(1352): [remote 192.168.65.2:18095] AH03364: h2_proxy_sesssion(219-1289): stream(1) closed (touched=0, error=8)

This is a similar excerpt from a successful HTTP/2 stream between the same servers.

[Wed Dec 14 15:31:19.516817 2022] [proxy_http2:debug] [pid 42:tid 140518005270272] h2_proxy_session.c(931): [remote 192.168.65.2:18095] AH03363: h2_proxy_session(212-197): submit localhost/app/1543.js -> 3 [Wed Dec 14 15:31:19.516849 2022] [proxy_http2:debug] [pid 42:tid 140518005270272] h2_proxy_session.c(345): [remote 192.168.65.2:18095] AH03343: h2_proxy_session(212-197): sent FRAME[HEADERS[length=59, hend=1, stream=3, eos=0]] [Wed Dec 14 15:31:19.569569 2022] [proxy_http2:debug] [pid 42:tid 140518005270272] h2_proxy_session.c(267): [remote 192.168.65.2:18095] AH03341: h2_proxy_session(212-197): recv FRAME[HEADERS[length=33, hend=1, stream=3, eos=0]] [Wed Dec 14 15:31:19.569575 2022] [proxy_http2:debug] [pid 42:tid 140518005270272] h2_proxy_session.c(267): [remote 192.168.65.2:18095] AH03341: h2_proxy_session(212-197): recv FRAME[DATA[length=0, flags=1, stream=3, padlen=0]] [Wed Dec 14 15:31:19.569576 2022] [proxy_http2:debug] [pid 42:tid 140518005270272] h2_proxy_session.c(560): [remote 192.168.65.2:18095] AH03360: h2_proxy_session(212-197): stream=3, closed, err=0 [Wed Dec 14 15:31:19.569577 2022] [proxy_http2:debug] [pid 42:tid 140518005270272] h2_proxy_session.c(1352): [remote 192.168.65.2:18095] AH03364: h2_proxy_sesssion(212-197): stream(3) closed (touched=0, error=0) [Wed Dec 14 15:31:19.569753 2022] [proxy_http2:debug] [pid 42:tid 140518005270272] h2_proxy_session.c(267): [remote 192.168.65.2:18095] AH03341: h2_proxy_session(212-197): recv FRAME[RST_STREAM[length=4, flags=0, stream=3]]

In both cases, the request is a GET with "If-Modified-Since" header that should result in a 304 response (not modified).
So, it is correct that there is no request nor response body.
The sequence of exchanged frames is

- httpd sends HEADERS frame,
- Tomcat sends HEADERS frame,
- Tomcat sends DATA frame with length 0 (correct) and with END_STREAM flag (1),
- Tomcat sends RST_STREAM frame.

I think the problem is somehow related to the RST_STREAM frame.
In the failing case, it is read by httpd before it declares the stream closed, and the frame contained the CANCEL error code (error=8, err=8). In the successful case, it is read by httpd after the stream is declared closed, so the log does not show its error code. Either the code is NO_ERROR (0) or it is still CANCEL (8) but it is too late for httpd to change the outcome of the stream, which it declared error=0.

The cases when the stream ends with a non-zero code on httpd side are precisely the ones that result in a 503 error for the client.

The problem looks very time-sensitive. I can only reproduce it with Tomcat HTTP/2 logs disabled, without a Java debugger attached to Tomcat, and with real network lag. I created an automatic test case that uses a browser to download 20 files concurrently and repeats every couple of seconds.

- With httpd and Tomcat on two different cloud machines, without Tomcat logs, the problem happens after maximum 30 seconds. - With httpd and Tomcat on my machine or Tomcat logs/debug enabled, I gave up reproducing the problem after 10 hours.

It also appears that the problem is sensitive to the response being a 304 with no body. If I disable caching, the problem becomes a lot harder to reproduce but still possible (I'd say about 20 times less probable).

My guess is that the RST_STREAM frame sent by Tomcat always contains the CANCEL code, and there is a race condition with httpd closing the stream. When the RST_STREAM arrives before closing, httpd interprets the CANCEL code as an error condition and reports that to the client with 503 error code.

I looked at the sources for the HTTP/2 connector and found this code around line 100 of org.apache.coyote.http2.StreamProcessor.

  StreamException se = new StreamException(
      sm.getString("streamProcessor.cancel", stream.getConnectionId(),
stream.getIdAsString()), Http2Error.CANCEL, stream.getIdAsInt());
  stream.close(se);

This seems to be the only place where the connector is sending a CANCEL besides the forced closing of the connection in org.apache.coyote.http2.Http2UpgradeHandler.close().

I don't know HTTP/2 deeply enough to say that sending CANCEL here is wrong, also because I'm not sure I understand the situation that the connector is addressing. However, could it be that the correct code here shold be Http2Error.NO_ERROR? I recompiled the class with that change, and my test case ran for 12 hours without fail.

There is also this bug report for the Go http2 package:
https://github.com/golang/go/issues/19948
It looks pretty similar to the problem we see here (except that, in that case, the client is directly a browser). The folks there conclude that the CANCEL error code is mostly intended for clients and that servers should end "useless" streams with NO_ERROR. They also quote a section of the RFC (https://www.rfc-editor.org/rfc/rfc7540#section-8.1), but I am not sure if it applies to the Tomcat case.

What do you think? Could this be a rare Tomcat bug, an httpd bug, or something else?
Please let me know if you need more information.

Thanks in advance for your help.
Regards.

--
Lorenzo


---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org

Reply via email to