Hi,

Recently I hit a situation where HTTP requests with the “new” Java HttpClient 
ran into TCP connection resets, but sadly only on GitHub hosted runners, and so 
far I have had no luck reproducing it locally. This is why I wanted to set up a 
reproducer for it.

While trying to build that reproducer, I ran into another, unrelated but 
locally reproducible j.l.AssertionError in recent versions of Java 11 (Java 17 
and newer are fine). Reproducing the AE is pretty easy (although the reproducer 
is a bit big): clone https://github.com/snazy/http-client-tcp-reset-repro.git 
and run `./gradlew test` using Java 11. I’ve opened 
https://bugs.openjdk.org/browse/JDK-8294773 for this.

The original issue, for which I’m trying to build a reproducer, is strange 
and hard to narrow down, because it only happens on GH hosted runners with the 
“new” Java HttpClient.

Background (for the TCP-conn-reset): We are using the plain old 
HttpURLConnection approach, and wanted to use the new HttpClient when it’s 
available (with Java 11+).

Requests with small payloads (in request and response) are fine, but 
bigger-ish payloads (a couple of KB and more) somewhat consistently run into 
TCP connection resets, which do not happen with HttpURLConnection or the 
Apache HTTP client. “Somewhat consistently” means: it happens often, but not 
always (kinda flaky).

The reproducer I was trying to build for that uses Jetty 9, Jetty 11 and the 
HTTP server in the JDK on the server side, and HttpURLConnection, Apache HTTP 
client and the new Java HttpClient on the client side. The matrix of HTTP 
server/client implementations is that big because I wanted to narrow down 
whether there’s a specific combination that causes the TCP connection reset.
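
For illustration, one cell of that matrix (JDK HTTP server + new HttpClient) 
could look roughly like the sketch below. This is not the code from the 
reproducer repository; the class name `EchoRoundTrip`, the `/echo` path and 
the payload size are assumptions made up for this example, and it pins the 
client to HTTP/1.1 to keep the exchange simple.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical example class, not part of the reproducer repository.
public class EchoRoundTrip {

    // Starts an echo server on an ephemeral loopback port, POSTs `payloadSize`
    // bytes with the new HttpClient and returns the size of the echoed body.
    static int roundTrip(int payloadSize) throws IOException, InterruptedException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/echo", exchange -> {
            byte[] body;
            try (InputStream in = exchange.getRequestBody()) {
                body = in.readAllBytes();
            }
            // A response length of -1 means "no response body" for the JDK server.
            exchange.sendResponseHeaders(200, body.length == 0 ? -1 : body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        try {
            HttpClient client = HttpClient.newBuilder()
                    .version(HttpClient.Version.HTTP_1_1)
                    .build();
            URI uri = URI.create("http://127.0.0.1:" + server.getAddress().getPort() + "/echo");
            HttpRequest request = HttpRequest.newBuilder(uri)
                    .POST(HttpRequest.BodyPublishers.ofByteArray(new byte[payloadSize]))
                    .build();
            HttpResponse<byte[]> response =
                    client.send(request, HttpResponse.BodyHandlers.ofByteArray());
            return response.body().length;
        } finally {
            server.stop(0);
        }
    }

    public static void main(String[] args) throws Exception {
        // 16 KB is an arbitrary stand-in for a "bigger-ish" payload.
        System.out.println("echoed bytes: " + roundTrip(16 * 1024));
    }
}
```

Swapping the server for Jetty or the client for HttpURLConnection / Apache 
HTTP client would give the other cells of the matrix.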

Some things I tried without success:
* Enable debug logging for the new HttpClient (no luck, debug logging changes 
the timing and the TCP conn reset doesn’t happen)
* Delaying the request processing with a “Thread.sleep(10)” (makes the issue 
disappear as well)
* Using Java 17 + 18
* Using a different OpenJDK distribution (was Azul Zulu, tried Temurin)
* Using a local Docker container with CPU + memory somewhat similar to the GH 
hosted runner
* Using very similar IP settings (sysctl stuff for net.core + net.ipv4) locally

What I could figure out so far is that, on GH hosted runners, the new 
HttpClient creates a new connection for almost every HTTP request, whereas 
locally it reuses existing connections. On the runners this happens even 
after HTTP requests that completed just fine.
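
One way to observe connection reuse from the server side is to count the 
distinct remote socket addresses that requests arrive from. The sketch below 
is only an illustration of that idea, not how I measured it; the class name 
`ConnectionReuseProbe` and the request count are assumptions. Since the OS 
could in theory hand out the same ephemeral port twice, the number is a lower 
bound on the connections actually opened.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical example class, not part of the reproducer repository.
public class ConnectionReuseProbe {

    // Sends `requests` sequential GETs through one HttpClient and returns how
    // many distinct client socket addresses the server saw.
    static int distinctConnections(int requests) throws Exception {
        Set<InetSocketAddress> peers = ConcurrentHashMap.newKeySet();
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/", exchange -> {
            // Each TCP connection has its own client-side port.
            peers.add(exchange.getRemoteAddress());
            exchange.getRequestBody().readAllBytes(); // drain the (empty) request body
            byte[] body = "ok".getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        try {
            HttpClient client = HttpClient.newBuilder()
                    .version(HttpClient.Version.HTTP_1_1)
                    .build();
            URI uri = URI.create("http://127.0.0.1:" + server.getAddress().getPort() + "/");
            HttpRequest request = HttpRequest.newBuilder(uri).GET().build();
            for (int i = 0; i < requests; i++) {
                client.send(request, HttpResponse.BodyHandlers.ofString());
            }
            return peers.size();
        } finally {
            server.stop(0);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("distinct connections: " + distinctConnections(20));
    }
}
```

A result close to the number of requests would match what I see on the GH 
hosted runners; a small number would match the local behaviour.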

I’ll continue to try to get a reproducer for the TCP-connection-reset issue.

Robert
