Hello Alan,
On 21/08/21 9:56 pm, Alan Bateman wrote:
On 21/08/2021 12:40, Jaikiran Pai wrote:
I was able to reproduce this on macOS. However, the continuous
integration setup for Quarkus projects runs these tests
against Linux and Windows setups and they have run into this issue at
least on the Linux OS jobs (I will need to go and check if Windows
jobs had failed too). I can get the specific OS versions if
necessary, but I don't think that will be needed (due to the
reproducer I explain below).
:
From what I see in the output of this program, the resolution of
microprofile.io returns 4 IP addresses. 2 of them are of type IPv4
and 2 are of type IPv6. Across all Java versions, for IPv4 addresses,
the connection attempts fail with the same
"java.net.SocketTimeoutException: Connect timed out". However, for
the IPv6 addresses, in Java 11, the connection attempts fail with
"java.net.ConnectException: No route to host (connect failed)"
whereas in Java 16, 17 and upstream latest, the connection attempts
against the IPv6 addresses fail with
"java.net.NoRouteToHostException: No route to host".
Thanks for the additional information, I think I understand the issue
now.
If you extend your test to include a connect without a timeout then
you'll see that old and new implementations throw
NoRouteToHostException when the underlying error is EHOSTUNREACH "No
route to host".
You are right - I tweaked my example to remove the timeout param being
passed to the connect method and with that change, the exception that
gets thrown is consistent (NoRouteToHostException) across Java versions
for IPv6 addresses.
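For anyone who wants to compare the two cases themselves, here is a minimal sketch of such a test. It is not the exact reproducer; the class name, the port (443) and the timeout value (5000 ms) are assumptions chosen for illustration. It resolves all addresses for a host and tries a timed and an untimed connect against each, printing the exception class that connect throws. Note that the untimed connect can block for as long as the OS takes to give up.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Socket;

public class ConnectProbe {

    // Returns "connected", or the class name of the exception thrown by connect.
    // A timeoutMillis of 0 (or less) selects the untimed connect(SocketAddress) call.
    static String attempt(InetSocketAddress target, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            if (timeoutMillis > 0) {
                socket.connect(target, timeoutMillis); // timed connect
            } else {
                socket.connect(target);                // untimed connect
            }
            return "connected";
        } catch (IOException e) {
            return e.getClass().getName();
        }
    }

    public static void main(String[] args) throws IOException {
        String host = args.length > 0 ? args[0] : "microprofile.io";
        int port = 443;           // assumed port, adjust as needed
        int timeoutMillis = 5000; // assumed timeout, adjust as needed

        for (InetAddress address : InetAddress.getAllByName(host)) {
            InetSocketAddress target = new InetSocketAddress(address, port);
            System.out.println(address + " timed:   " + attempt(target, timeoutMillis));
            System.out.println(address + " untimed: " + attempt(target, 0));
        }
    }
}
```

Running this on different Java versions and comparing the "timed" and "untimed" lines for the IPv6 addresses should show whether the two code paths report the same exception type.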
However, for the connect with timeout case on Linux/macOS/Unix the old
implementation doesn't correctly handle network errors when they are
reported immediately. It throws ConnectException for all errors,
including EHOSTUNREACH "No route to host", whereas it should map the
error to a specific exception as it does for the untimed case. It's
possible that this bug has existed for a long time.
So while there is indeed a behavior change between the old and new
implementation for the timed case where the connect fails immediately,
I don't think we should attempt to change the new implementation to
have this buggy behavior.
Thank you for that explanation and yes, what you say makes sense.
Do you have connections to the Apache HTTP client library and the retry
code that is looking for specific exceptions? From a distance it seems
very fragile and dependent on very implementation-specific behavior. I
wonder if it has ever been tested on Windows or with an untimed connect.
I am not involved in the Apache HTTP client library project. However, I
will go ahead and open a discussion in their mailing list and bring this
issue to their attention, so that they can decide how to deal with it.
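To illustrate why a check for one specific exception type is fragile here: java.net.NoRouteToHostException and java.net.ConnectException are both subclasses of SocketException, but neither extends the other, so an `instanceof ConnectException` test does not match a NoRouteToHostException. The following is only a sketch with hypothetical helper names; it is not the Apache HTTP client code and I have not checked how that library actually classifies these errors.

```java
import java.net.ConnectException;
import java.net.NoRouteToHostException;

public class ConnectFailures {

    // Hypothetical narrow check: matches ConnectException only, so it misses
    // NoRouteToHostException, which is a sibling class under SocketException.
    static boolean narrowCheck(Throwable t) {
        return t instanceof ConnectException;
    }

    // Hypothetical broader check: treats both exception types as a failure
    // to establish the connection.
    static boolean isConnectFailure(Throwable t) {
        return t instanceof ConnectException
                || t instanceof NoRouteToHostException;
    }

    public static void main(String[] args) {
        Throwable noRoute = new NoRouteToHostException("No route to host");
        System.out.println("narrow check:  " + narrowCheck(noRoute));
        System.out.println("broader check: " + isConnectFailure(noRoute));
    }
}
```

Code that relies on the narrow form would behave differently depending on whether the connect was timed and on which implementation was in use, which matches what we are seeing.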
Thank you for your help and the explanation.
-Jaikiran