Over the last few months, I've been investigating this problem in detail. Your analysis is basically correct: when the network drops the TCP connection, it often does so silently. (This is the documented default behavior for Azure load balancers, for instance; see [1].) The client's staleness checks don't help, because they work by attempting to read a FIN (or RST) from the socket, and a silently dropped connection never delivers either. In the case of a silent network drop, you don't find out that the connection is gone until you try to reuse it. At that point the network typically responds with a TCP reset, or, as you've seen, the request simply times out.
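To make the limitation concrete, here is a minimal plain-JDK sketch of a read-based staleness check in the spirit of `BHttpConnectionBase.isStale()`. It is not HttpClient's actual code, and the class and method names are hypothetical. The check can tell a healthy idle connection from one whose peer has sent a FIN. A loopback demo cannot simulate a silent drop. In that case no FIN or RST ever arrives, so the 1 ms read just times out and the check reports "not stale", exactly as for a healthy connection.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class StaleCheckDemo {

    // Hypothetical, simplified read-based staleness check: try to read with a
    // 1 ms timeout and interpret the outcome.
    static boolean isStale(Socket socket) throws IOException {
        int originalTimeout = socket.getSoTimeout();
        try {
            socket.setSoTimeout(1);
            InputStream in = socket.getInputStream();
            // -1 means the peer sent a FIN. Any byte actually read would be
            // consumed and lost here; a real client must buffer it instead.
            return in.read() == -1;
        } catch (SocketTimeoutException e) {
            // Nothing arrived within 1 ms: assume the connection is alive.
            // A silently dropped connection also ends up here.
            return false;
        } catch (IOException e) {
            // For example "Connection reset": the peer sent an RST.
            return true;
        } finally {
            socket.setSoTimeout(originalTimeout);
        }
    }

    public static void main(String[] args) throws Exception {
        try (ServerSocket server = new ServerSocket(0, 50, InetAddress.getLoopbackAddress());
             Socket client = new Socket(server.getInetAddress(), server.getLocalPort());
             Socket accepted = server.accept()) {
            System.out.println("peer idle but open: stale=" + isStale(client));
            accepted.close();  // the peer closes cleanly and sends a FIN
            Thread.sleep(100); // give the FIN time to arrive on loopback
            System.out.println("peer closed:        stale=" + isStale(client));
        }
    }
}
```

Even when the check works, it only sees what has already arrived. A FIN that is still in flight when the connection is leased is missed.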
To make matters worse, there is no standard way for the server to "specify a keep-alive timeout in its response." No RFC defines any valid parameters for the `Keep-Alive` header, and I know from experience that many servers send an *incorrect* value that should be ignored. Additionally, connection-specific HTTP headers such as `Keep-Alive` are categorically prohibited in HTTP/2.

I've made a few changes to the client to improve this situation:

1. Support for TCP keep-alive tuning has been introduced. In current releases you can enable TCP keep-alive, but it's nearly useless: the default behavior is typically to start sending TCP keep-alive segments only after TWO HOURS of idleness. There are socket options that let you configure this, but historically they were not exposed through any Java API; this has all been fixed now. In HttpCore 5.4 (HttpClient 5.6), the client will use a five-second TCP keep-alive by default. This should help prevent idle connections from being silently dropped by the network.

2. You can now configure an idle timeout through `ConnectionConfig`. This is preferable to solutions like `connectionTimeToLive`, because a connection TTL will close ALL connections of a certain age, whether they are idle or not.

3. Staleness checking (associated with the `validateAfterInactivity` option) has been made more robust to race conditions within the client (particularly the async client). However, staleness checks should not be relied upon, as it is always possible that the client will decide to reuse a connection while the server's FIN is still traveling over the network.

The _only_ reliable protection against stale connection reuse is to set a client-side idle timeout lower than the server's (and, in your case, lower than the network's). I'd recommend upgrading to HttpClient 5.6 once it comes out (you can use the alpha release for now) and setting an idle timeout of perhaps 210 seconds, comfortably below the roughly four minutes after which Azure drops your connections. It probably wouldn't hurt to also set `validateAfterInactivity` to something in the 20-60 second range.
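The difference between a TTL and an idle timeout, and where `validateAfterInactivity` fits between them, can be modeled in a few lines. This is a hypothetical sketch, not HttpClient's pool implementation. `PooledConn`, `onLease` and the thresholds are assumptions chosen to match the numbers in this thread: a 300 s TTL, a 210 s idle timeout, and a 30 s validation threshold.

```java
import java.util.List;

public class EvictionPolicyDemo {

    // Hypothetical pooled connection: when it was opened and when it was last used.
    record PooledConn(String id, long createdAtMillis, long lastUsedAtMillis) {}

    enum Action { REUSE, VALIDATE_THEN_REUSE, DISCARD }

    // TTL: retire by age, regardless of whether the connection is busy or idle.
    static boolean expiredByTtl(PooledConn c, long nowMillis, long ttlMillis) {
        return nowMillis - c.createdAtMillis() >= ttlMillis;
    }

    // Idle timeout: retire only connections that have gone unused for too long.
    static boolean expiredByIdle(PooledConn c, long nowMillis, long idleTimeoutMillis) {
        return nowMillis - c.lastUsedAtMillis() >= idleTimeoutMillis;
    }

    // Lease decision: discard past the idle timeout, run a best-effort
    // staleness check past the validation threshold, otherwise reuse directly.
    static Action onLease(PooledConn c, long nowMillis, long idleTimeoutMillis, long validateAfterMillis) {
        long idleMillis = nowMillis - c.lastUsedAtMillis();
        if (idleMillis >= idleTimeoutMillis) {
            return Action.DISCARD;
        }
        if (idleMillis >= validateAfterMillis) {
            return Action.VALIDATE_THEN_REUSE;
        }
        return Action.REUSE;
    }

    public static void main(String[] args) {
        long now = 600_000;          // 10 minutes after start
        long ttl = 300_000;          // 5-minute TTL (illustrative)
        long idleTimeout = 210_000;  // 210 s idle timeout, below Azure's ~4 minutes
        long validateAfter = 30_000; // within the suggested 20-60 s range

        List<PooledConn> pool = List.of(
                new PooledConn("busy", 0, 599_000),        // old, but used 1 s ago
                new PooledConn("idle", 300_000, 300_000),  // unused for 300 s
                new PooledConn("warm", 500_000, 540_000)); // unused for 60 s

        for (PooledConn c : pool) {
            System.out.printf("%s ttlExpired=%b idleExpired=%b onLease=%s%n",
                    c.id(), expiredByTtl(c, now, ttl), expiredByIdle(c, now, idleTimeout),
                    onLease(c, now, idleTimeout, validateAfter));
        }
    }
}
```

With these numbers the TTL retires both "busy" and "idle", even though "busy" was used a second ago. The idle timeout only discards "idle", which has sat unused past 210 s. "warm" is merely re-validated before reuse.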
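For anyone who has to stay on 4.5.x for now, the underlying socket options can also be set directly with the JDK. This is a minimal sketch, assuming JDK 11 or later, where `jdk.net.ExtendedSocketOptions` provides `TCP_KEEPIDLE`, `TCP_KEEPINTERVAL` and `TCP_KEEPCOUNT` on supported platforms. The helper name and the 5/5/3 values are illustrative assumptions loosely modeled on the five-second default mentioned above; this is not HttpClient's API. Keep-alive probes are traffic, so a middlebox with a four-minute idle timeout should no longer see the flow as idle. Wiring this into 4.5.x would need a custom socket factory, which is not shown.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.StandardSocketOptions;
import jdk.net.ExtendedSocketOptions;

public class KeepAliveDemo {

    // Hypothetical helper: enable TCP keep-alive and, where the OS supports it,
    // replace the ~2 hour default with a short probe schedule.
    static void configureKeepAlive(Socket socket, int idleSeconds, int intervalSeconds, int probeCount)
            throws IOException {
        socket.setOption(StandardSocketOptions.SO_KEEPALIVE, true);
        // The jdk.net options are platform-dependent, so check before setting.
        if (socket.supportedOptions().contains(ExtendedSocketOptions.TCP_KEEPIDLE)) {
            // Seconds of idleness before the first probe is sent.
            socket.setOption(ExtendedSocketOptions.TCP_KEEPIDLE, idleSeconds);
        }
        if (socket.supportedOptions().contains(ExtendedSocketOptions.TCP_KEEPINTERVAL)) {
            // Seconds between unanswered probes.
            socket.setOption(ExtendedSocketOptions.TCP_KEEPINTERVAL, intervalSeconds);
        }
        if (socket.supportedOptions().contains(ExtendedSocketOptions.TCP_KEEPCOUNT)) {
            // Unanswered probes before the OS declares the connection dead.
            socket.setOption(ExtendedSocketOptions.TCP_KEEPCOUNT, probeCount);
        }
    }

    public static void main(String[] args) throws IOException {
        try (ServerSocket server = new ServerSocket(0, 50, InetAddress.getLoopbackAddress());
             Socket client = new Socket(server.getInetAddress(), server.getLocalPort());
             Socket accepted = server.accept()) {
            configureKeepAlive(client, 5, 5, 3);
            System.out.println("SO_KEEPALIVE=" + client.getOption(StandardSocketOptions.SO_KEEPALIVE));
            if (client.supportedOptions().contains(ExtendedSocketOptions.TCP_KEEPIDLE)) {
                System.out.println("TCP_KEEPIDLE=" + client.getOption(ExtendedSocketOptions.TCP_KEEPIDLE));
            }
        }
    }
}
```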
But I think the main thing in your case is actually the new TCP keep-alive behavior, which by itself might fix the silent connection drops you're seeing.

[1] https://learn.microsoft.com/en-us/azure/load-balancer/load-balancer-tcp-reset

On Thu, Dec 4, 2025 at 6:27 AM Matthias Eichel wrote:
> Hi all,
>
> our application is hosted on a server in the Azure cloud and it uses
> Apache HttpClient (currently version 4.5.x, but that makes no difference
> for the problem) to communicate with an external web service that does not
> specify a keep-alive timeout in its response, so the timeout is
> interpreted as "indefinitely".
>
> Unfortunately Azure (outbound load balancer? SNAT? ...?) drops the
> connection, maybe because it is idle, after about 4 minutes.
>
> The isStale() method in BHttpConnectionBase performs a check of trying
> to read from the connection after setting the socket timeout to one
> millisecond. As far as I see, it is expected that this will trigger a
> SocketTimeoutException in the case that the connection is not stale and
> the socket timeout of one millisecond makes sure that this happens as
> fast as possible. So a caught SocketTimeoutException is seen as proof
> that the connection is not stale and isStale() returns false in that case.
>
> However, in the light of connections that were dropped by Azure or the
> like, this check of trying to read and interpreting
> a SocketTimeoutException obviously does not seem to work, as any attempt
> to communicate with the external web service over such a connection (which
> is not stale according to isStale()) will lead to errors, in our case
> also to a SocketTimeoutException.
>
> Of course, our problem is easily solved by using a custom
> ConnectionKeepAliveStrategy that limits the keep-alive timeout to
> significantly under 4 minutes, but it would be more satisfactory if there
> were a solution that can determine if a connection is stale or not,
> even when the connection is, as in our case, dropped by some firewall,
> load balancer, ...
>
> Do you have any idea if this is possible?
>
> Thank you in advance for your answer
>
> Regards
> Matthias
