TCP_USER_TIMEOUT hard-closes the connection when the peer stays
unresponsive for too long, aborting all in-progress requests. That makes
it a balancing act: set the timeout too short and you abort valid
requests; set it too long and you keep using a dead connection. You
allude to this when you say that
TCP_USER_TIME
> A common connection failure mode is for a server to become entirely
> unresponsive
This should be caught by TCP_USER_TIMEOUT. If you enable gRPC keep-alive,
then normally TCP_USER_TIMEOUT will be set to the value of
keepAliveTimeout (at least in Java/Go, AFAIK). Then you can set
keepAliveTime
Even one minute is really too long.
A common connection failure mode is for a server to become entirely
unresponsive, due to a backend restarting or load balancing shifting
traffic off a cluster entirely. For HTTP/1 traffic, this results in a
single failed request on a connection. Abandoning an HT
On Mon, Dec 2, 2024 at 2:19 PM 'Damien Neil' via grpc.io <
grpc-io@googlegroups.com> wrote:
> I learned of this in https://go.dev/issue/70575, which is an issue filed
> against Go's HTTP/2 client, caused by a new health check we'd added: When a
> request times out or is canceled, we send a RST_STR
I forget the exact history here, but I'll note that a design whereby
receiving HEADERS or DATA resets the counter allows the client to control
resource usage - and that (at least in C++) the ping response path forces
writes to be scheduled immediately, meaning that a repeated HEADERS/PING
pattern b
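
As a toy illustration of that concern (not the algorithm any gRPC
implementation actually uses), the Go sketch below models a limiter whose
counter is cleared whenever HEADERS or DATA is received. The pingLimiter
type, its fields, and the budget of 2 are all hypothetical. With a
reset-on-receive rule, a client that alternates HEADERS and PING never runs
out of budget, so the client decides how many ping responses the server has
to write.

```go
package main

import "fmt"

// pingLimiter is a hypothetical, simplified model of a server-side ping
// budget. It is a sketch for discussion, not real gRPC behavior.
type pingLimiter struct {
	maxPingsWithoutData int // illustrative budget of pings between data frames
	pingsSinceData      int // pings seen since the last HEADERS or DATA
}

// onPing records a ping and reports whether it is still within budget.
func (l *pingLimiter) onPing() bool {
	l.pingsSinceData++
	return l.pingsSinceData <= l.maxPingsWithoutData
}

// onHeadersOrData models the reset-on-receive design: any HEADERS or DATA
// frame from the client clears the counter.
func (l *pingLimiter) onHeadersOrData() {
	l.pingsSinceData = 0
}

func main() {
	// A client that interleaves HEADERS and PING is never rejected.
	interleaved := &pingLimiter{maxPingsWithoutData: 2}
	rejected := 0
	for i := 0; i < 100; i++ {
		interleaved.onHeadersOrData()
		if !interleaved.onPing() {
			rejected++
		}
	}
	fmt.Println("interleaved pings rejected:", rejected)

	// A client that only pings exceeds the budget on the third ping.
	pingOnly := &pingLimiter{maxPingsWithoutData: 2}
	for i := 1; i <= 3; i++ {
		fmt.Printf("ping %d allowed: %v\n", i, pingOnly.onPing())
	}
}
```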
(I'd file this as an issue, but so far as I can tell this spans all gRPC
implementations and I can't figure out which GitHub tracker to use in that
case.)
gRPC servers set a fairly aggressive limit on the number of pings clients
can send. The algorithm is detailed here:
https://github.com/grpc/