Hi,

In our setup, we have a Kubernetes pod with two containers—one hosting the 
gRPC service and the other acting as the client. The service is implemented 
in Golang, while the client is in Python. We create the client at the start 
of the application and use the same client throughout the lifetime of the 
application. 

Everything works fine until the server sends a keepalive ping (by default, 2 
hours after the last activity), doesn't receive an acknowledgment within 20 
seconds, and closes the transport.
When the client subsequently makes a gRPC call, it detects that the 
transport is unavailable and the call fails with an error. On the next 
attempt, knowing the transport is gone, the client creates a new one, and 
the call succeeds.
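That fail-then-succeed pattern can be sketched with a small retry helper. This is my own illustrative code, not part of our application; with grpcio the caught exception would be grpc.RpcError rather than a generic Exception:

```python
# Hypothetical helper: retry an RPC when it fails with a transient transport
# error, mirroring the observed behavior where the first call after the
# keepalive timeout fails and the retry (on a fresh transport) succeeds.
def call_with_retry(rpc, is_transient, attempts=2):
    """Invoke rpc(); on a transient failure, retry up to `attempts` times."""
    last_exc = None
    for _ in range(attempts):
        try:
            return rpc()
        except Exception as exc:  # with grpcio this would be grpc.RpcError
            if not is_transient(exc):
                raise
            last_exc = exc  # transport was torn down; the channel reconnects
    raise last_exc
```

With grpcio, is_transient would typically check whether exc.code() is grpc.StatusCode.UNAVAILABLE.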

Since both the server and client run within the same Kubernetes pod, there 
is no firewall blocking the pings. They are on the same network and 
communicate via localhost.

I enabled debug logging and noticed that pings sent by the client are 
acknowledged by the server. These pings have an 8-byte payload containing 
arbitrary data. I assume these are not keepalive pings, as keepalive pings 
should contain all zeros in hex. (Please correct me if I’m wrong.)
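For context on the 8-byte payload: in HTTP/2, every PING frame carries exactly 8 bytes of opaque data (RFC 7540, section 6.7), regardless of whether it is a keepalive ping. A minimal sketch of extracting that payload from a raw frame (my own illustrative parser, not gRPC code):

```python
# Sketch: parse an HTTP/2 frame header and return a PING frame's 8-byte
# opaque payload (RFC 7540 §6.7). The payload value is opaque to the
# protocol, so "arbitrary data" in the ack is expected.
PING_FRAME_TYPE = 0x6

def parse_ping(frame: bytes) -> bytes:
    length = int.from_bytes(frame[0:3], "big")  # 24-bit payload length
    frame_type = frame[3]                       # 8-bit frame type
    if frame_type != PING_FRAME_TYPE or length != 8:
        raise ValueError("not a PING frame")
    return frame[9:9 + 8]  # skip the 9-byte header, take the opaque payload
```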

From my debug log observations, I see the following pattern:

   1. The client creates a channel and an underlying transport, ready to 
   accept gRPC calls.
   2. The client initiates a gRPC call, and it succeeds.
   3. After 30 minutes of inactivity, the channel transitions from *Ready* 
   to *Idle*, and the transport transitions from *Ready* to *Shutdown*. 
   (This behavior is inconsistent; sometimes these transitions appear in 
   the logs, and other times they don't. It could be related to 
   GRPC_VERBOSITY throttling the logs.)
   4. When step (3) occurs, I see logs on the server indicating that the 
   transport is shutting down (Closing: EOF).
   5. After 2 hours of inactivity, the server sends a keepalive ping, 
   doesn't receive a response within 20 seconds, and closes the transport. 
   At this point, the client logs show no indication that the transport 
   has been closed.
   6. During step (5), when I check active TCP connections, I see: 
   localhost:51902 -> localhost:50251 (CLOSE_WAIT). Here, 50251 is the 
   server and 51902 is the client.
   7. When a new gRPC call is initiated, it fails because the transport is 
   closed.

I have not overridden any gRPC channel options—these observations are based 
on the default configuration.
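For reference, this is roughly what overriding the client-side keepalive channel arguments would look like. The option names are standard gRPC channel arguments; the values are my own illustrative choices, not recommendations, and the channel-creation call is shown only as a comment:

```python
# Sketch of client-side keepalive channel arguments (illustrative values).
# These would make the Python client ping the server periodically, so the
# transport never sits silent for the full 2 hours.
KEEPALIVE_OPTIONS = [
    ("grpc.keepalive_time_ms", 60_000),          # send a keepalive ping every 60 s
    ("grpc.keepalive_timeout_ms", 20_000),       # wait up to 20 s for the ack
    ("grpc.keepalive_permit_without_calls", 1),  # allow pings with no active RPCs
]

# The long-lived channel would then be created once at application start:
# import grpc
# channel = grpc.insecure_channel("localhost:50251", options=KEEPALIVE_OPTIONS)
```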

Can anyone help me debug and understand this issue?

debug logs from *client* after 30 minutes of inactivity

I0000 00:00:1740647883.533817 23 init.cc:167] grpc_shutdown(void)
I0000 00:00:1740648903.774618 26 connectivity_state.cc:173] ConnectivityStateTracker client_channel[0x555dd05705d0]: get current state: READY
I0000 00:00:1740648903.774675 26 connectivity_state.cc:151] ConnectivityStateTracker client_channel[0x555dd05705d0]: READY -> IDLE (channel entering IDLE, OK)
I0000 00:00:1740648903.774714 26 connectivity_state.cc:151] ConnectivityStateTracker client_transport[0x7f0cd0001968]: READY -> SHUTDOWN (close_transport, OK)
I0000 00:00:1740648903.774719 26 connectivity_state.cc:159] ConnectivityStateTracker client_transport[0x7f0cd0001968]: notifying watcher 0x7f0cd0001500: READY -> SHUTDOWN
I0000 00:00:1740648903.774779 26 connectivity_state.cc:74] watcher 0x7f0cd0001500: delivering async notification for SHUTDOWN (OK)
I0000 00:00:1740648903.774788 26 init.cc:167] grpc_shutdown(void)

debug logs from *server* after 30 minutes of inactivity

2025/02/26 08:01:55 INFO: [transport] [server-transport 0xc000338000] Closing: EOF
2025/02/26 08:01:55 INFO: [transport] [server-transport 0xc000338000] loopyWriter exiting with error: transport closed by client

debug logs from *server* after 2 hours of inactivity

2025/02/26 15:27:44 INFO: [transport] [server-transport 0x14000214600] Closing: keepalive ping not acked within timeout 20s
2025/02/26 15:27:44 INFO: [transport] [server-transport 0x14000214600] loopyWriter exiting with error: transport closed by client
