Hi,

In our setup, we have a Kubernetes pod with two containers—one hosting the 
gRPC service and the other acting as the client. The service is implemented 
in Golang, while the client is in Python. We create the client at the start 
of the application and use the same client throughout the lifetime of the 
application. 

Everything works fine until the server sends a keepalive ping (by default, 2 
hours after the last activity), doesn't receive an acknowledgment within 20 
seconds, and closes the transport.
When the client subsequently makes a gRPC call, it detects that the 
transport is unavailable and the call fails with an error. On the next 
attempt, knowing the transport is gone, the client creates a new one, and 
the call succeeds.
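That fail-then-succeed pattern can be sketched with a small retry helper. This is my own illustrative code, not part of our application; with grpcio the caught exception would be grpc.RpcError rather than a generic Exception:

```python
# Hypothetical helper: retry an RPC when it fails with a transient transport
# error, mirroring the observed behavior where the first call after the
# keepalive timeout fails and the retry (on a fresh transport) succeeds.
def call_with_retry(rpc, is_transient, attempts=2):
    """Invoke rpc(); on a transient failure, retry up to `attempts` times."""
    last_exc = None
    for _ in range(attempts):
        try:
            return rpc()
        except Exception as exc:  # with grpcio this would be grpc.RpcError
            if not is_transient(exc):
                raise
            last_exc = exc  # transport was torn down; the channel reconnects
    raise last_exc
```

With grpcio, is_transient would typically check whether exc.code() is grpc.StatusCode.UNAVAILABLE.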

Since both the server and client run within the same Kubernetes pod, there 
is no firewall blocking the pings. They are on the same network and 
communicate via localhost.

I enabled debug logging and noticed that pings sent by the client are 
acknowledged by the server. These pings have an 8-byte payload containing 
arbitrary data. I assume these are not keepalive pings, as keepalive pings 
should contain all zeros in hex. (Please correct me if I’m wrong.)
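For context on the 8-byte payload: in HTTP/2, every PING frame carries exactly 8 bytes of opaque data (RFC 7540, section 6.7), regardless of whether it is a keepalive ping. A minimal sketch of extracting that payload from a raw frame (my own illustrative parser, not gRPC code):

```python
# Sketch: parse an HTTP/2 frame header and return a PING frame's 8-byte
# opaque payload (RFC 7540 §6.7). The payload value is opaque to the
# protocol, so "arbitrary data" in the ack is expected.
PING_FRAME_TYPE = 0x6

def parse_ping(frame: bytes) -> bytes:
    length = int.from_bytes(frame[0:3], "big")  # 24-bit payload length
    frame_type = frame[3]                       # 8-bit frame type
    if frame_type != PING_FRAME_TYPE or length != 8:
        raise ValueError("not a PING frame")
    return frame[9:9 + 8]  # skip the 9-byte header, take the opaque payload
```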

From my debug log observations, I see the following pattern:

   1. The client creates a channel and an underlying transport, ready to 
   accept gRPC calls.
   2. The client initiates a gRPC call, and it succeeds.
   3. After 30 minutes of inactivity, the channel transitions from *Ready* 
   to *Idle*, and the transport transitions from *Ready* to *Shutdown*. 
   (This behavior is inconsistent; sometimes these transitions appear in 
   the logs, and other times they don't. It could be related to 
   GRPC_VERBOSITY throttling the logs.)
   4. When step (3) occurs, I see logs on the server indicating that the 
   transport is shutting down (Closing: EOF).
   5. After 2 hours of inactivity, the server sends a keepalive ping, 
   doesn't receive a response within 20 seconds, and closes the transport. 
   At this point, the client logs show no indication that the transport 
   has been closed.
   6. During step (5), when I check active TCP connections, I see: 
   localhost:51902 -> localhost:50251 (CLOSE_WAIT). Here, 50251 is the 
   server and 51902 is the client.
   7. When a new gRPC call is initiated, it fails because the transport is 
   closed.

I have not overridden any gRPC channel options—these observations are based 
on the default configuration.
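For reference, this is roughly what overriding the client-side keepalive channel arguments would look like. The option names are standard gRPC channel arguments; the values are my own illustrative choices, not recommendations, and the channel-creation call is shown only as a comment:

```python
# Sketch of client-side keepalive channel arguments (illustrative values).
# These would make the Python client ping the server periodically, so the
# transport never sits silent for the full 2 hours.
KEEPALIVE_OPTIONS = [
    ("grpc.keepalive_time_ms", 60_000),          # send a keepalive ping every 60 s
    ("grpc.keepalive_timeout_ms", 20_000),       # wait up to 20 s for the ack
    ("grpc.keepalive_permit_without_calls", 1),  # allow pings with no active RPCs
]

# The long-lived channel would then be created once at application start:
# import grpc
# channel = grpc.insecure_channel("localhost:50251", options=KEEPALIVE_OPTIONS)
```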

Can anyone help me debug and understand this issue?

debug logs from *client* after 30 minutes of inactivity

I0000 00:00:1740647883.533817 23 init.cc:167] grpc_shutdown(void)
I0000 00:00:1740648903.774618 26 connectivity_state.cc:173] ConnectivityStateTracker client_channel[0x555dd05705d0]: get current state: READY
I0000 00:00:1740648903.774675 26 connectivity_state.cc:151] ConnectivityStateTracker client_channel[0x555dd05705d0]: READY -> IDLE (channel entering IDLE, OK)
I0000 00:00:1740648903.774714 26 connectivity_state.cc:151] ConnectivityStateTracker client_transport[0x7f0cd0001968]: READY -> SHUTDOWN (close_transport, OK)
I0000 00:00:1740648903.774719 26 connectivity_state.cc:159] ConnectivityStateTracker client_transport[0x7f0cd0001968]: notifying watcher 0x7f0cd0001500: READY -> SHUTDOWN
I0000 00:00:1740648903.774779 26 connectivity_state.cc:74] watcher 0x7f0cd0001500: delivering async notification for SHUTDOWN (OK)
I0000 00:00:1740648903.774788 26 init.cc:167] grpc_shutdown(void)

debug logs from *server* after 30 minutes of inactivity

2025/02/26 08:01:55 INFO: [transport] [server-transport 0xc000338000] Closing: EOF
2025/02/26 08:01:55 INFO: [transport] [server-transport 0xc000338000] loopyWriter exiting with error: transport closed by client

debug logs from *server* after 2 hours of inactivity

2025/02/26 15:27:44 INFO: [transport] [server-transport 0x14000214600] Closing: keepalive ping not acked within timeout 20s
2025/02/26 15:27:44 INFO: [transport] [server-transport 0x14000214600] loopyWriter exiting with error: transport closed by client
