Is /ra.hub.grpc.api.v1.SessionService/stopSession the unary call in question? From the log it looks completely normal and successfully processed. It would be nice to have debug logs from both C _and_ B, covering the time from when the unary call is received on B until it is canceled.
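In case it helps with collecting those logs: the frame-level entries quoted further down come from grpc-java's Netty transport and are emitted under the handler logger names visible in the log (io.grpc.netty.shaded.io.grpc.netty.NettyServerHandler on the server side, NettyClientHandler on the client side). How they surface depends on which logging backend Netty binds to; below is a minimal sketch assuming plain java.util.logging is in use -- with SLF4J/Logback on the classpath you would instead set these logger names to DEBUG in that framework's configuration.

    import java.util.ArrayList;
    import java.util.List;
    import java.util.logging.ConsoleHandler;
    import java.util.logging.Level;
    import java.util.logging.Logger;

    public final class GrpcFrameLogging {
        // Keep references: java.util.logging holds loggers weakly, so
        // unreferenced loggers can lose their configured level.
        private static final List<Logger> RETAINED = new ArrayList<>();

        // Sketch only, assuming java.util.logging is the active backend.
        public static void enable() {
            ConsoleHandler handler = new ConsoleHandler();
            handler.setLevel(Level.FINE);   // actually print FINE records

            String[] loggerNames = {
                "io.grpc.netty.shaded.io.grpc.netty.NettyServerHandler",  // server side (C)
                "io.grpc.netty.shaded.io.grpc.netty.NettyClientHandler"   // client side (B)
            };
            for (String name : loggerNames) {
                Logger logger = Logger.getLogger(name);
                logger.setLevel(Level.FINE);
                logger.addHandler(handler);
                RETAINED.add(logger);
            }
        }
    }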
re: keepAliveWithoutCalls -- the streaming calls you have between B and C are in-flight calls, so you always have some calls open; that is why changing keepAliveWithoutCalls does not make a difference in your case. What is your client LB policy?
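For readers following along: in grpc-java the client LB policy is configured on the channel builder, and pick_first is used if nothing is set, meaning all RPCs on the channel share a single connection to one backend. A minimal sketch; the target address is a placeholder:

    import io.grpc.ManagedChannel;
    import io.grpc.ManagedChannelBuilder;

    public final class LbPolicyExample {
        // Sketch only: "c.example.internal" is a placeholder target.
        public static ManagedChannel newRoundRobinChannel() {
            return ManagedChannelBuilder
                .forTarget("dns:///c.example.internal:10010")
                // Default is pick_first (one connection); round_robin opens a
                // connection per resolved address and spreads RPCs across them.
                .defaultLoadBalancingPolicy("round_robin")
                .usePlaintext()
                .build();
        }
    }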
On Tue, Dec 3, 2024 at 1:38 AM Jiří Krutil <krutilj...@gmail.com> wrote:

> A quick update: it turns out that enabling *keepAliveWithoutCalls* for component B does not resolve the issue.
>
> I would like to understand what kind of guarantees gRPC actually provides here. Is RPC call delivery on a channel on a best-effort basis? If that is the case, it sounds like the only way to resolve this is to add application-level health checking?
>
> Cheers
> Jiri
>
> On Wednesday, November 27, 2024 at 1:44:05 PM UTC+1, Jiří Krutil wrote:
>
>> I cannot confirm the exact number of open streaming calls on C during the last occurrence, but my assumption is just a few.
>>
>> I have tried processing one of the unary calls now with debug logging enabled on C, and this shows:
>>
>> 2024-11-27 13:17:37.088 [DEBUG] (grpc-default-worker-ELG-3-2) io.grpc.netty.shaded.io.grpc.netty.NettyServerHandler: - [id: 0xab7eb2ae, L:/10.40.80.78:10010 - R:/10.40.80.77:49754] OUTBOUND SETTINGS: ack=false settings={MAX_CONCURRENT_STREAMS=2147483647, INITIAL_WINDOW_SIZE=1048576, MAX_HEADER_LIST_SIZE=8192}
>> 2024-11-27 13:17:37.089 [DEBUG] (grpc-default-worker-ELG-3-2) io.grpc.netty.shaded.io.grpc.netty.NettyServerHandler: - [id: 0xab7eb2ae, L:/10.40.80.78:10010 - R:/10.40.80.77:49754] OUTBOUND WINDOW_UPDATE: streamId=0 windowSizeIncrement=983041
>> 2024-11-27 13:17:37.090 [DEBUG] (grpc-default-worker-ELG-3-2) io.grpc.netty.shaded.io.grpc.netty.NettyServerHandler: - [id: 0xab7eb2ae, L:/10.40.80.78:10010 - R:/10.40.80.77:49754] INBOUND SETTINGS: ack=false settings={ENABLE_PUSH=0, MAX_CONCURRENT_STREAMS=0, INITIAL_WINDOW_SIZE=1048576, MAX_HEADER_LIST_SIZE=8192}
>> 2024-11-27 13:17:37.090 [DEBUG] (grpc-default-worker-ELG-3-2) io.grpc.netty.shaded.io.grpc.netty.NettyServerHandler: - [id: 0xab7eb2ae, L:/10.40.80.78:10010 - R:/10.40.80.77:49754] OUTBOUND SETTINGS: ack=true
>> 2024-11-27 13:17:37.090 [DEBUG] (grpc-default-worker-ELG-3-2) io.grpc.netty.shaded.io.grpc.netty.NettyServerHandler: - [id: 0xab7eb2ae, L:/10.40.80.78:10010 - R:/10.40.80.77:49754] INBOUND WINDOW_UPDATE: streamId=0 windowSizeIncrement=983041
>> 2024-11-27 13:17:37.091 [DEBUG] (grpc-default-worker-ELG-3-2) io.grpc.netty.shaded.io.grpc.netty.NettyServerHandler: - [id: 0xab7eb2ae, L:/10.40.80.78:10010 - R:/10.40.80.77:49754] INBOUND SETTINGS: ack=true
>> 2024-11-27 13:17:37.092 [DEBUG] (grpc-default-worker-ELG-3-2) io.grpc.netty.shaded.io.grpc.netty.NettyServerHandler: - [id: 0xab7eb2ae, L:/10.40.80.78:10010 - R:/10.40.80.77:49754] INBOUND HEADERS: streamId=3 headers=GrpcHttp2RequestHeaders[:path: /ra.hub.grpc.api.v1.SessionService/stopSession, :authority: 10.40.80.78, :method: POST, :scheme: http, te: trailers, content-type: application/grpc, user-agent: grpc-java-netty/1.66.0, grpc-accept-encoding: gzip, grpc-timeout: 59969004u] streamDependency=0 weight=16 exclusive=false padding=0 endStream=false
>> 2024-11-27 13:17:37.110 [DEBUG] (grpc-default-worker-ELG-3-2) io.grpc.netty.shaded.io.grpc.netty.NettyServerHandler: - [id: 0xab7eb2ae, L:/10.40.80.78:10010 - R:/10.40.80.77:49754] INBOUND DATA: streamId=3 padding=0 endStream=true length=63 bytes=000000003a0a2435373239663665342d396633662d343738382d383132372d32343633636338663339666412120a0a6875622d6575732d3161120455532d31
>> 2024-11-27 13:17:37.110 [DEBUG] (grpc-default-worker-ELG-3-2) io.grpc.netty.shaded.io.grpc.netty.NettyServerHandler: - [id: 0xab7eb2ae, L:/10.40.80.78:10010 - R:/10.40.80.77:49754] OUTBOUND PING: ack=false bytes=1234
>> 2024-11-27 13:17:37.111 [DEBUG] (grpc-default-worker-ELG-3-2) io.grpc.netty.shaded.io.grpc.netty.NettyServerHandler: - [id: 0xab7eb2ae, L:/10.40.80.78:10010 - R:/10.40.80.77:49754] INBOUND PING: ack=true bytes=1234
>> 2024-11-27 13:17:37.112 [INFO] (grpc-default-executor-3) org.dett.router.grpc.RouterSessionGrpcService: - Processing stopSession request {uuid: "5729f6e4-9f3f-4788-8127-2463cc8f39fd" hubId { name: "hub-eus-1a" cluster: "US-1" }}
>> 2024-11-27 13:17:37.115 [DEBUG] (grpc-default-worker-ELG-3-2) io.grpc.netty.shaded.io.grpc.netty.NettyServerHandler: - [id: 0xab7eb2ae, L:/10.40.80.78:10010 - R:/10.40.80.77:49754] OUTBOUND HEADERS: streamId=3 headers=GrpcHttp2OutboundHeaders[:status: 200, content-type: application/grpc, grpc-encoding: identity, grpc-accept-encoding: gzip] padding=0 endStream=false
>> 2024-11-27 13:17:37.115 [DEBUG] (grpc-default-worker-ELG-3-2) io.grpc.netty.shaded.io.grpc.netty.NettyServerHandler: - [id: 0xab7eb2ae, L:/10.40.80.78:10010 - R:/10.40.80.77:49754] OUTBOUND DATA: streamId=3 padding=0 endStream=false length=5 bytes=0000000000
>> 2024-11-27 13:17:37.116 [DEBUG] (grpc-default-worker-ELG-3-2) io.grpc.netty.shaded.io.grpc.netty.NettyServerHandler: - [id: 0xab7eb2ae, L:/10.40.80.78:10010 - R:/10.40.80.77:49754] OUTBOUND HEADERS: streamId=3 headers=GrpcHttp2OutboundHeaders[grpc-status: 0] padding=0 endStream=true
>> 2024-11-27 13:17:37.118 [DEBUG] (grpc-default-worker-ELG-3-2) io.grpc.netty.shaded.io.grpc.netty.NettyServerHandler: - [id: 0xab7eb2ae, L:/10.40.80.78:10010 - R:/10.40.80.77:49754] INBOUND PING: ack=false bytes=1234
>> 2024-11-27 13:17:37.118 [DEBUG] (grpc-default-worker-ELG-3-2) io.grpc.netty.shaded.io.grpc.netty.NettyServerHandler: - [id: 0xab7eb2ae, L:/10.40.80.78:10010 - R:/10.40.80.77:49754] OUTBOUND PING: ack=true bytes=1234
>> 2024-11-27 13:17:57.118 [DEBUG] (grpc-default-worker-ELG-3-2) io.grpc.netty.shaded.io.grpc.netty.NettyServerHandler: - [id: 0xab7eb2ae, L:/10.40.80.78:10010 - R:/10.40.80.77:49754] INBOUND PING: ack=false bytes=1111
>> 2024-11-27 13:17:57.118 [DEBUG] (grpc-default-worker-ELG-3-2) io.grpc.netty.shaded.io.grpc.netty.NettyServerHandler: - [id: 0xab7eb2ae, L:/10.40.80.78:10010 - R:/10.40.80.77:49754] OUTBOUND PING: ack=true bytes=1111
>>
>> I think this indicates *max_concurrent_streams* for C as a server is unlimited.
>>
>> Maybe I have oversimplified in the original post. We actually have 2 instances of A, 2 instances of B, and N instances of C. One observation was that at one time we had a stale channel B1 -> C2 and a stale channel B2 -> C4, while all other channels were working. In other words, the problem does not affect all channels from a given B instance or all channels to a given C instance. It seems to be channel-specific.
>>
>> Just to check one related idea I had -- is it possible that enabling *keepAliveWithoutCalls* on B could make this issue go away? The reasoning is that if the gRPC client on B sees incoming traffic from C, it may never probe the channel to C, and thus won't detect the issue without this option.
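For reference, here is a rough sketch of how these keepalive knobs are typically configured on the B -> C channel in grpc-java; the host, port, and durations are placeholders. Per the grpc-java javadoc, keepAliveTime is the time *without read activity* before a keepalive ping is sent, so steady inbound streaming from C can keep postponing the client's pings, which is consistent with the suspicion above.

    import java.util.concurrent.TimeUnit;

    import io.grpc.ManagedChannel;
    import io.grpc.netty.shaded.io.grpc.netty.NettyChannelBuilder;

    public final class ChannelToC {
        // Sketch only: host, port, and durations are placeholders.
        public static ManagedChannel create(String host, int port) {
            return NettyChannelBuilder.forAddress(host, port)
                .usePlaintext()
                // Ping the server after this long without read activity on the
                // transport (steady inbound streaming can postpone the ping).
                .keepAliveTime(30, TimeUnit.SECONDS)
                // If the ping ack does not arrive within this window, the
                // connection is considered dead and the transport is closed,
                // forcing a reconnect for subsequent RPCs.
                .keepAliveTimeout(10, TimeUnit.SECONDS)
                // Only matters when no calls are open; with the long-lived
                // server-streaming call to C always in flight, it has no effect.
                .keepAliveWithoutCalls(false)
                .build();
        }
    }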
>> On Tuesday, November 26, 2024 at 8:53:01 PM UTC+1, Yuri Golobokov wrote:
>>
>>> How many server streaming calls are open to C at the time of the issue? Does C limit the maximum number of concurrent streams? (You can see it in the debug logs for the SETTINGS frame coming from C.)
>>>
>>> On Tuesday, November 26, 2024 at 7:22:16 AM UTC-8, Jiří Krutil wrote:
>>>
>>>> When running our Java application in MS Azure, we sometimes observe very strange behavior, which appears as if a long-lived gRPC channel were working only in one direction and not delivering any RPC calls in the opposite direction.
>>>>
>>>> Our setup is that we have three apps connected via gRPC:
>>>>
>>>> A -> B -> C
>>>>
>>>> B usually has a long-lived server-streaming gRPC request to C open, consuming updates from C. When the issue occurs, updates from C are still streaming to B, but no new unary requests made by B make it to C.
>>>>
>>>> The unary requests made by B are originated by A. B receives the request from A and sends a unary request to C with a deadline copied from the original request. After 20 seconds, B sees an "RPC cancelled" event, which I believe comes from A in response to some kind of timeout.
>>>>
>>>> The problem occurs randomly, and when it occurs, the channel never recovers.
>>>>
>>>> Debug logging seems to show that when B receives the request from A, it creates a new stub using an existing (cached) channel and attempts to send a request to C, but that request is actually never sent.
>>>>
>>>> If I make B forget the cached channel and create a new one, the unary request to C works fine.
>>>>
>>>> We have keepAlive enabled on these channels, so I am surprised that potential issues with the underlying connection are not detected by the keepAlive mechanism. Is it possible that, since traffic is steadily flowing in the direction from C to B, B never pings C to see if communication in the opposite direction works as well?
>>>>
>>>> I suppose we could work around this by adding application-level health checking for every channel, but I thought this was already taken care of by gRPC.
>>>>
>>>> Any suggestions would be appreciated.
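Since per-channel application-level health checking comes up twice in this thread, here is a rough sketch of what such a probe could look like using the standard gRPC health service. It assumes C registers that service (for example via io.grpc.protobuf.services.HealthStatusManager from the grpc-services artifact); the deadline and the empty service name are placeholders, and this is a workaround sketch, not an officially recommended fix.

    import java.util.concurrent.TimeUnit;

    import io.grpc.ManagedChannel;
    import io.grpc.StatusRuntimeException;
    import io.grpc.health.v1.HealthCheckRequest;
    import io.grpc.health.v1.HealthCheckResponse;
    import io.grpc.health.v1.HealthGrpc;

    public final class ChannelProbe {
        // Returns true if a short unary call over this channel succeeds.
        // Sketch only: assumes C exposes the standard health service.
        public static boolean isChannelHealthy(ManagedChannel channel) {
            try {
                HealthCheckResponse response = HealthGrpc.newBlockingStub(channel)
                    .withDeadlineAfter(5, TimeUnit.SECONDS)  // bound the probe
                    .check(HealthCheckRequest.newBuilder().setService("").build());
                return response.getStatus() == HealthCheckResponse.ServingStatus.SERVING;
            } catch (StatusRuntimeException e) {
                // DEADLINE_EXCEEDED or UNAVAILABLE here is a hint that unary calls
                // on this cached channel are stuck; the caller could then discard
                // the channel (shutdownNow) and build a fresh one.
                return false;
            }
        }
    }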