lianetm commented on code in PR #16017: URL: https://github.com/apache/kafka/pull/16017#discussion_r1612039771
########## clients/src/main/java/org/apache/kafka/clients/consumer/internals/HeartbeatRequestManager.java: ########## @@ -208,7 +210,7 @@ public NetworkClientDelegate.PollResult poll(long currentTimeMs) { return new NetworkClientDelegate.PollResult(heartbeatRequestState.heartbeatIntervalMs, Collections.singletonList(leaveHeartbeat)); } - boolean heartbeatNow = membershipManager.shouldHeartbeatNow() && !heartbeatRequestState.requestInFlight(); + boolean heartbeatNow = membershipManager.shouldHeartbeatNow() && (!heartbeatRequestState.requestInFlight() || membershipManager.isLeavingGroup()); Review Comment: I'm afraid this addition has an undesired side-effect: members with rebalance callbacks will be storming the broker with heartbeat requests without respecting the HB interval (ex. while `onPartitionsRevoked` runs to release assignment before leaving). The issue is that `isLeavingGroup` will be true while the member is in PREPARE_LEAVING (executing the rebalance callbacks to revoke the owned partitions, which runs on the user space out of our control). While the callbacks execute (might be long), we want to keep the member in the group, so we keep sending heartbeats, on the interval. That being said, I wonder if this small twist would be enough: ```suggestion boolean heartbeatNow = membershipManager.state() == MemberState.LEAVING || membershipManager.shouldHeartbeatNow() && !heartbeatRequestState.requestInFlight(); ``` At the end of the day, when LEAVING (not preparing to leave), we shouldn't have to wait for an inflight because: 1. the leave intention is not affected in any way by the response that may come for the inflight request (ex. getting fenced, new assignments, all will be ignored if member is LEAVING) 2. the HB to leave is only sent once, so no risk of storming the broker with requests sent without waiting for inflight responses. I'm still thinking in case I'm missing something, but saying that we don't wait for inflight responses to send a leave group seems conceptually right, and solves the issue we had since that was truly the root cause. Thoughts? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org