lianetm commented on code in PR #16017:
URL: https://github.com/apache/kafka/pull/16017#discussion_r1612039771


##########
clients/src/main/java/org/apache/kafka/clients/consumer/internals/HeartbeatRequestManager.java:
##########
@@ -208,7 +210,7 @@ public NetworkClientDelegate.PollResult poll(long 
currentTimeMs) {
             return new 
NetworkClientDelegate.PollResult(heartbeatRequestState.heartbeatIntervalMs, 
Collections.singletonList(leaveHeartbeat));
         }
 
-        boolean heartbeatNow = membershipManager.shouldHeartbeatNow() && 
!heartbeatRequestState.requestInFlight();
+        boolean heartbeatNow = membershipManager.shouldHeartbeatNow() && 
(!heartbeatRequestState.requestInFlight() || 
membershipManager.isLeavingGroup());

Review Comment:
   I'm afraid this addition has an undesired side-effect: members with 
rebalance callbacks will be storming the broker with heartbeat requests without 
respecting the HB interval (ex. while `onPartitionsRevoked` runs to release 
assignment before leaving). The issue is that `isLeavingGroup` will be true 
while the member is in PREPARE_LEAVING (executing the rebalance callbacks to 
revoke the owned partitions, which runs on the user space out of our control). 
While the callbacks execute (might be long), we want to keep the member in the 
group, so we keep sending heartbeats, on the interval.
   
   That being said, I wonder if this small twist would be enough:
   ```suggestion
           boolean heartbeatNow = membershipManager.state() == 
MemberState.LEAVING ||
               membershipManager.shouldHeartbeatNow() && 
!heartbeatRequestState.requestInFlight();
   ```
   
   At the end of the day, when LEAVING (not preparing to leave), we shouldn't 
have to wait for an inflight because:
   1. the leave intention is not affected in any way by the response that may 
come for the inflight request (ex. getting fenced, new assignments, all will be 
ignored if member is LEAVING)
   2. the HB to leave is only sent once, so no risk of storming the broker with 
requests sent without waiting for inflight responses. 
   
   I'm still thinking in case I'm missing something, but saying that we don't 
wait for inflight responses to send a leave group seems conceptually right, and 
solves the issue we had since that was truly the root cause. 
   
   Thoughts?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to