[ https://issues.apache.org/jira/browse/KAFKA-2397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14652386#comment-14652386 ]
Onur Karaman commented on KAFKA-2397:
-------------------------------------

Hey everyone. There's a difference between the best, expected, and worst case rebalance time.

Trunk
-----
A consumer leaves at t = 0 and the coordinator detects the failure at t = s. The rebalance window can close as soon as all the existing consumers rejoin and as late as the maximum member session timeout. The time to stabilize after the consumer failure is something like:
{code}
t = s + rebalance_timeout
{code}

Best case: The coordinator receives all of the remaining consumers' heartbeats immediately after t = s. All of the remaining consumers rejoin immediately after receiving the heartbeat response. So everything is done by *t ~= s*.

Expected case: The coordinator receives all of the remaining heartbeats by t = 4s/3, because consumers will typically find out about the rebalance s/3 after the failure is detected (an oversimplification: consumers of a group actually have staggered heartbeat intervals). All of the remaining consumers eventually rejoin (coordinator_join_group_request_receival_delay). So everything is done by *t ~= s + (s/3 + coordinator_join_group_request_receival_delay)*.

Worst case: All of the consumers in the group somehow fail to get notified of the rebalance until the very last possible moment and rejoin the group just before the rebalance window ends: *t = s + s*.

LeaveGroupRequest
-----
A consumer leaves at t = 0 and sends out the LeaveGroupRequest. The rebalance window can close as soon as all the existing consumers rejoin and as late as the maximum member session timeout. The LeaveGroupRequest would cut down the time to stabilize after the consumer failure to something like:
{code}
t = coordinator_leave_group_request_receival_delay + rebalance_timeout
{code}

Best case: A consumer leaves at t = 0, sends out the LeaveGroupRequest, and the coordinator immediately receives the LeaveGroupRequest. The coordinator receives all of the remaining consumers' heartbeats immediately after t = 0. All of the remaining consumers rejoin immediately after receiving the heartbeat response. So everything is done by *t ~= 0*.

Expected case: A consumer leaves at t = 0, sends out the LeaveGroupRequest, and the coordinator receives the LeaveGroupRequest at t = coordinator_leave_group_request_receival_delay. All of the remaining consumers eventually rejoin (coordinator_join_group_request_receival_delay). So everything is done by *t ~= coordinator_leave_group_request_receival_delay + (s/3 + coordinator_join_group_request_receival_delay)*. I'm assuming coordinator_leave_group_request_receival_delay << s.

Worst case: A consumer leaves at t = 0, sends out the LeaveGroupRequest, and the coordinator receives the LeaveGroupRequest at t = coordinator_leave_group_request_receival_delay. All of the consumers in the group somehow fail to get notified of the rebalance until the very last possible moment and rejoin the group just before the rebalance window ends: *t = coordinator_leave_group_request_receival_delay + s*. I'm assuming coordinator_leave_group_request_receival_delay << s.

Absolute worst case: The LeaveGroupRequest somehow gets dropped before reaching the coordinator. The heartbeat would time out on the coordinator anyway and hit the existing *t = s + s* behavior.

Summary
-----
So I guess the absolute worst case behavior hasn't changed if the LeaveGroupRequest is somehow dropped, but everything else should get better by about s.

P.S.: To avoid confusion, it's probably best to state whether you're talking about the behavior in trunk or the proposed behavior with LeaveGroupRequest. I prefer having a separate LeaveGroupRequest, but that's less of the focus here.

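For anyone who wants to plug in numbers, the cases above can be tabulated with a rough sketch like the one below. It is only an illustration of the arithmetic in this comment, not coordinator code: the class and function names are made up for the example, the s/3 heartbeat term is the same oversimplification noted above, and the best case simply treats every delay as zero.

```python
from dataclasses import dataclass


@dataclass
class Delays:
    """All values in seconds. Field names are hypothetical shorthand."""
    s: float                  # session timeout shared by every consumer
    leave_delay: float = 0.0  # coordinator_leave_group_request_receival_delay
    join_delay: float = 0.0   # coordinator_join_group_request_receival_delay


def stabilize_times(d: Delays, leave_group_request: bool) -> dict:
    """Approximate time from the consumer leaving until the group is stable."""
    # How long until the coordinator knows the consumer is gone:
    # the full session timeout on trunk, or the request delay if the
    # consumer announces its departure.
    detect = d.leave_delay if leave_group_request else d.s
    # Best case assumes the LeaveGroupRequest arrives immediately.
    best_detect = 0.0 if leave_group_request else d.s
    return {
        "best": best_detect,
        "expected": detect + d.s / 3 + d.join_delay,
        "worst": detect + d.s,
    }


if __name__ == "__main__":
    d = Delays(s=30.0, leave_delay=0.1, join_delay=0.2)
    print("trunk:", stabilize_times(d, leave_group_request=False))
    print("leave:", stabilize_times(d, leave_group_request=True))
```

If the LeaveGroupRequest is dropped, the numbers fall back to the trunk row, which is the "absolute worst case" above.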
> leave group request
> -------------------
>
>                 Key: KAFKA-2397
>                 URL: https://issues.apache.org/jira/browse/KAFKA-2397
>             Project: Kafka
>          Issue Type: Sub-task
>          Components: consumer
>            Reporter: Onur Karaman
>            Assignee: Onur Karaman
>            Priority: Minor
>             Fix For: 0.8.3
>
>
> Let's say every consumer in a group has session timeout s. Currently, if a
> consumer leaves the group, the worst case time to stabilize the group is 2s
> (s to detect the consumer failure + s for the rebalance window). If a
> consumer instead can declare they are leaving the group, the worst case time
> to stabilize the group would just be the s associated with the rebalance
> window.
> This is a low priority optimization!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)