[ https://issues.apache.org/jira/browse/KAFKA-2397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14729769#comment-14729769 ]

Jiangjie Qin commented on KAFKA-2397:
-------------------------------------

[~ewencp] [~hachikuji] Some thoughts on this. I agree with [~ewencp] that we 
should follow one protocol, not both. Personally, I prefer the explicit leave 
group request.

The goals we want to achieve are:
1. When a consumer actually dies, we don't want to wait too long before 
rebalancing.
2. When a consumer exits normally, we want to trigger a rebalance soon.
3. If there is jitter, a transient network issue, etc., we want some tolerance 
for it.

Using the TCP connection to signal liveness satisfies 2. 
For 1, it won't work if the TCP connection timeout is very long. That's why we 
introduced the session timeout. 
For 3, using the TCP connection to signal liveness might cause problems, since 
a transient disconnect would look like the consumer leaving the group.

With an explicit leave group request, it is clear that a member is excluded 
from the group only when it exits normally or its session times out. So all 
three goals are met.
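To make the three goals concrete, here is a minimal sketch of a coordinator that combines a session timeout with an explicit leave. Everything in it is hypothetical and made up for illustration: `ToyCoordinator`, `join`, `heartbeat`, `leave`, `expire`, and the timings in `demo`. None of it is Kafka's coordinator API. Time is a float in seconds.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCoordinator:
    """Hypothetical group coordinator: session-timeout liveness plus explicit leave.

    A toy model of the protocol discussed above, not Kafka's implementation.
    """
    session_timeout: float
    last_heartbeat: dict = field(default_factory=dict)

    def join(self, member: str, now: float) -> None:
        # Joining starts the member's session.
        self.last_heartbeat[member] = now

    def heartbeat(self, member: str, now: float) -> None:
        # Refreshes the session only; membership is unchanged.
        self.last_heartbeat[member] = now

    def leave(self, member: str) -> None:
        # Goal 2: a clean exit removes the member immediately.
        self.last_heartbeat.pop(member, None)

    def expire(self, now: float) -> list:
        # Goal 1: a member that stops heartbeating is dropped after session_timeout.
        # Goal 3: a silent gap shorter than session_timeout is tolerated.
        dead = [m for m, t in self.last_heartbeat.items()
                if now - t >= self.session_timeout]
        for m in dead:
            del self.last_heartbeat[m]
        return dead


def demo() -> tuple:
    coord = ToyCoordinator(session_timeout=10.0)
    for m in ("a", "b", "c"):
        coord.join(m, now=0.0)
    coord.heartbeat("b", now=4.0)
    coord.heartbeat("c", now=4.0)
    during_hiccup = coord.expire(now=4.0)    # "a" silent for 4 s: kept
    coord.heartbeat("a", now=4.0)
    coord.leave("b")                         # "b" exits cleanly: removed at once
    before_timeout = coord.expire(now=13.0)  # "c" silent for 9 s: kept
    coord.heartbeat("a", now=13.0)
    after_timeout = coord.expire(now=14.0)   # "c" silent for 10 s: removed
    return during_hiccup, before_timeout, after_timeout, sorted(coord.last_heartbeat)
```

Under a TCP-connection-based policy, the 4-second hiccup for "a" could drop the connection and remove it from the group. That is the goal-3 problem described above.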

An important related scenario worth thinking about is bouncing a consumer. 
Without a leave group request, it is possible to bounce a client without 
triggering a rebalance, as long as the consumer shuts down and comes back 
before its session times out. If we send a leave group request explicitly, 
bouncing a consumer means there will be two rebalances (which I think is the 
correct behavior). So making rebalances cheap and fast is very important.
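The bounce cases can be tabulated with a tiny helper. `rebalances_during_bounce` is hypothetical and made up for illustration. It also assumes the restarted consumer rejoins as the same member. That assumption is what lets the no-leave case avoid a rebalance entirely.

```python
def rebalances_during_bounce(downtime: float, session_timeout: float,
                             sends_leave_group: bool) -> int:
    """Count rebalances caused by restarting (bouncing) one consumer.

    Toy model, not Kafka code. Assumes the restarted consumer rejoins as the
    same member, so a rejoin within the session does not change membership.
    """
    if sends_leave_group:
        # Shutdown sends a leave (rebalance 1); the restart rejoins (rebalance 2).
        return 2
    if downtime < session_timeout:
        # The session never expires, so the coordinator sees no membership change.
        return 0
    # The session expires (rebalance 1); the restart rejoins (rebalance 2).
    return 2


for sends_leave in (False, True):
    for downtime in (5.0, 15.0):
        n = rebalances_during_bounce(downtime, session_timeout=10.0,
                                     sends_leave_group=sends_leave)
        print(f"leave={sends_leave} downtime={downtime}s -> {n} rebalance(s)")
```

With a leave request, the count no longer depends on how fast the consumer comes back. That is why the cost of each rebalance matters so much.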

> leave group request
> -------------------
>
>                 Key: KAFKA-2397
>                 URL: https://issues.apache.org/jira/browse/KAFKA-2397
>             Project: Kafka
>          Issue Type: Sub-task
>          Components: consumer
>            Reporter: Onur Karaman
>            Assignee: Onur Karaman
>            Priority: Minor
>             Fix For: 0.8.3
>
>
> Let's say every consumer in a group has session timeout s. Currently, if a 
> consumer leaves the group, the worst-case time to stabilize the group is 2s 
> (s to detect the consumer failure + s for the rebalance window). If a 
> consumer can instead declare that it is leaving the group, the worst-case 
> time to stabilize the group would be just the s associated with the 
> rebalance window.
> This is a low priority optimization!
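The arithmetic in the description can be written out as a helper. `worst_case_stabilization` is hypothetical and illustrative only; `s` is the session timeout, and 30.0 is just an example value.

```python
def worst_case_stabilization(s: float, explicit_leave: bool) -> float:
    """Worst-case time for the group to stabilize after one consumer leaves.

    Follows the breakdown in the issue description: without a leave request the
    coordinator needs up to s to detect the departure, then up to s for the
    rebalance window; with an explicit leave the detection step disappears.
    """
    detection = 0.0 if explicit_leave else s
    rebalance_window = s
    return detection + rebalance_window


print(worst_case_stabilization(30.0, explicit_leave=False))
print(worst_case_stabilization(30.0, explicit_leave=True))
```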



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
