[jira] [Commented] (KAFKA-2397) leave group request

Jiangjie Qin (JIRA) Fri, 04 Sep 2015 13:34:31 -0700

    [ 
https://issues.apache.org/jira/browse/KAFKA-2397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14731373#comment-14731373
 ]


Jiangjie Qin commented on KAFKA-2397:
-------------------------------------

[~jkreps] using TCP close to signal disconnect does have merits. It works 
both when the client process crashes and when it closes normally. I am just 
not sure it is worth doing here.

The price we pay is that we have to propagate every connection close from the 
network layer to the coordinator. From the server logs I have seen at LinkedIn, 
socket closures are quite frequent. Todd even submitted a patch to change that 
particular log message to debug level. They could just come from the ad-hoc 
SyncProducer in the old consumer refreshing metadata. Maybe I'm overly 
concerned, but I am a bit worried about the noise here.

I don't know all the cases in which a TCP connection might be closed. A proxy 
was mentioned earlier; a load balancer, firewall, gateway, etc. could do it 
too. I feel it might be another unnecessary assumption/dependency we introduce 
that does not buy us much.

Another thing I am not sure about is how often an application process crashes, 
other than when people do a kill -9. In most cases there are multiple threads 
in an application. If an uncaught exception is thrown, usually only that thread 
dies and the process hangs but does not exit, unless people exit explicitly 
like mirror maker does. In that case, is it reasonable to expect 
client.close() to be called in the application shutdown hook or some finally 
block? (That may not hold for other languages such as C, though.) If using TCP 
close mainly addresses kill -9, it is very likely that the session timeout has 
already been reached by the time people manually kill the process.
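
To make the shutdown hook / finally block idea concrete, here is a rough 
sketch. FakeConsumer is a hypothetical stand-in and not the real consumer 
API; the assumption is only that close() is where a leave group request would 
be sent, and that it is safe to call more than once.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

class GracefulLeaveSketch {

    /** Hypothetical stand-in for a consumer client; not the real Kafka API. */
    static class FakeConsumer implements AutoCloseable {
        private final AtomicBoolean closed = new AtomicBoolean(false);
        private final AtomicInteger leaveRequests = new AtomicInteger(0);

        void poll() {
            if (closed.get()) {
                throw new IllegalStateException("consumer is closed");
            }
        }

        @Override
        public void close() {
            // Only the first call counts as sending the (assumed) leave group
            // request, so the finally block and the hook can both call it.
            if (closed.compareAndSet(false, true)) {
                leaveRequests.incrementAndGet();
            }
        }

        int leaveRequestsSent() {
            return leaveRequests.get();
        }
    }

    /** Runs one unit of work and always closes the consumer afterwards. */
    static FakeConsumer runWorker(boolean failDuringWork) {
        FakeConsumer consumer = new FakeConsumer();
        // Covers normal JVM exit and SIGTERM; it cannot cover kill -9.
        Runtime.getRuntime().addShutdownHook(new Thread(consumer::close));
        try {
            consumer.poll();
            if (failDuringWork) {
                throw new RuntimeException("simulated processing failure");
            }
        } catch (RuntimeException e) {
            // Caught here only so the sketch can return the consumer for
            // inspection; the finally block runs either way.
        } finally {
            consumer.close();
        }
        return consumer;
    }

    public static void main(String[] args) {
        FakeConsumer c = runWorker(true);
        System.out.println("leave requests sent: " + c.leaveRequestsSent());
    }
}
```

With this pattern only kill -9 (or a machine failure) skips the explicit 
leave, which is the case the session timeout already handles.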

> leave group request
> -------------------
>
>                 Key: KAFKA-2397
>                 URL: https://issues.apache.org/jira/browse/KAFKA-2397
>             Project: Kafka
>          Issue Type: Sub-task
>          Components: consumer
>            Reporter: Onur Karaman
>            Assignee: Onur Karaman
>            Priority: Minor
>             Fix For: 0.8.3
>
>
> Let's say every consumer in a group has session timeout s. Currently, if a 
> consumer leaves the group, the worst case time to stabilize the group is 2s 
> (s to detect the consumer failure + s for the rebalance window). If a 
> consumer instead can declare they are leaving the group, the worst case time 
> to stabilize the group would just be the s associated with the rebalance 
> window.
> This is a low priority optimization!
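
The arithmetic in the description can be sketched as below. The 30 second 
session timeout is only an illustrative value, and the class and method names 
are made up for this example.

```java
import java.time.Duration;

class StabilizationTimeSketch {

    /** No leave request: s to detect the failure + s for the rebalance window. */
    static Duration worstCaseWithoutLeave(Duration sessionTimeout) {
        return sessionTimeout.plus(sessionTimeout);
    }

    /** Explicit leave: detection is immediate, only the rebalance window remains. */
    static Duration worstCaseWithLeave(Duration sessionTimeout) {
        return sessionTimeout;
    }

    public static void main(String[] args) {
        Duration s = Duration.ofSeconds(30); // illustrative value, not a default
        System.out.println("without leave: " + worstCaseWithoutLeave(s).getSeconds() + "s");
        System.out.println("with leave:    " + worstCaseWithLeave(s).getSeconds() + "s");
    }
}
```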



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
