[ https://issues.apache.org/jira/browse/KAFKA-2397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14731373#comment-14731373 ]
Jiangjie Qin commented on KAFKA-2397:
-------------------------------------

[~jkreps] Using TCP close to signal disconnect does have merits: it works whether the client process crashes or closes normally. It is just not clear to me that it is worth doing here.

The price we pay is that every connection close at the network layer has to be propagated to the coordinator. From the server logs I saw at LinkedIn, socket closure is quite frequent. Todd even submitted a patch to change that particular log message to debug level. Those closures could just be the ad-hoc SyncProducer in the old consumer refreshing metadata. Maybe I'm overly concerned, but I am a bit worried about the noise here.

I also don't know all the cases in which a TCP connection might be closed. A proxy was mentioned earlier; there may also be load balancers, firewalls, gateways, etc. I feel this might be another unnecessary assumption/dependency that does not buy us much.

Another thing I am not sure about is how often an application process actually crashes, other than when people do a kill -9. In most cases an application has multiple threads. If an uncaught exception is thrown, usually only that thread dies, and the process hangs rather than exits unless the application exits explicitly, as mirror maker does. In that case, is it reasonable to expect client.close() to be called in the application's shutdown hook or in some finally block? (That may not be the case for other languages such as C, though.)

If using TCP close mainly addresses kill -9, it is very likely that the session timeout has already been reached by the time people manually kill the process.

> leave group request
> -------------------
>
>                 Key: KAFKA-2397
>                 URL: https://issues.apache.org/jira/browse/KAFKA-2397
>             Project: Kafka
>          Issue Type: Sub-task
>          Components: consumer
>            Reporter: Onur Karaman
>            Assignee: Onur Karaman
>            Priority: Minor
>             Fix For: 0.8.3
>
>
> Let's say every consumer in a group has session timeout s. Currently, if a
> consumer leaves the group, the worst case time to stabilize the group is 2s
> (s to detect the consumer failure + s for the rebalance window). If a
> consumer instead can declare they are leaving the group, the worst case time
> to stabilize the group would just be the s associated with the rebalance
> window.
> This is a low priority optimization!

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
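
For illustration only, the shutdown-hook approach asked about in the comment might look roughly like the sketch below. StubClient, closeHook and CloseOnShutdownSketch are hypothetical names made up for this example and are not Kafka classes; a real consumer would send its leave notification from inside close(). The hook runs on normal exit and on SIGTERM/SIGINT, but not on kill -9, which is the case the comment argues the session timeout would have to cover anyway.

```java
public class CloseOnShutdownSketch {

    /** Hypothetical stand-in for a consumer client; not a Kafka class. */
    static final class StubClient implements AutoCloseable {
        private volatile boolean closed = false;

        @Override
        public void close() {
            // Assumption: a real client would tell the coordinator it is leaving here.
            closed = true;
        }

        boolean isClosed() {
            return closed;
        }
    }

    /** Builds the hook thread; kept separate so it can also be run directly. */
    static Thread closeHook(AutoCloseable client) {
        return new Thread(() -> {
            try {
                client.close();
            } catch (Exception e) {
                // Little can be done this late in shutdown; just report it.
                System.err.println("close failed: " + e);
            }
        });
    }

    public static void main(String[] args) {
        StubClient client = new StubClient();
        // Registered hooks run when the JVM exits normally or receives SIGTERM/SIGINT.
        // They do not run after kill -9 (SIGKILL).
        Runtime.getRuntime().addShutdownHook(closeHook(client));
        System.out.println("hook registered");
    }
}
```

Note that an uncaught exception in one thread of a multi-threaded application does not by itself trigger the hook, since the JVM keeps running while other non-daemon threads are alive; that matches the "process will hang but not exit" behaviour described in the comment.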