[ https://issues.apache.org/jira/browse/KAFKA-2397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14652686#comment-14652686 ]
Jay Kreps commented on KAFKA-2397:
----------------------------------

Nice summary [~onurkaraman]. I agree that adding a field to heartbeat is functionally equivalent to a leave_group request/resp. The reason for preferring that was just to reduce the conceptual weight of the protocol.

A second idea that I'm not sure is good: rather than adding either a new request or a heartbeat field, it would be possible to use the TCP connection closure for this. The advantage would be that ANY process death that didn't also kill the OS would then be detectable without any client participation. The downside is that (1) the server change would be slightly more involved, and (2) you wouldn't be able to close the connection for other reasons.

The complexity of implementation is that currently only the network layer knows about socket closes. However, we were already introducing a session concept for the security work, which allows the KafkaApis layer to have access to cross-request state such as the authenticated identity. We could make it possible to add shutdown actions to the session that would trigger this; or alternatively we could add a way to register onSocketClose actions directly with the network layer.

This same feature would actually be useful for the purgatory. Currently, when a connection is closed, I don't think that requests in purgatory are removed. If the purgatory timeout is very small this is okay, but it is very common for people to ask for NO timeout, in which case each connection close potentially leaks memory. I think we kind of "fixed" this by just overriding the max wait time, but purging purgatory on shutdown is obviously preferable.
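To make the "shutdown actions on the session" idea concrete, here is a rough sketch in Java. Everything in it is hypothetical: `Session`, `addCloseAction`, `onSocketClose`, and the two sets standing in for group membership and purgatory are illustrative names, not existing Kafka classes or APIs. It assumes the network layer calls `onSocketClose()` exactly where it detects the closed socket, and it ignores thread safety.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical per-connection session; NOT an existing Kafka class.
final class Session {
    private final String connectionId;
    private final List<Runnable> closeActions = new ArrayList<>();
    private boolean closed = false;

    Session(String connectionId) {
        this.connectionId = connectionId;
    }

    String connectionId() {
        return connectionId;
    }

    // Register work to run when the socket behind this session closes.
    void addCloseAction(Runnable action) {
        if (closed) {
            throw new IllegalStateException("session already closed");
        }
        closeActions.add(action);
    }

    // Assumed to be invoked by the network layer when it observes the close.
    void onSocketClose() {
        if (closed) {
            return; // run the registered actions at most once
        }
        closed = true;
        for (Runnable action : closeActions) {
            action.run();
        }
    }
}

final class CloseActionDemo {
    public static void main(String[] args) {
        // Stand-ins for coordinator group state and purgatory contents.
        Set<String> groupMembers = new HashSet<>();
        Set<String> purgatory = new HashSet<>();

        Session session = new Session("conn-1");
        groupMembers.add("consumer-1");
        purgatory.add(session.connectionId());

        // One action removes the consumer from its group; the other
        // purges delayed requests that belong to the closed connection.
        session.addCloseAction(() -> groupMembers.remove("consumer-1"));
        session.addCloseAction(() -> purgatory.remove(session.connectionId()));

        session.onSocketClose();
        System.out.println(groupMembers.isEmpty() && purgatory.isEmpty());
    }
}
```

The same hook serves both uses discussed above: the group coordinator and the purgatory each register their own action, so the network layer needs no knowledge of either.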

> leave group request
> -------------------
>
>                 Key: KAFKA-2397
>                 URL: https://issues.apache.org/jira/browse/KAFKA-2397
>             Project: Kafka
>          Issue Type: Sub-task
>          Components: consumer
>            Reporter: Onur Karaman
>            Assignee: Onur Karaman
>            Priority: Minor
>             Fix For: 0.8.3
>
>
> Let's say every consumer in a group has session timeout s. Currently, if a
> consumer leaves the group, the worst case time to stabilize the group is 2s
> (s to detect the consumer failure + s for the rebalance window). If a
> consumer instead can declare they are leaving the group, the worst case time
> to stabilize the group would just be the s associated with the rebalance
> window.
> This is a low priority optimization!

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)