[
https://issues.apache.org/jira/browse/KAFKA-2397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14652686#comment-14652686
]
Jay Kreps commented on KAFKA-2397:
----------------------------------
Nice summary [~onurkaraman].
I agree that adding a field to heartbeat is functionally equivalent to a
leave_group request/response. The reason for preferring the heartbeat field
was just to reduce the conceptual weight of the protocol.
A second idea that I'm not sure is good: rather than adding either a new
request or a heartbeat field, it would be possible to use TCP connection
closure for this. The advantage is that ANY process death that didn't also kill
the OS would be detectable without any client participation. The downside
is that (1) the server change would be slightly more involved, and (2) you
wouldn't be able to close the connection for other reasons without also
leaving the group.
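
To illustrate why process death is visible without client participation, here
is a minimal sketch (not Kafka code) of how a server sees a closed peer: the OS
closes the dead process's socket, and the other end reads EOF. The helper name
peer_closed and the use of socket.socketpair to stand in for a
consumer/broker connection are assumptions made for the example.

```python
import socket


def peer_closed(conn: socket.socket, timeout_s: float = 0.1) -> bool:
    """Return True if the peer has closed its end of the connection.

    Peeks at the socket so no application data is consumed. An empty
    read means EOF, i.e. the remote side closed (or its process died
    and the OS closed the socket on its behalf).
    """
    conn.settimeout(timeout_s)
    try:
        return conn.recv(1, socket.MSG_PEEK) == b""
    except socket.timeout:
        # No data and no EOF within the timeout: connection still open.
        return False


# A connected pair stands in for the consumer <-> broker connection.
server_side, client_side = socket.socketpair()
client_side.close()  # stands in for the consumer process dying
closed = peer_closed(server_side)
server_side.close()
```

Note that this only covers the case described above, where the OS survives and
closes the socket; a machine crash or network partition would still need a
timeout to detect.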
The complexity of implementation is that currently only the network layer knows
about socket closes. However, we were already introducing a session concept for
the security work, which allows the KafkaApis layer to have access to
cross-request state such as the authenticated identity. We could make it
possible to add shutdown actions to the session that would trigger this, or
alternatively we could add a way to register onSocketClose actions directly
with the network layer.
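
A rough sketch of the "shutdown actions on the session" variant, in Python for
brevity. Everything here is hypothetical: the Session and GroupCoordinator
classes, add_close_action, and the member/connection ids are illustrative
names, not the actual Kafka classes or API.

```python
from typing import Callable, List, Set


class Session:
    """Hypothetical per-connection state shared across requests."""

    def __init__(self, connection_id: str, principal: str) -> None:
        self.connection_id = connection_id
        self.principal = principal  # e.g. the authenticated identity
        self._close_actions: List[Callable[[], None]] = []
        self._closed = False

    def add_close_action(self, action: Callable[[], None]) -> None:
        """Register a callback to run when the connection goes away."""
        self._close_actions.append(action)

    def close(self) -> None:
        """Called by the network layer when the socket closes. Idempotent."""
        if self._closed:
            return
        self._closed = True
        for action in self._close_actions:
            action()


class GroupCoordinator:
    """Hypothetical stand-in for the broker-side group membership logic."""

    def __init__(self) -> None:
        self.members: Set[str] = set()

    def join(self, session: Session, member_id: str) -> None:
        self.members.add(member_id)
        # Leaving the group becomes a side effect of the socket closing.
        session.add_close_action(lambda: self.leave(member_id))

    def leave(self, member_id: str) -> None:
        # A real coordinator would start a rebalance here.
        self.members.discard(member_id)


coordinator = GroupCoordinator()
session = Session("conn-1", "alice")
coordinator.join(session, "consumer-1")
session.close()  # the network layer would invoke this on socket close
```

The onSocketClose alternative would look the same except that the callback
list would live in the network layer, keyed by connection, rather than on the
session object.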
This same feature would actually be useful for the purgatory. Currently, when a
connection is closed, I don't think that requests in purgatory are removed. If
the purgatory timeout is very small this is okay, but it is very common for
people to ask for NO timeout, in which case each connection close potentially
leaks memory. I think we kind of "fixed" this by just overriding the max wait
time, but purging purgatory on shutdown is obviously preferable.
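
For concreteness, a minimal sketch of a purgatory that can be purged per
connection. This is not the actual purgatory implementation; the Purgatory
class, its method names, and the connection/request ids are assumptions made
for illustration.

```python
from collections import defaultdict
from typing import Any, Dict, Optional


class Purgatory:
    """Hypothetical holding area for delayed requests, indexed by connection."""

    def __init__(self) -> None:
        # connection_id -> {request_id -> request}
        self._by_connection: Dict[str, Dict[int, Any]] = defaultdict(dict)

    def park(self, connection_id: str, request_id: int, request: Any) -> None:
        """Hold a request until it can be satisfied."""
        self._by_connection[connection_id][request_id] = request

    def complete(self, connection_id: str, request_id: int) -> Optional[Any]:
        """Remove and return a satisfied request, or None if it is not held."""
        return self._by_connection.get(connection_id, {}).pop(request_id, None)

    def purge_connection(self, connection_id: str) -> int:
        """Drop everything held for a closed connection; return the count."""
        return len(self._by_connection.pop(connection_id, {}))

    def size(self) -> int:
        return sum(len(reqs) for reqs in self._by_connection.values())


purgatory = Purgatory()
purgatory.park("conn-1", 1, "fetch with no timeout")
purgatory.park("conn-1", 2, "another long fetch")
purgatory.park("conn-2", 1, "fetch from a live client")
purged = purgatory.purge_connection("conn-1")  # run as a close action
```

With no purge, the two requests from conn-1 would stay in memory until their
timeout fired, which for a no-timeout request means indefinitely.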
> leave group request
> -------------------
>
> Key: KAFKA-2397
> URL: https://issues.apache.org/jira/browse/KAFKA-2397
> Project: Kafka
> Issue Type: Sub-task
> Components: consumer
> Reporter: Onur Karaman
> Assignee: Onur Karaman
> Priority: Minor
> Fix For: 0.8.3
>
>
> Let's say every consumer in a group has session timeout s. Currently, if a
> consumer leaves the group, the worst case time to stabilize the group is 2s
> (s to detect the consumer failure + s for the rebalance window). If a
> consumer instead can declare they are leaving the group, the worst case time
> to stabilize the group would just be the s associated with the rebalance
> window.
> This is a low priority optimization!
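
The worst-case arithmetic in the description above can be written out as a
small sketch. The function name and the 30-second timeout are illustrative
assumptions; the model is just the one stated in the description (detection
time plus rebalance window, each bounded by the session timeout s).

```python
def worst_case_stabilization_s(session_timeout_s: float,
                               explicit_leave: bool) -> float:
    """Worst-case time to stabilize the group after a consumer leaves."""
    # Without an explicit leave, the failure is only noticed after a full
    # session timeout; with one, detection is treated as immediate.
    detection_s = 0.0 if explicit_leave else session_timeout_s
    rebalance_window_s = session_timeout_s
    return detection_s + rebalance_window_s


without_leave = worst_case_stabilization_s(30.0, explicit_leave=False)  # 2s
with_leave = worst_case_stabilization_s(30.0, explicit_leave=True)      # s
```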
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)