[ https://issues.apache.org/jira/browse/KAFKA-2017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14956001#comment-14956001 ]
Guozhang Wang commented on KAFKA-2017:
--------------------------------------

[~hachikuji] [~onurkaraman] [~junrao]

With the new protocol, the coordinator does not need to remember any member metadata except the member ids, since we now validate only on member-id and generation-id. So after KAFKA-2464 is merged in, I propose to store the group metadata as:

{code}
/coordinator/consumers/[groupId]:

version: short
generationId: int
members: String    // <- member-ids separated by ","; "," is not allowed in a member-id; the first member is always the leader.
{code}

The read logic is:

1. Upon handling an HB / OffsetCommit / OffsetFetch request, after validating that the group belongs to this coordinator and coordinator.isActive, if the group does not exist in the group metadata cache, try reading it from ZK; leave the other, non-persistent fields in GroupMetadata and MemberMetadata as null.

2. Upon handling JoinGroup, after validating that the group belongs to this coordinator and coordinator.isActive, if the group does not exist in the group metadata cache, try reading it from ZK; if the consumer already exists, follow the normal path of handleJoinGroup. The only difference is that we will update the member metadata and always trigger a rebalance.

3. Upon handling SyncGroup, after validating that the group belongs to this coordinator and coordinator.isActive, if the group does not exist in the group metadata cache, try reading it from ZK; then follow the normal path of handleSyncGroup.

The write logic is:

After the join-group barrier, update ZK with the generation id / leader-id / members.

With this proposal, we do not need an "Initialize" state as in the original proposal:
https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Client-side+Assignment+Proposal
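
To make the proposed layout and the read/write paths concrete, here is a minimal sketch in Java. It is not Kafka code: the class names GroupMetadataRecord and GroupMetadataStore, the "key: value" text encoding, and the in-memory Map standing in for ZK are all assumptions made for illustration. It only shows the three persisted fields, the rule that "," is rejected in a member-id, the leader-first ordering, and the "check the cache, then fall back to ZK" lookup.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical value class holding only the persisted fields; not Kafka code.
final class GroupMetadataRecord {
    final short version;
    final int generationId;
    final List<String> memberIds; // the first entry is the leader

    GroupMetadataRecord(short version, int generationId, List<String> memberIds) {
        if (memberIds.isEmpty())
            throw new IllegalArgumentException("a persisted group needs at least a leader");
        for (String id : memberIds)
            if (id.isEmpty() || id.indexOf(',') >= 0) // "," is the separator, so reject it
                throw new IllegalArgumentException("invalid member id: '" + id + "'");
        this.version = version;
        this.generationId = generationId;
        this.memberIds = Collections.unmodifiableList(new ArrayList<>(memberIds));
    }

    String leaderId() { return memberIds.get(0); }

    // Assumed text layout: one "key: value" pair per line.
    String encode() {
        return "version: " + version + "\n"
            + "generationId: " + generationId + "\n"
            + "members: " + String.join(",", memberIds);
    }

    static GroupMetadataRecord decode(String data) {
        String[] lines = data.split("\n", -1);
        if (lines.length != 3)
            throw new IllegalArgumentException("expected 3 fields, got " + lines.length);
        short version = Short.parseShort(field(lines[0], "version"));
        int generationId = Integer.parseInt(field(lines[1], "generationId"));
        // Limit -1 keeps empty tokens so the constructor can reject them.
        String[] ids = field(lines[2], "members").split(",", -1);
        return new GroupMetadataRecord(version, generationId, Arrays.asList(ids));
    }

    private static String field(String line, String key) {
        String prefix = key + ": ";
        if (!line.startsWith(prefix))
            throw new IllegalArgumentException("expected field '" + key + "'");
        return line.substring(prefix.length());
    }
}

// Hypothetical read-through cache; the zk map stands in for ZooKeeper.
final class GroupMetadataStore {
    private final Map<String, String> zk = new HashMap<>();
    private final Map<String, GroupMetadataRecord> cache = new HashMap<>();

    private static String path(String groupId) {
        return "/coordinator/consumers/" + groupId;
    }

    // Write path: called once the join-group barrier has completed.
    void persist(String groupId, GroupMetadataRecord record) {
        zk.put(path(groupId), record.encode());
        cache.put(groupId, record);
    }

    // Read path: consult the cache first, then fall back to the persisted copy.
    GroupMetadataRecord lookup(String groupId) {
        GroupMetadataRecord cached = cache.get(groupId);
        if (cached != null) return cached;
        String data = zk.get(path(groupId));
        if (data == null) return null; // group is unknown to this coordinator
        GroupMetadataRecord loaded = GroupMetadataRecord.decode(data);
        cache.put(groupId, loaded);
        return loaded;
    }

    // Simulates a failover: in-memory state is lost, persisted data survives.
    void dropCache() { cache.clear(); }
}
```

In this sketch a new coordinator would call lookup() from each request handler and get back the generation id and member ids after a failover; everything else (session timeouts, assignments, and so on) would stay unset until the next rebalance, as described in the read logic above.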

> Persist Coordinator State for Coordinator Failover
> --------------------------------------------------
>
>                 Key: KAFKA-2017
>                 URL: https://issues.apache.org/jira/browse/KAFKA-2017
>             Project: Kafka
>          Issue Type: Sub-task
>          Components: consumer
>    Affects Versions: 0.9.0.0
>            Reporter: Onur Karaman
>            Assignee: Guozhang Wang
>             Fix For: 0.9.0.0
>
>         Attachments: KAFKA-2017.patch, KAFKA-2017_2015-05-20_09:13:39.patch,
> KAFKA-2017_2015-05-21_19:02:47.patch
>
>
> When a coordinator fails, the group membership protocol tries to fail over to
> a new coordinator without forcing all the consumers to rejoin their groups. This
> is possible if the coordinator persists its state so that the state can be
> transferred during coordinator failover. This state consists of most of the
> information in GroupRegistry and ConsumerRegistry.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)