[ https://issues.apache.org/jira/browse/KAFKA-2017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14959438#comment-14959438 ]
Jiangjie Qin commented on KAFKA-2017:
-------------------------------------

[~guozhang]

1) Agreed. Pause and Resume would help a lot here.

2) I agree with the calculation, and it looks like it would work. But the calculation depends on A) the consumer id being fixed and short, and B) the writes to ZK being fairly even.

(A) is actually a shortcoming of the new consumer. Today, if something goes wrong, people can easily see which service it is from the consumer name in the server log. Now it becomes UUID-like bytes. I worry it might break some alerting systems. Although this is a problem in itself, I don't think it matters too much here (the consumer group data will be small anyway).

(B) is kind of important. Take ISR change propagation for example: theoretically we only have 6 writes/second, and each write is less than 10K. But it somehow slows down controlled shutdown dramatically. The investigation is also kind of painful because different systems are involved.

3) Like you said, having a separate topic would help. Also, if we care about the failover time, perhaps we can always let followers keep the consumer group data in memory as well and update it when fetching from the leader. The data size should be very small.

> Persist Coordinator State for Coordinator Failover
> --------------------------------------------------
>
> Key: KAFKA-2017
> URL: https://issues.apache.org/jira/browse/KAFKA-2017
> Project: Kafka
> Issue Type: Sub-task
> Components: consumer
> Affects Versions: 0.9.0.0
> Reporter: Onur Karaman
> Assignee: Guozhang Wang
> Fix For: 0.9.0.0
>
> Attachments: KAFKA-2017.patch, KAFKA-2017_2015-05-20_09:13:39.patch, KAFKA-2017_2015-05-21_19:02:47.patch
>
>
> When a coordinator fails, the group membership protocol tries to fail over to
> a new coordinator without forcing all the consumers to rejoin their groups. This
> is possible if the coordinator persists its state so that the state can be
> transferred during coordinator failover. This state consists of most of the
> information in GroupRegistry and ConsumerRegistry.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
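
Point 3 of the comment suggests that followers keep the consumer group data in memory and refresh it whenever they fetch from the leader, so that a follower can take over as coordinator without a long reload. The following is only a minimal sketch of that idea, not Kafka's actual implementation: the names `GroupMetadataLog`, `FollowerCache`, `GroupState`, and the record layout are all hypothetical, and the "log" is a plain in-memory list standing in for a replicated topic partition.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

# Hypothetical record layout: (offset, group_id, generation, members)
Record = Tuple[int, str, int, Tuple[str, ...]]


@dataclass
class GroupState:
    """Latest known state of one consumer group (hypothetical shape)."""
    generation: int
    members: Tuple[str, ...]


class GroupMetadataLog:
    """Leader-side append-only log of group state changes (stand-in for a topic)."""

    def __init__(self) -> None:
        self.records: List[Record] = []

    def append(self, group_id: str, generation: int, members: Sequence[str]) -> int:
        offset = len(self.records)
        self.records.append((offset, group_id, generation, tuple(members)))
        return offset

    def fetch(self, from_offset: int) -> List[Record]:
        # Return every record at or after from_offset.
        return self.records[from_offset:]


class FollowerCache:
    """Follower-side in-memory view, updated on every fetch from the leader."""

    def __init__(self) -> None:
        self.next_offset = 0
        self.groups: Dict[str, GroupState] = {}

    def on_fetch(self, records: Sequence[Record]) -> None:
        for offset, group_id, generation, members in records:
            if offset < self.next_offset:
                continue  # already applied; skip duplicates
            # Later records overwrite earlier ones for the same group.
            self.groups[group_id] = GroupState(generation, members)
            self.next_offset = offset + 1


log = GroupMetadataLog()
follower = FollowerCache()
log.append("billing-service", 1, ["c1", "c2"])
log.append("billing-service", 2, ["c1"])
follower.on_fetch(log.fetch(follower.next_offset))
```

After the fetch, `follower.groups["billing-service"]` holds the generation-2 state, so a follower promoted to coordinator would only need to catch up on records it has not yet fetched. Because each group keeps only its latest state, the cache stays small, which matches the comment's expectation that the data size is very small.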