[ https://issues.apache.org/jira/browse/KAFKA-2017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14959300#comment-14959300 ]
Guozhang Wang commented on KAFKA-2017:
--------------------------------------

Replying to [~jjkoshy] and [~becket_qin]:

1) I think that with {code} pause {code} and {code} resume {code}, frequent dynamic subscription changes would not be so common, but it may be hard for us to convince each other either way, since workload conjectures are hard to validate.

2) We can think about ZK write performance: with generation-ids + consumer-list, and a consumer group of 100 members, each write would be no more than 1K (I should mention that this is because the coordinator-generated member-id is short and does not carry host information). With this data size a 5-node ZK ensemble should be able to handle at least 10K writes / sec with latency of around a couple of ms on HDD, and on SSD the latency should be much lower ([~fpj] knows this much better than I do). So unless we have scenarios in which each consumer forms its own group, it should be fine; and in that case it would be the same as offset commits in ZK anyway.

3) I think I need to clarify a bit more about my concerns regarding loading latency: with persistent state, once a broker handles becomeLeader for some partition, it is in the "state-loading-in-progress" phase for the groups of that partition. With coordinator migration, most consumer groups will first try to send a HB request to the new coordinator, which cannot respond until the end of the "state-loading-in-progress" phase and can only let the consumers retry. If new JoinGroup / OffsetCommit requests arrive during this period, they have to wait until the end of this phase as well. Since "state-loading-in-progress" affects not only offset-fetch but all kinds of requests, we want it to be as short as possible, and probably not coupled with "offset-loading-in-progress". Admittedly, this can be partially optimized in Kafka by loading a separate topic in another parallel background thread, depending on the log compaction configuration.

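To make the concern in 3) concrete, here is a minimal sketch of the gating behavior described above. It is not Kafka's actual coordinator code: the class, enum, and method names are all hypothetical, and it assumes the coordinator answers with a retriable "loading in progress" response rather than holding the request.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch only; none of these names come from the Kafka code base.
class CoordinatorLoadingSketch {
    enum Response { OK, LOADING_IN_PROGRESS, NOT_COORDINATOR }
    enum PartitionState { LOADING, READY }

    // State of each partition this broker currently coordinates.
    private final Map<Integer, PartitionState> partitions = new ConcurrentHashMap<>();

    // Called when the broker handles becomeLeader for a partition:
    // the groups of that partition enter "state-loading-in-progress".
    void becomeLeader(int partition) {
        partitions.put(partition, PartitionState.LOADING);
    }

    // Called by an (assumed) background loader once the persisted group
    // state for the partition has been fully read back.
    void finishLoading(int partition) {
        partitions.replace(partition, PartitionState.LOADING, PartitionState.READY);
    }

    // Heartbeat, JoinGroup and OffsetCommit all pass through the same gate,
    // so every request type is blocked for as long as loading lasts.
    Response handle(int partition) {
        PartitionState state = partitions.get(partition);
        if (state == null) {
            return Response.NOT_COORDINATOR;
        }
        return state == PartitionState.LOADING
                ? Response.LOADING_IN_PROGRESS
                : Response.OK;
    }

    public static void main(String[] args) {
        CoordinatorLoadingSketch coordinator = new CoordinatorLoadingSketch();
        coordinator.becomeLeader(3);
        System.out.println(coordinator.handle(3)); // consumer must retry
        coordinator.finishLoading(3);
        System.out.println(coordinator.handle(3)); // requests now served
    }
}
```

The point of the sketch is that the window between becomeLeader and finishLoading is paid by every consumer of every group on that partition, which is why keeping group-state loading short, and separate from offset loading, matters.
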
> Persist Coordinator State for Coordinator Failover
> --------------------------------------------------
>
>                 Key: KAFKA-2017
>                 URL: https://issues.apache.org/jira/browse/KAFKA-2017
>             Project: Kafka
>          Issue Type: Sub-task
>          Components: consumer
>    Affects Versions: 0.9.0.0
>            Reporter: Onur Karaman
>            Assignee: Guozhang Wang
>             Fix For: 0.9.0.0
>
>         Attachments: KAFKA-2017.patch, KAFKA-2017_2015-05-20_09:13:39.patch,
> KAFKA-2017_2015-05-21_19:02:47.patch
>
>
> When a coordinator fails, the group membership protocol tries to failover to
> a new coordinator without forcing all the consumers rejoin their groups. This
> is possible if the coordinator persists its state so that the state can be
> transferred during coordinator failover. This state consists of most of the
> information in GroupRegistry and ConsumerRegistry.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)