[ https://issues.apache.org/jira/browse/KAFKA-2017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14960809#comment-14960809 ]
Todd Palino commented on KAFKA-2017:
------------------------------------

Just to throw in my 2 cents here, I don't think that persisting this state in a special topic in Kafka is a bad idea. My only concern is that we have already seen issues with the offsets topic from time to time, and we'll want to make sure we take those lessons learned and handle them from the start. The ones I am aware of are:

1) Creation of the special topic at cluster initialization. If we specify a replication factor (RF) of N for the special topic, then the brokers must make this happen. The first broker that comes up can't create it with an RF of 1 and own all the partitions. Either it must reject all operations that would use the special topic until N brokers are members of the cluster and it can be created, or it must create the topic in such a way that the RF is corrected to the configured number as soon as N brokers are available.

2) Load of the special topic into the local cache. Whenever a coordinator loads the special topic, there is a period of time, while it is loading state, during which it cannot service requests. We've seen problems with this related to log compaction, where the partitions were excessively large, and as we move an increasing number of (group, partition) tuples over to Kafka-committed offsets, I can see this becoming a scale issue very easily. This should not be a big deal for group state information, as that should always be smaller than the offset information for the group, but we may want to create a longer-term plan for auto-scaling the special topics: the ability to increase the number of partitions and move group information from the partition it used to hash to, to the one it hashes to after scaling.
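To make the first option in point 1 concrete, here is a minimal sketch of a guard that refuses to create the special topic (and rejects requests that need it) until enough brokers are alive for the configured RF, instead of letting the first broker create it with RF=1. The class name, the round-robin placement, and the `REJECTED_TOPIC_NOT_READY` result are all hypothetical illustrations, not Kafka's actual code; the second option (create now, raise the RF later) is not shown.

```python
from dataclasses import dataclass, field


@dataclass
class InternalTopicCreator:
    """Hypothetical guard for creating the special coordinator topic.

    Sketches option one from point 1: never fall back to RF=1 on the first
    broker; refuse to create the topic (and refuse requests that need it)
    until at least `configured_rf` brokers are alive.
    """

    configured_rf: int
    num_partitions: int
    created: bool = False
    # partition id -> ordered replica list (first entry is the preferred leader)
    assignment: dict[int, list[int]] = field(default_factory=dict)

    def try_create(self, live_brokers: list[int]) -> bool:
        if self.created:
            return True
        if len(live_brokers) < self.configured_rf:
            # Not enough brokers to satisfy the configured RF: wait instead.
            return False
        brokers = sorted(live_brokers)
        for p in range(self.num_partitions):
            # Simple round-robin placement: each partition starts on a different
            # broker, and its RF replicas land on distinct brokers.
            self.assignment[p] = [
                brokers[(p + i) % len(brokers)] for i in range(self.configured_rf)
            ]
        self.created = True
        return True

    def handle_group_request(self, live_brokers: list[int]) -> str:
        # Any operation that needs the special topic is rejected until it exists.
        if not self.try_create(live_brokers):
            return "REJECTED_TOPIC_NOT_READY"
        return "OK"
```

With `configured_rf=3`, a single live broker gets `False` from `try_create([1])`, and only once three brokers are up does creation succeed with three distinct replicas per partition.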
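The loading window in point 2 can be sketched as a cache that replays one partition of the special topic and rejects reads until the replay finishes. This assumes keys are (group, topic, partition) tuples and that a `None` value acts as a tombstone, as in a compacted topic; `CoordinatorCache` and `LoadInProgressError` are hypothetical names, not Kafka's API.

```python
class LoadInProgressError(Exception):
    """Hypothetical error telling a client to retry once loading finishes."""


class CoordinatorCache:
    """Hypothetical in-memory state rebuilt from one partition of the special topic."""

    def __init__(self) -> None:
        self.loading = True  # no state until the first load completes
        self.state: dict = {}

    def load(self, records) -> int:
        """Replay (key, value) records in log order; return how many were read."""
        self.loading = True
        self.state.clear()
        replayed = 0
        for key, value in records:
            replayed += 1
            if value is None:
                # Tombstone: the key was deleted.
                self.state.pop(key, None)
            else:
                # Later records for the same key overwrite earlier ones.
                self.state[key] = value
        self.loading = False
        return replayed

    def get(self, key):
        # Requests arriving during the load cannot be served.
        if self.loading:
            raise LoadInProgressError("coordinator is still loading state")
        return self.state.get(key)
```

The replay cost tracks the number of records read, not the number of live keys: replaying four records that leave a single live key still reads all four, which is why a partition whose compaction has fallen behind stretches the window during which the coordinator cannot serve requests.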
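The auto-scaling concern in point 2 comes from how a group is mapped to a partition of the special topic. The sketch below assumes the same scheme Kafka uses for the offsets topic (the absolute value of the group id's Java `String.hashCode()`, modulo the partition count, with `Integer.MIN_VALUE` mapped to 0 as Kafka's `Utils.abs` does) and lists which groups would have to move if the partition count changed. The helper names are hypothetical.

```python
def java_string_hash(s: str) -> int:
    """Java's String.hashCode() as a signed 32-bit int (exact for BMP-only strings)."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h - (1 << 32) if h >= (1 << 31) else h


def non_negative(n: int) -> int:
    # Math.abs(Integer.MIN_VALUE) overflows in Java, so map it to 0 instead.
    return 0 if n == -(1 << 31) else abs(n)


def partition_for(group_id: str, num_partitions: int) -> int:
    return non_negative(java_string_hash(group_id)) % num_partitions


def groups_to_move(group_ids, old_count: int, new_count: int) -> dict:
    """Groups whose home partition changes when the partition count changes."""
    moves = {}
    for g in group_ids:
        old_p = partition_for(g, old_count)
        new_p = partition_for(g, new_count)
        if old_p != new_p:
            moves[g] = (old_p, new_p)
    return moves
```

For example, `groups_to_move(["a", "e"], 4, 8)` returns `{"e": (1, 5)}`: `"a"` hashes to 97 and stays on partition 1, while `"e"` hashes to 101 and moves from partition 1 to 5. Because the mapping is a plain modulo, most groups change partitions when the count changes, so their state would have to be migrated (and reloaded by a possibly different coordinator).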
> Persist Coordinator State for Coordinator Failover
> --------------------------------------------------
>
> Key: KAFKA-2017
> URL: https://issues.apache.org/jira/browse/KAFKA-2017
> Project: Kafka
> Issue Type: Sub-task
> Components: consumer
> Affects Versions: 0.9.0.0
> Reporter: Onur Karaman
> Assignee: Guozhang Wang
> Fix For: 0.9.0.0
>
> Attachments: KAFKA-2017.patch, KAFKA-2017_2015-05-20_09:13:39.patch, KAFKA-2017_2015-05-21_19:02:47.patch
>
>
> When a coordinator fails, the group membership protocol tries to fail over to
> a new coordinator without forcing all the consumers to rejoin their groups. This
> is possible if the coordinator persists its state so that the state can be
> transferred during coordinator failover. This state consists of most of the
> information in GroupRegistry and ConsumerRegistry.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)