[jira] [Commented] (KAFKA-2017) Persist Coordinator State for Coordinator Failover

Todd Palino (JIRA) Fri, 16 Oct 2015 07:41:44 -0700

    [ 
https://issues.apache.org/jira/browse/KAFKA-2017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14960809#comment-14960809
 ]


Todd Palino commented on KAFKA-2017:
------------------------------------

Just to throw in my 2 cents here, I don't think that persisting this state in a 
special topic in Kafka is a bad idea. My only concern is that we have seen 
issues with the offsets already from time to time, and we'll want to make sure 
we take those lessons learned and handle them from the start. The ones I am 
aware of are:

1) Creation of the special topic at cluster initialization. If we specify an RF 
of N for the special topic, then the brokers must make this happen. The first 
broker that comes up can't create it with an RF of 1 and own all the 
partitions. Either it must reject all operations that would use the special 
topic until N brokers are members of the cluster and the it can be created, or 
it must create the topic in such a way that as soon as there are N brokers 
available the RF is corrected to the configured number.

2) Load of the special topic into local cache. Whenever a coordinator loads the 
special topic, there is a period of time while it is loading state where it 
cannot service requests. We've seen problems with this related to log 
compaction, where the partitions were excessively large, but I can see as we 
move an increasing number of (group, partition) tuples over to Kafka-committed 
offsets it could become a scale issue very easily. This should not be a big 
deal for group state information, as that should always be smaller than the 
offset information for the group, but we may want to create a longer term plan 
for handling auto-scaling of the special topics (the ability to increase the 
number of partitions and move group information from the partition it used to 
hash to to the one it hashes to after scaling).

> Persist Coordinator State for Coordinator Failover
> --------------------------------------------------
>
>                 Key: KAFKA-2017
>                 URL: https://issues.apache.org/jira/browse/KAFKA-2017
>             Project: Kafka
>          Issue Type: Sub-task
>          Components: consumer
>    Affects Versions: 0.9.0.0
>            Reporter: Onur Karaman
>            Assignee: Guozhang Wang
>             Fix For: 0.9.0.0
>
>         Attachments: KAFKA-2017.patch, KAFKA-2017_2015-05-20_09:13:39.patch, 
> KAFKA-2017_2015-05-21_19:02:47.patch
>
>
> When a coordinator fails, the group membership protocol tries to failover to 
> a new coordinator without forcing all the consumers rejoin their groups. This 
> is possible if the coordinator persists its state so that the state can be 
> transferred during coordinator failover. This state consists of most of the 
> information in GroupRegistry and ConsumerRegistry.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (KAFKA-2017) Persist Coordinator State for Coordinator Failover

Reply via email to