[ 
https://issues.apache.org/jira/browse/KAFKA-2017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14959438#comment-14959438
 ] 

Jiangjie Qin commented on KAFKA-2017:
-------------------------------------

[~guozhang]
1) Agreed. Pause and Resume would help a lot here.

2) I agree with the calculation, and it looks like it would work. But the 
calculation depends on (A) the consumer ID being fixed and short, and (B) the 
writes to ZK being fairly evenly spread. 
(A) is actually a shortcoming of the new consumer. Today, if something goes 
wrong, people can easily tell which service it comes from by the consumer name 
in the server log. Now it becomes UUID-like bytes, and I worry that might break 
some alerting systems. Although this is a problem in itself, I don't think it 
matters much here (the consumer group data will be small anyway). 
(B) is fairly important. Take ISR change propagation, for example: in theory we 
only have 6 writes/second, and each write is less than 10K. But it somehow slows 
down controlled shutdown dramatically. The investigation is also rather painful 
because several different systems are involved.
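
To illustrate why (B) matters, here is a minimal sketch (not Kafka or ZooKeeper 
code; ZkWriteBurst and its methods are hypothetical names) comparing two write 
patterns with the same average of 6 writes/second over a hypothetical 60-second 
window: one spread evenly, one arriving as a single burst. The average rate 
hides the burst; a per-second peak exposes it.

```java
import java.util.Arrays;

public class ZkWriteBurst {
    // Largest number of writes landing in any one-second bucket.
    // Assumes non-negative millisecond timestamps.
    static int peakWritesPerSecond(long[] writeTimesMillis) {
        if (writeTimesMillis.length == 0) {
            return 0;
        }
        long[] sorted = writeTimesMillis.clone();
        Arrays.sort(sorted);
        long firstSecond = sorted[0] / 1000;
        long lastSecond = sorted[sorted.length - 1] / 1000;
        int[] buckets = new int[(int) (lastSecond - firstSecond + 1)];
        for (long t : sorted) {
            buckets[(int) (t / 1000 - firstSecond)]++;
        }
        return Arrays.stream(buckets).max().getAsInt();
    }

    // Average writes per second over the whole observation window.
    static double averageWritesPerSecond(long[] writeTimesMillis, long windowSeconds) {
        return (double) writeTimesMillis.length / windowSeconds;
    }

    public static void main(String[] args) {
        final int windowSeconds = 60;              // hypothetical window
        final int writesPerSecond = 6;             // figure quoted in the comment
        final int total = writesPerSecond * windowSeconds;

        long[] even = new long[total];
        long[] bursty = new long[total];           // all zeros: every write at t = 0
        for (int i = 0; i < total; i++) {
            even[i] = i * 1000L / writesPerSecond; // exactly 6 writes in each second
        }

        System.out.printf("even:   avg=%.1f/s peak=%d/s%n",
                averageWritesPerSecond(even, windowSeconds), peakWritesPerSecond(even));
        System.out.printf("bursty: avg=%.1f/s peak=%d/s%n",
                averageWritesPerSecond(bursty, windowSeconds), peakWritesPerSecond(bursty));
    }
}
```

Both patterns average 6.0 writes/second, but the even one peaks at 6 writes in 
a second while the burst puts all 360 into one second. With writes under 10K 
each, that is up to roughly 3600K in a single second instead of 60K.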

3) As you said, having a separate topic would help. Also, if we care about 
failover time, perhaps we can have followers also keep the consumer group data 
in memory and update it as they fetch from the leader. The data size should be 
very small.
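
A minimal sketch of point 3, assuming group metadata is stored as keyed records 
in a separate topic partition (group ID as key, serialized metadata as value, a 
null value meaning the group was removed). FollowerGroupCache, GroupRecord, and 
applyFetched are hypothetical names, not Kafka APIs. A follower applies every 
record it fetches from the leader to an in-memory map, so if it becomes the 
coordinator it already holds the group state instead of rebuilding it.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

public class FollowerGroupCache {
    // Hypothetical record shape: key = group id, value = serialized metadata,
    // null value = tombstone (group removed), as in a compacted topic.
    record GroupRecord(long offset, String groupId, byte[] metadata) {}

    private final Map<String, byte[]> groups = new HashMap<>();
    private long nextOffset = 0L; // next offset this follower has not applied yet

    // Called for every batch the follower fetches from the partition leader.
    void applyFetched(Iterable<GroupRecord> records) {
        for (GroupRecord r : records) {
            if (r.offset() < nextOffset) {
                continue; // already applied (e.g. a re-fetched batch)
            }
            if (r.metadata() == null) {
                groups.remove(r.groupId());
            } else {
                groups.put(r.groupId(), r.metadata());
            }
            nextOffset = r.offset() + 1;
        }
    }

    // After failover, the new coordinator can answer from memory right away.
    Optional<byte[]> lookup(String groupId) {
        return Optional.ofNullable(groups.get(groupId));
    }

    int groupCount() {
        return groups.size();
    }

    public static void main(String[] args) {
        FollowerGroupCache cache = new FollowerGroupCache();
        cache.applyFetched(List.of(
                new GroupRecord(0, "group-a", "gen=1".getBytes(StandardCharsets.UTF_8)),
                new GroupRecord(1, "group-b", "gen=1".getBytes(StandardCharsets.UTF_8)),
                new GroupRecord(2, "group-a", "gen=2".getBytes(StandardCharsets.UTF_8)),
                new GroupRecord(3, "group-b", null)));
        System.out.println(cache.groupCount()); // 1: group-b was removed
    }
}
```

Skipping already-applied offsets keeps a re-fetched batch from being applied 
twice. Since the data is expected to be very small, keeping the full map on 
every follower should be cheap.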

> Persist Coordinator State for Coordinator Failover
> --------------------------------------------------
>
>                 Key: KAFKA-2017
>                 URL: https://issues.apache.org/jira/browse/KAFKA-2017
>             Project: Kafka
>          Issue Type: Sub-task
>          Components: consumer
>    Affects Versions: 0.9.0.0
>            Reporter: Onur Karaman
>            Assignee: Guozhang Wang
>             Fix For: 0.9.0.0
>
>         Attachments: KAFKA-2017.patch, KAFKA-2017_2015-05-20_09:13:39.patch, 
> KAFKA-2017_2015-05-21_19:02:47.patch
>
>
> When a coordinator fails, the group membership protocol tries to fail over to 
> a new coordinator without forcing all the consumers to rejoin their groups. 
> This is possible if the coordinator persists its state so that the state can 
> be transferred during coordinator failover. This state consists of most of the 
> information in GroupRegistry and ConsumerRegistry.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
