[jira] [Commented] (KAFKA-2017) Persist Coordinator State for Coordinator Failover

Guozhang Wang (JIRA) Thu, 15 Oct 2015 11:07:37 -0700

    [ https://issues.apache.org/jira/browse/KAFKA-2017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14959300#comment-14959300 ]


Guozhang Wang commented on KAFKA-2017:
--------------------------------------

rep. [~jjkoshy] and [~becket_qin]: 

1) I think with {code} pause {code} and {code} resume {code} frequent dynamic 
subscription would not be so common, but it may be hard for us to convince each 
other either way, since workload conjectures are indeed hard to validate.

2) We can think about ZK write performance: with generation-ids + 
consumer-list, for a consumer group of 100 members each write would be no 
more than 1K (I should mention this is because the coordinator-generated 
member-id is short and does not carry host information). With this data size a 
5-node ZK ensemble should be able to handle at least 10K writes / sec with 
latency of around a couple of ms on HDD, and on SSD latency should be much 
lower ([~fpj] knows this much better than I do). So unless we have scenarios 
where each consumer forms its own single-member group, it should be fine; and 
even in that case it would be the same as offset commits in ZK anyway.
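
As a rough check of the size estimate in 2), here is a minimal sketch. The 
JSON layout, the field names and the "m-000" style member ids are all 
assumptions made for illustration; they are not the format used by the patch.

```python
import json


def group_state_payload(generation_id, member_ids):
    """Serialize a hypothetical per-group record: generation id + member list."""
    record = {"generation": generation_id, "members": sorted(member_ids)}
    # Compact separators avoid padding bytes in the stored value.
    return json.dumps(record, separators=(",", ":")).encode("utf-8")


# Assumption: member ids are short opaque strings with no host information.
members = ["m-%03d" % i for i in range(100)]
payload = group_state_payload(42, members)
print(len(payload), "bytes for", len(members), "members")
```

Longer member ids (for example ones that embed a host name) would grow the 
payload roughly linearly with the id length, which is why the short id matters 
for the estimate.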

3) I think I need to clarify a bit more my concerns about loading latency: 
with persistent state, once a broker handles becomeLeader for some partition it 
is in the "state-loading-in-progress" phase for that partition's groups; 
now with coordinator migration most consumer groups will first try to send a HB 
request to the new coordinator, which cannot respond until the end of the 
"state-loading-in-progress" phase and can only let consumers retry. If new 
JoinGroup / OffsetCommit requests arrive during this period, they have 
to wait until the end of this phase as well. Since "state-loading-in-progress" 
affects not only offset-fetch but all kinds of requests, we want it to be 
as short as possible, and probably not coupled with 
"offset-loading-in-progress". Admittedly this can be partially optimized in 
Kafka with a separate topic loaded by another parallel background thread, 
depending on the log compaction configuration.
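
To make the concern in 3) concrete, here is a toy sketch of a coordinator that 
turns requests away while a partition's group state is still loading. The 
class, the method names and the returned status strings are hypothetical and 
only model the behavior described above; they are not the broker's actual code 
or error codes.

```python
import threading


class GroupCoordinatorSketch:
    """Toy model: requests for a partition are refused while its state loads."""

    def __init__(self):
        self._lock = threading.Lock()
        self._loading = set()  # partitions whose group state is still loading
        self._groups = {}      # partition -> {group_id: generation_id}

    def become_leader(self, partition, load_state):
        # Mark the partition as loading before the background load starts,
        # so no request can observe a half-initialized state.
        with self._lock:
            self._loading.add(partition)

        def _load():
            state = load_state(partition)  # may take a long time
            with self._lock:
                self._groups[partition] = state
                self._loading.discard(partition)

        loader = threading.Thread(target=_load, daemon=True)
        loader.start()
        return loader

    def handle_heartbeat(self, partition, group_id):
        with self._lock:
            if partition in self._loading:
                # Hypothetical status: the consumer is expected to retry.
                return "STATE_LOADING_IN_PROGRESS"
            if group_id not in self._groups.get(partition, {}):
                return "UNKNOWN_GROUP"
            return "OK"


gate = threading.Event()


def slow_load(partition):
    gate.wait()  # stands in for reading the persisted state
    return {"group-a": 7}


coordinator = GroupCoordinatorSketch()
loader = coordinator.become_leader(0, slow_load)
print(coordinator.handle_heartbeat(0, "group-a"))  # while loading
gate.set()
loader.join()
print(coordinator.handle_heartbeat(0, "group-a"))  # after loading
```

In this model every request type would hit the same loading check, so the time 
spent inside the load function is exactly the window during which consumers 
can do nothing but retry.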

> Persist Coordinator State for Coordinator Failover
> --------------------------------------------------
>
>                 Key: KAFKA-2017
>                 URL: https://issues.apache.org/jira/browse/KAFKA-2017
>             Project: Kafka
>          Issue Type: Sub-task
>          Components: consumer
>    Affects Versions: 0.9.0.0
>            Reporter: Onur Karaman
>            Assignee: Guozhang Wang
>             Fix For: 0.9.0.0
>
>         Attachments: KAFKA-2017.patch, KAFKA-2017_2015-05-20_09:13:39.patch, 
> KAFKA-2017_2015-05-21_19:02:47.patch
>
>
> When a coordinator fails, the group membership protocol tries to fail over to 
> a new coordinator without forcing all the consumers to rejoin their groups. This 
> is possible if the coordinator persists its state so that the state can be 
> transferred during coordinator failover. This state consists of most of the 
> information in GroupRegistry and ConsumerRegistry.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
