-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34450/#review84539
-----------------------------------------------------------


I only did a brief skim. This optimization tries to switch consumers over to a 
new coordinator without a rebalance. From my understanding, the consumers would 
detect a coordinator failure, discover the new coordinator to work with, and 
try heartbeating that new coordinator withouth a rebalance.

So it seems to me that putting the logic in handleJoinGroup isn't right, as the 
rebalance is what we're trying to avoid. The code should be in handleHeartbeat. 
It should lookup zk for the group info, add it to CoordinatorMetadata, and 
start up a DelayedHeartbeat for every consumer of that group.

**More importantly: given that this is just an optimization, and we haven't 
even seen the performance hit without this, I think KAFKA-2017 should be very 
low priority.**

The following are higher priority:
1. Getting the consumer to properly handle error codes of the join group and 
heartbeat responses.
2. Getting the consumer to detect coordinator failures and switch over to 
another coordinator (my KAFKA-1334 patch just had the coordinator detect 
consumer failures). A nice benefit of completing this first is that if we 
decide that the rebalances on coordinator failover are an actual issue, this 
would greatly facilitate testing any coordinator failover logic. Right now, 
it's unclear how this rb's logic can be tested.

- Onur Karaman


On May 20, 2015, 4:13 p.m., Guozhang Wang wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/34450/
> -----------------------------------------------------------
> 
> (Updated May 20, 2015, 4:13 p.m.)
> 
> 
> Review request for kafka.
> 
> 
> Bugs: KAFKA-2017
>     https://issues.apache.org/jira/browse/KAFKA-2017
> 
> 
> Repository: kafka
> 
> 
> Description
> -------
> 
> 1. Upon receiving join-group, if the group metadata cannot be found in the 
> local cache try to read it from ZK; 2. Upon completing rebalance, update the 
> ZK with new group registry or delete the registry if the group becomes empty
> 
> 
> Diffs
> -----
> 
>   core/src/main/scala/kafka/coordinator/ConsumerCoordinator.scala 
> af06ad45cdc46ac3bc27898ebc1a5bd5b1c7b19e 
>   core/src/main/scala/kafka/coordinator/ConsumerGroupMetadata.scala 
> 47bdfa7cc86fd4e841e2b1d6bfd40f1508e643bd 
>   core/src/main/scala/kafka/coordinator/CoordinatorMetadata.scala 
> c39e6de34ee531c6dfa9107b830752bd7f8fbe59 
>   core/src/main/scala/kafka/utils/ZkUtils.scala 
> 2618dd39b925b979ad6e4c0abd5c6eaafb3db5d5 
> 
> Diff: https://reviews.apache.org/r/34450/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Guozhang Wang
> 
>

Reply via email to