----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34450/#review84539 -----------------------------------------------------------
I only did a brief skim. This optimization tries to switch consumers over to a new coordinator without a rebalance. From my understanding, the consumers would detect a coordinator failure, discover the new coordinator to work with, and try heartbeating that new coordinator withouth a rebalance. So it seems to me that putting the logic in handleJoinGroup isn't right, as the rebalance is what we're trying to avoid. The code should be in handleHeartbeat. It should lookup zk for the group info, add it to CoordinatorMetadata, and start up a DelayedHeartbeat for every consumer of that group. **More importantly: given that this is just an optimization, and we haven't even seen the performance hit without this, I think KAFKA-2017 should be very low priority.** The following are higher priority: 1. Getting the consumer to properly handle error codes of the join group and heartbeat responses. 2. Getting the consumer to detect coordinator failures and switch over to another coordinator (my KAFKA-1334 patch just had the coordinator detect consumer failures). A nice benefit of completing this first is that if we decide that the rebalances on coordinator failover are an actual issue, this would greatly facilitate testing any coordinator failover logic. Right now, it's unclear how this rb's logic can be tested. - Onur Karaman On May 20, 2015, 4:13 p.m., Guozhang Wang wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/34450/ > ----------------------------------------------------------- > > (Updated May 20, 2015, 4:13 p.m.) > > > Review request for kafka. > > > Bugs: KAFKA-2017 > https://issues.apache.org/jira/browse/KAFKA-2017 > > > Repository: kafka > > > Description > ------- > > 1. Upon receiving join-group, if the group metadata cannot be found in the > local cache try to read it from ZK; 2. Upon completing rebalance, update the > ZK with new group registry or delete the registry if the group becomes empty > > > Diffs > ----- > > core/src/main/scala/kafka/coordinator/ConsumerCoordinator.scala > af06ad45cdc46ac3bc27898ebc1a5bd5b1c7b19e > core/src/main/scala/kafka/coordinator/ConsumerGroupMetadata.scala > 47bdfa7cc86fd4e841e2b1d6bfd40f1508e643bd > core/src/main/scala/kafka/coordinator/CoordinatorMetadata.scala > c39e6de34ee531c6dfa9107b830752bd7f8fbe59 > core/src/main/scala/kafka/utils/ZkUtils.scala > 2618dd39b925b979ad6e4c0abd5c6eaafb3db5d5 > > Diff: https://reviews.apache.org/r/34450/diff/ > > > Testing > ------- > > > Thanks, > > Guozhang Wang > >