[ https://issues.apache.org/jira/browse/KAFKA-4442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15694773#comment-15694773 ]
ASF GitHub Bot commented on KAFKA-4442:
---------------------------------------

GitHub user lindong28 opened a pull request:

    https://github.com/apache/kafka/pull/2167

    KAFKA-4442; Controller should grab lock when it is being initialized to avoid race condition

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/lindong28/kafka KAFKA-4442

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/kafka/pull/2167.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2167

----
commit 16825e60963844ab0729bf290cfc9e6cee79932f
Author: Dong Lin <lindon...@gmail.com>
Date:   2016-11-25T04:07:09Z

    KAFKA-4442; Controller should grab lock when it is being initialized to avoid race condition

----

> Controller should grab lock when it is being initialized to avoid race condition
> ---------------------------------------------------------------------------------
>
>                 Key: KAFKA-4442
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4442
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Dong Lin
>            Assignee: Dong Lin
>
> Currently the controller registers the broker change listener before sending
> LeaderAndIsrRequests to live replicas. The call path looks like this:
> - onControllerFailover()
> - partitionStateMachine.startup()
> - triggerOnlinePartitionStateChange()
> - handleStateChange(partition, OnlinePartition)
> - electLeaderForPartition(partition)
> - determine the live replicas for this partition (step a)
> - add the partition to controllerContext.partitionLeadershipInfo (step b)
> - send LeaderAndIsrRequest to those live replicas for this partition
> However, if a broker registers itself in ZooKeeper between step (a) and
> step (b), onBrokerStartup() will not send a LeaderAndIsrRequest to this
> broker for this partition, because the partition is not yet in
> controllerContext.partitionLeadershipInfo. onControllerFailover() will not
> send a LeaderAndIsrRequest to this broker for this partition either, because
> the broker was not considered live in step (a). The broker therefore never
> receives a LeaderAndIsrRequest for the partition.
> The root cause is that onBrokerStartup() should only be executed after the
> controller has finished onControllerFailover() and initialized its state.
> Therefore the controller should grab the lock controllerContext.controllerLock
> during onControllerFailover().

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
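
The fix described in the issue can be illustrated with a small, self-contained Java sketch. This is not Kafka's controller code (which is written in Scala) and it is not the patch from the pull request; the class ControllerSketch, its fields, the betweenSteps hook and the demo() method are all hypothetical names made up for illustration. It only models the idea that onControllerFailover() holds the controller lock across step (a) and step (b), so that a concurrent onBrokerStartup() has to wait until the partition is in partitionLeadershipInfo.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical, heavily simplified model of the controller; NOT Kafka's code.
class ControllerSketch {
    final ReentrantLock controllerLock = new ReentrantLock();
    // All state below is guarded by controllerLock.
    final Set<Integer> liveBrokers = new HashSet<>();
    final Map<String, List<Integer>> assignedReplicas = new HashMap<>();
    final Map<String, Integer> partitionLeadershipInfo = new HashMap<>();
    final List<String> sentRequests = new ArrayList<>();

    // betweenSteps is a test hook (an assumption of this sketch) that runs
    // between step (a) and step (b) to simulate a broker registering there.
    void onControllerFailover(Runnable betweenSteps) {
        controllerLock.lock(); // held for the whole initialization
        try {
            for (Map.Entry<String, List<Integer>> e : assignedReplicas.entrySet()) {
                // step (a): determine the live replicas for this partition
                List<Integer> live = new ArrayList<>();
                for (int broker : e.getValue()) {
                    if (liveBrokers.contains(broker)) live.add(broker);
                }
                if (live.isEmpty()) continue;
                betweenSteps.run();
                // step (b): record leadership, then notify the live replicas
                partitionLeadershipInfo.put(e.getKey(), live.get(0));
                for (int broker : live) sendLeaderAndIsr(broker, e.getKey());
            }
        } finally {
            controllerLock.unlock();
        }
    }

    void onBrokerStartup(int brokerId) {
        controllerLock.lock(); // blocks until failover has finished
        try {
            liveBrokers.add(brokerId);
            for (String partition : partitionLeadershipInfo.keySet()) {
                if (assignedReplicas.get(partition).contains(brokerId)) {
                    sendLeaderAndIsr(brokerId, partition);
                }
            }
        } finally {
            controllerLock.unlock();
        }
    }

    // Stand-in for sending a request: just records "brokerId:partition".
    private void sendLeaderAndIsr(int brokerId, String partition) {
        sentRequests.add(brokerId + ":" + partition);
    }

    // Broker 2 tries to start between step (a) and step (b) of failover.
    static List<String> demo() {
        ControllerSketch c = new ControllerSketch();
        c.liveBrokers.add(1);
        c.assignedReplicas.put("topic-0", Arrays.asList(1, 2));
        Thread starter = new Thread(() -> c.onBrokerStartup(2));
        c.onControllerFailover(() -> {
            starter.start();
            // Wait (up to 2 s) until the starter thread is queued on the lock.
            long deadline = System.nanoTime() + 2_000_000_000L;
            while (!c.controllerLock.hasQueuedThreads() && System.nanoTime() < deadline) {
                Thread.yield();
            }
        });
        try {
            starter.join();
        } catch (InterruptedException ex) {
            throw new IllegalStateException(ex);
        }
        return c.sentRequests;
    }
}
```

Calling ControllerSketch.demo() returns the recorded requests. Because onBrokerStartup(2) cannot enter until the failover path releases the lock, broker 2 finds topic-0 in partitionLeadershipInfo and gets its request. If onControllerFailover() did not take the lock, the same interleaving would leave broker 2 without any request for topic-0, which is the race the issue describes.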