[ https://issues.apache.org/jira/browse/KAFKA-3173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15272603#comment-15272603 ]
Jun Rao commented on KAFKA-3173:
--------------------------------

[~fpj], yes, I agree that the lock there is confusing. Most of the time, the state machines are only changed in the ZkClient event thread. It's just that when the controller gets started for the first time, the initialization of the state machines will be done from a different thread. The controller lock is used to synchronize between this thread and the ZkClient event thread. We can probably improve the locking logic when we clean up the controller logic.

> Error while moving some partitions to OnlinePartition state
> ------------------------------------------------------------
>
>                 Key: KAFKA-3173
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3173
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 0.9.0.0
>            Reporter: Flavio Junqueira
>            Assignee: Flavio Junqueira
>            Priority: Critical
>             Fix For: 0.10.0.1
>
>         Attachments: KAFKA-3173-race-repro.patch
>
>
> We observed another instance of the problem reported in KAFKA-2300, but this
> time the error appeared in the partition state machine. In KAFKA-2300, we
> haven't cleaned up the state in {{PartitionStateMachine}} and
> {{ReplicaStateMachine}} as we do in {{KafkaController}}.
> Here is the stack trace:
> {noformat}
> 2016-01-29 15:26:51,393] ERROR [Partition state machine on Controller 0]: Error while moving some partitions to OnlinePartition state (kafka.controller.PartitionStateMachine)
> java.lang.IllegalStateException: Controller to broker state change requests batch is not empty while creating a new one. Some LeaderAndIsr state changes Map(0 -> Map(foo-0 -> (LeaderAndIsrInfo: (Leader:0,ISR:0,LeaderEpoch:0,ControllerEpoch:1),ReplicationFactor:1),AllReplicas:0))) might be lost
>         at kafka.controller.ControllerBrokerRequestBatch.newBatch(ControllerChannelManager.scala:254)
>         at kafka.controller.PartitionStateMachine.handleStateChanges(PartitionStateMachine.scala:144)
>         at kafka.controller.KafkaController.onNewPartitionCreation(KafkaController.scala:517)
>         at kafka.controller.KafkaController.onNewTopicCreation(KafkaController.scala:504)
>         at kafka.controller.PartitionStateMachine$TopicChangeListener$$anonfun$handleChildChange$1.apply$mcV$sp(PartitionStateMachine.scala:437)
>         at kafka.controller.PartitionStateMachine$TopicChangeListener$$anonfun$handleChildChange$1.apply(PartitionStateMachine.scala:419)
>         at kafka.controller.PartitionStateMachine$TopicChangeListener$$anonfun$handleChildChange$1.apply(PartitionStateMachine.scala:419)
>         at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:262)
>         at kafka.controller.PartitionStateMachine$TopicChangeListener.handleChildChange(PartitionStateMachine.scala:418)
>         at org.I0Itec.zkclient.ZkClient$10.run(ZkClient.java:842)
>         at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
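
For readers following the thread, here is a minimal Java sketch of the failure mode described in the issue and the locking described in the comment. It is not Kafka's code. The class BatchSketch, its methods, and the cleanUpOnFailure flag are hypothetical names made up for illustration; only the idea (a batch that refuses to start while earlier requests are pending, guarded by a lock shared between the startup thread and the event thread) is taken from the text above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical illustration only; not the Kafka controller implementation.
class BatchSketch {
    // Stand-in for the controller lock shared by the startup thread
    // and the ZkClient event thread.
    private final ReentrantLock controllerLock = new ReentrantLock();

    // Pending state changes keyed by broker id (payload simplified to a string).
    private final Map<Integer, String> pending = new HashMap<>();

    // Refuses to start a new batch while earlier requests are still pending.
    void newBatch() {
        if (!pending.isEmpty()) {
            throw new IllegalStateException(
                "requests batch is not empty while creating a new one: " + pending);
        }
    }

    void add(int brokerId, String change) {
        pending.put(brokerId, change);
    }

    // "Sends" the pending requests; here they are simply dropped.
    void send() {
        pending.clear();
    }

    // Runs one state change under the lock and reports whether it succeeded.
    // failBeforeSend simulates an error between add() and send();
    // cleanUpOnFailure decides whether leftover requests are discarded.
    boolean handleStateChange(int brokerId, String change,
                              boolean failBeforeSend, boolean cleanUpOnFailure) {
        controllerLock.lock();
        try {
            newBatch();
            add(brokerId, change);
            if (failBeforeSend) {
                throw new RuntimeException("simulated failure before send");
            }
            send();
            return true;
        } catch (RuntimeException e) {
            // Also catches the IllegalStateException thrown by newBatch().
            if (cleanUpOnFailure) {
                pending.clear();
            }
            return false;
        } finally {
            controllerLock.unlock();
        }
    }

    public static void main(String[] args) {
        BatchSketch leaky = new BatchSketch();
        boolean a = leaky.handleStateChange(0, "foo-0", true, false);
        boolean b = leaky.handleStateChange(0, "bar-0", false, false);
        System.out.println("without cleanup: " + a + ", " + b);

        BatchSketch clean = new BatchSketch();
        boolean c = clean.handleStateChange(0, "foo-0", true, true);
        boolean d = clean.handleStateChange(0, "bar-0", false, true);
        System.out.println("with cleanup: " + c + ", " + d);
    }
}
```

Without cleanup, the request left behind by the first failed call makes the next newBatch() throw, so both calls report failure. With cleanup, the second call succeeds. Under these assumptions, this is the kind of state the issue says was not being cleaned up in the state machines.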