[ 
https://issues.apache.org/jira/browse/KAFKA-3173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15272603#comment-15272603
 ] 

Jun Rao commented on KAFKA-3173:
--------------------------------

[~fpj], yes, I agree that the lock there is confusing. Most of the time, the 
state machines are only changed in the ZkClient event thread. It's just that 
when the controller gets started for the first time, the initialization of the 
state machines will be done from a different thread. The controller lock is 
used to synchronize between this thread and the ZkClient event thread. We can 
probably improve the locking logic when we clean up the controller logic.
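
To make the locking described above concrete, here is a minimal Java sketch, not the actual Kafka code. It assumes a single lock shared by a startup thread and an event thread, plus an `inLock` helper modelled loosely on the `CoreUtils.inLock` frame visible in the stack trace below. The class and method names (`ControllerLockSketch`, `initializeStateMachines`, `onTopicChange`, `runDemo`) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

// Hypothetical stand-in for the controller; not the real KafkaController.
class ControllerLockSketch {
    private final ReentrantLock controllerLock = new ReentrantLock();
    // Plain (non-thread-safe) list: every access must hold controllerLock.
    private final List<String> partitionStates = new ArrayList<>();

    // Run body while holding the controller lock, always releasing it.
    <T> T inLock(Supplier<T> body) {
        controllerLock.lock();
        try {
            return body.get();
        } finally {
            controllerLock.unlock();
        }
    }

    // Called once, from the thread that starts the controller.
    int initializeStateMachines(List<String> existingPartitions) {
        return inLock(() -> {
            partitionStates.addAll(existingPartitions);
            return partitionStates.size();
        });
    }

    // Called from the (single) event thread for every later change.
    int onTopicChange(String partition) {
        return inLock(() -> {
            partitionStates.add(partition);
            return partitionStates.size();
        });
    }

    // Runs both threads concurrently and returns the final state size.
    static int runDemo() {
        ControllerLockSketch controller = new ControllerLockSketch();
        Thread startup = new Thread(() ->
            controller.initializeStateMachines(Arrays.asList("foo-0", "foo-1")));
        Thread events = new Thread(() -> {
            for (int i = 0; i < 1000; i++) {
                controller.onTopicChange("bar-" + i);
            }
        });
        startup.start();
        events.start();
        try {
            startup.join();
            events.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return controller.inLock(() -> controller.partitionStates.size());
    }
}
```

Without the lock, the two threads would mutate the same `ArrayList` concurrently, which is the kind of race the controller lock is meant to rule out during startup.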

> Error while moving some partitions to OnlinePartition state 
> ------------------------------------------------------------
>
>                 Key: KAFKA-3173
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3173
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 0.9.0.0
>            Reporter: Flavio Junqueira
>            Assignee: Flavio Junqueira
>            Priority: Critical
>             Fix For: 0.10.0.1
>
>         Attachments: KAFKA-3173-race-repro.patch
>
>
> We observed another instance of the problem reported in KAFKA-2300, but this 
> time the error appeared in the partition state machine. In KAFKA-2300, we 
> did not clean up the state in {{PartitionStateMachine}} and 
> {{ReplicaStateMachine}} as we do in {{KafkaController}}.
> Here is the stack trace:
> {noformat}
> 2016-01-29 15:26:51,393] ERROR [Partition state machine on Controller 0]: Error while moving some partitions to OnlinePartition state (kafka.controller.PartitionStateMachine)
> java.lang.IllegalStateException: Controller to broker state change requests batch is not empty while creating a new one. Some LeaderAndIsr state changes Map(0 -> Map(foo-0 -> (LeaderAndIsrInfo:(Leader:0,ISR:0,LeaderEpoch:0,ControllerEpoch:1),ReplicationFactor:1),AllReplicas:0))) might be lost
>         at kafka.controller.ControllerBrokerRequestBatch.newBatch(ControllerChannelManager.scala:254)
>         at kafka.controller.PartitionStateMachine.handleStateChanges(PartitionStateMachine.scala:144)
>         at kafka.controller.KafkaController.onNewPartitionCreation(KafkaController.scala:517)
>         at kafka.controller.KafkaController.onNewTopicCreation(KafkaController.scala:504)
>         at kafka.controller.PartitionStateMachine$TopicChangeListener$$anonfun$handleChildChange$1.apply$mcV$sp(PartitionStateMachine.scala:437)
>         at kafka.controller.PartitionStateMachine$TopicChangeListener$$anonfun$handleChildChange$1.apply(PartitionStateMachine.scala:419)
>         at kafka.controller.PartitionStateMachine$TopicChangeListener$$anonfun$handleChildChange$1.apply(PartitionStateMachine.scala:419)
>         at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:262)
>         at kafka.controller.PartitionStateMachine$TopicChangeListener.handleChildChange(PartitionStateMachine.scala:418)
>         at org.I0Itec.zkclient.ZkClient$10.run(ZkClient.java:842)
>         at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71)
> {noformat}
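
The exception in the trace above comes from a guard that refuses to start a new request batch while an earlier one still holds unsent state. The following Java sketch shows that pattern under simplifying assumptions; it is not the real `ControllerBrokerRequestBatch` (which is written in Scala and tracks more than this). The class name, the method names other than `newBatch`, and the string-valued payload are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of a controller-to-broker request batch.
class RequestBatchSketch {
    // brokerId -> (partition -> pending state change), kept as strings here.
    private final Map<Integer, Map<String, String>> leaderAndIsrRequests = new HashMap<>();

    // Start a new batch; fail if a previous batch left state behind.
    void newBatch() {
        if (!leaderAndIsrRequests.isEmpty()) {
            throw new IllegalStateException(
                "Controller to broker state change requests batch is not empty "
                + "while creating a new one. Some LeaderAndIsr state changes "
                + leaderAndIsrRequests + " might be lost");
        }
    }

    // Queue a state change for one partition on one broker.
    void addLeaderAndIsrRequest(int brokerId, String partition, String info) {
        leaderAndIsrRequests
            .computeIfAbsent(brokerId, id -> new HashMap<>())
            .put(partition, info);
    }

    // Explicit cleanup, e.g. after a failed state change, so the next
    // newBatch() call does not trip over stale entries.
    void clear() {
        leaderAndIsrRequests.clear();
    }

    int pendingBrokers() {
        return leaderAndIsrRequests.size();
    }
}
```

In this sketch, if a handler calls `addLeaderAndIsrRequest` and then fails before the batch is sent or cleared, the next `newBatch()` throws. That matches the symptom described in the issue: leftover state that was never cleaned up.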



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
