We have a really simple Kafka setup in our development lab: just one
node. Periodically, we run into this error:

[2015-08-10 13:45:52,405] ERROR Controller 0 epoch 488 initiated state change for partition [test-data,1] from OfflinePartition to OnlinePartition failed (state.change.logger)
kafka.common.NoReplicaOnlineException: No replica for partition [test-data,1] is alive. Live brokers are: [Set()], Assigned replicas are: [List(0)]
        at kafka.controller.OfflinePartitionLeaderSelector.selectLeader(PartitionLeaderSelector.scala:61)
        at kafka.controller.PartitionStateMachine.electLeaderForPartition(PartitionStateMachine.scala:336)
        at kafka.controller.PartitionStateMachine.kafka$controller$PartitionStateMachine$$handleStateChange(PartitionStateMachine.scala:185)
        at kafka.controller.PartitionStateMachine$$anonfun$triggerOnlinePartitionStateChange$3.apply(PartitionStateMachine.scala:99)
        at kafka.controller.PartitionStateMachine$$anonfun$triggerOnlinePartitionStateChange$3.apply(PartitionStateMachine.scala:96)
        at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:743)

Can anyone recommend a strategy for recovering from this? Is there such a
thing, or do we need to build out another node or two and set the
replication factor on our topics to cover all of the nodes we put into
the cluster?
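A rough way to picture the failure is the small Python sketch below. It is a simplified model under stated assumptions, not Kafka's actual controller code: it ignores the in-sync replica (ISR) list and the unclean leader election setting, and simply picks the first assigned replica that sits on a live broker. The names select_leader, NoReplicaOnlineError, and tolerated_broker_failures are hypothetical. In the log above, the live broker set is empty and the only assigned replica is broker 0, so there is nothing to elect; with replicas spread over three brokers, the partition could still get a leader as long as one of them is alive.

```python
# Hypothetical, simplified model of offline-partition leader election.
# Assumption: NOT Kafka's implementation; ISR handling and the unclean
# leader election setting are ignored. It just picks the first live replica.


class NoReplicaOnlineError(Exception):
    """Raised when none of a partition's assigned replicas is on a live broker."""


def select_leader(partition, assigned_replicas, live_brokers):
    """Return the first assigned replica whose broker is currently live."""
    for replica in assigned_replicas:
        if replica in live_brokers:
            return replica
    raise NoReplicaOnlineError(
        f"No replica for partition {partition} is alive. "
        f"Live brokers are: {sorted(live_brokers)}, "
        f"Assigned replicas are: {list(assigned_replicas)}"
    )


def tolerated_broker_failures(replication_factor):
    """Brokers that can be lost while at least one replica remains (in this model)."""
    return replication_factor - 1


if __name__ == "__main__":
    # Single-node lab as in the log: replication factor 1, no live brokers.
    try:
        select_leader(("test-data", 1), [0], set())
    except NoReplicaOnlineError as err:
        print("election failed:", err)

    # Hypothetical three-broker cluster, replication factor 3, only broker 1 up.
    print("leader:", select_leader(("test-data", 1), [0, 1, 2], {1}))
    print("failures tolerated with RF=3:", tolerated_broker_failures(3))
```

In this model, a replication factor of N lets a partition keep a leader through up to N - 1 broker losses, which is the trade-off behind adding nodes; with a single broker there is no second replica to fall back on.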

We have three ZooKeeper nodes that serve other applications, such as Storm
and HBase, very reliably, so we're pretty confident ZooKeeper isn't to
blame here. Any ideas?
Thanks,

Mike
