There are some errors in the state-change log about failed leader election, like this:
[2014-06-18 08:59:21,014] ERROR Controller 7 epoch 4 encountered error while electing leader for partition [topicDEBUG,5] due to: Preferred replica 1 for partition [topicDEBUG,5] is either not alive or not in the isr. Current leader and ISR: [{"leader":8,"leader_epoch":6,"isr":[8,2]}]. (state.change.logger)
[2014-06-18 08:59:21,014] ERROR Controller 7 epoch 4 initiated state change for partition [topicDEBUG,5] from OnlinePartition to OnlinePartition failed (state.change.logger)
kafka.common.StateChangeFailedException: encountered error while electing leader for partition [topicDEBUG,5] due to: Preferred replica 1 for partition [topicDEBUG,5] is either not alive or not in the isr. Current leader and ISR: [{"leader":8,"leader_epoch":6,"isr":[8,2]}].
	at kafka.controller.PartitionStateMachine.electLeaderForPartition(PartitionStateMachine.scala:360)
	at kafka.controller.PartitionStateMachine.kafka$controller$PartitionStateMachine$$handleStateChange(PartitionStateMachine.scala:187)
	at kafka.controller.PartitionStateMachine$$anonfun$handleStateChanges$2.apply(PartitionStateMachine.scala:125)
	at kafka.controller.PartitionStateMachine$$anonfun$handleStateChanges$2.apply(PartitionStateMachine.scala:124)
	at scala.collection.immutable.Set$Set1.foreach(Set.scala:86)
	at kafka.controller.PartitionStateMachine.handleStateChanges(PartitionStateMachine.scala:124)
	at kafka.controller.KafkaController.onPreferredReplicaElection(KafkaController.scala:618)
	at kafka.controller.KafkaController$$anonfun$kafka$controller$KafkaController$$checkAndTriggerPartitionRebalance$4$$anonfun$apply$17$$anonfun$apply$5.apply$mcV$sp(KafkaController.scala:1118)
	at kafka.controller.KafkaController$$anonfun$kafka$controller$KafkaController$$checkAndTriggerPartitionRebalance$4$$anonfun$apply$17$$anonfun$apply$5.apply(KafkaController.scala:1112)
	at kafka.controller.KafkaController$$anonfun$kafka$controller$KafkaController$$checkAndTriggerPartitionRebalance$4$$anonfun$apply$17$$anonfun$apply$5.apply(KafkaController.scala:1112)
	at kafka.utils.Utils$.inLock(Utils.scala:538)
	at kafka.controller.KafkaController$$anonfun$kafka$controller$KafkaController$$checkAndTriggerPartitionRebalance$4$$anonfun$apply$17.apply(KafkaController.scala:1109)
	at kafka.controller.KafkaController$$anonfun$kafka$controller$KafkaController$$checkAndTriggerPartitionRebalance$4$$anonfun$apply$17.apply(KafkaController.scala:1107)
	at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:95)
	at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:95)
	at scala.collection.Iterator$class.foreach(Iterator.scala:772)
	at scala.collection.mutable.HashTable$$anon$1.foreach(HashTable.scala:157)
	at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:190)
	at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:45)
	at scala.collection.mutable.HashMap.foreach(HashMap.scala:95)
	at kafka.controller.KafkaController$$anonfun$kafka$controller$KafkaController$$checkAndTriggerPartitionRebalance$4.apply(KafkaController.scala:1107)
	at kafka.controller.KafkaController$$anonfun$kafka$controller$KafkaController$$checkAndTriggerPartitionRebalance$4.apply(KafkaController.scala:1086)
	at scala.collection.immutable.HashMap$HashMap1.foreach(HashMap.scala:178)
	at scala.collection.immutable.HashMap$HashTrieMap.foreach(HashMap.scala:347)
	at kafka.controller.KafkaController.kafka$controller$KafkaController$$checkAndTriggerPartitionRebalance(KafkaController.scala:1086)
	at kafka.controller.KafkaController$$anonfun$onControllerFailover$1.apply$mcV$sp(KafkaController.scala:324)
	at kafka.utils.KafkaScheduler$$anon$1.run(KafkaScheduler.scala:100)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
	at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:744)
Caused by: kafka.common.StateChangeFailedException: Preferred replica 1 for partition [topicDEBUG,5] is either not alive or not in the isr. Current leader and ISR: [{"leader":8,"leader_epoch":6,"isr":[8,2]}]
	at kafka.controller.PreferredReplicaPartitionLeaderSelector.selectLeader(PartitionLeaderSelector.scala:144)
	at kafka.controller.PartitionStateMachine.electLeaderForPartition(PartitionStateMachine.scala:336)
	... 33 more

Thanks.

On Tue, Jun 17, 2014 at 12:42 PM, Jun Rao <jun...@gmail.com> wrote:

> Any error in the controller and state-change log?
>
> Thanks,
>
> Jun
>
>
> On Mon, Jun 16, 2014 at 6:05 PM, Bongyeon Kim <bongyeon....@gmail.com> wrote:
>
> > Hi, team.
> >
> > I'm using Kafka 0.8.1.1.
> > I'm running 8 brokers on 4 machines (2 brokers per machine), and I have 3
> > topics, each with 16 partitions and 3 replicas.
> >
> > The output of kafka-topics --describe is:
> >
> > Topic:topicCDR PartitionCount:16 ReplicationFactor:3 Configs:retention.ms=3600000
> > Topic: topicCDR Partition: 0 Leader: 3 Replicas: 3,1,2 Isr: 3,2
> > Topic: topicCDR Partition: 1 Leader: 4 Replicas: 4,2,3 Isr: 3,4,2
> > Topic: topicCDR Partition: 2 Leader: 5 Replicas: 5,3,4 Isr: 3,4,5
> > Topic: topicCDR Partition: 3 Leader: 6 Replicas: 6,4,5 Isr: 4,5,6
> > Topic: topicCDR Partition: 4 Leader: 7 Replicas: 7,5,6 Isr: 5,6,7
> > Topic: topicCDR Partition: 5 Leader: 8 Replicas: 8,6,7 Isr: 6,7,8
> > Topic: topicCDR Partition: 6 Leader: 1 Replicas: 1,7,8 Isr: 1,7,8
> > Topic: topicCDR Partition: 7 Leader: 2 Replicas: 2,8,1 Isr: 8,2
> > Topic: topicCDR Partition: 8 Leader: 3 Replicas: 3,2,4 Isr: 3,4,2
> > Topic: topicCDR Partition: 9 Leader: 4 Replicas: 4,3,5 Isr: 3,4,5
> > Topic: topicCDR Partition: 10 Leader: 5 Replicas: 5,4,6 Isr: 4,5,6
> > Topic: topicCDR Partition: 11 Leader: 6 Replicas: 6,5,7 Isr: 5,6,7
> > Topic: topicCDR Partition: 12 Leader: 7 Replicas: 7,6,8 Isr: 6,7,8
> > Topic: topicCDR Partition: 13 Leader: 8 Replicas: 8,7,1 Isr: 7,8
> > Topic: topicCDR Partition: 14 Leader: 8 Replicas: 1,8,2 Isr: 8,2
> > Topic: topicCDR Partition: 15 Leader: 2 Replicas: 2,1,3 Isr: 3,2
> > Topic:topicDEBUG PartitionCount:16 ReplicationFactor:3 Configs:retention.ms=3600000
> > Topic: topicDEBUG Partition: 0 Leader: 4 Replicas: 4,3,5 Isr: 3,4,5
> > Topic: topicDEBUG Partition: 1 Leader: 5 Replicas: 5,4,6 Isr: 4,5,6
> > Topic: topicDEBUG Partition: 2 Leader: 6 Replicas: 6,5,7 Isr: 5,6,7
> > Topic: topicDEBUG Partition: 3 Leader: 7 Replicas: 7,6,8 Isr: 6,7,8
> > Topic: topicDEBUG Partition: 4 Leader: 8 Replicas: 8,7,1 Isr: 7,8
> > Topic: topicDEBUG Partition: 5 Leader: 8 Replicas: 1,8,2 Isr: 8,2
> > Topic: topicDEBUG Partition: 6 Leader: 2 Replicas: 2,1,3 Isr: 3,2
> > Topic: topicDEBUG Partition: 7 Leader: 3 Replicas: 3,2,4 Isr: 3,4,2
> > Topic: topicDEBUG Partition: 8 Leader: 4 Replicas: 4,5,6 Isr: 4,5,6
> > Topic: topicDEBUG Partition: 9 Leader: 5 Replicas: 5,6,7 Isr: 5,6,7
> > Topic: topicDEBUG Partition: 10 Leader: 6 Replicas: 6,7,8 Isr: 6,7,8
> > Topic: topicDEBUG Partition: 11 Leader: 7 Replicas: 7,8,1 Isr: 7,8,1
> > Topic: topicDEBUG Partition: 12 Leader: 8 Replicas: 8,1,2 Isr: 8,2
> > Topic: topicDEBUG Partition: 13 Leader: 3 Replicas: 1,2,3 Isr: 3,2
> > Topic: topicDEBUG Partition: 14 Leader: 2 Replicas: 2,3,4 Isr: 3,4,2
> > Topic: topicDEBUG Partition: 15 Leader: 3 Replicas: 3,4,5 Isr: 3,4,5
> > Topic:topicTRACE PartitionCount:16 ReplicationFactor:3 Configs:retention.ms=3600000
> > Topic: topicTRACE Partition: 0 Leader: 5 Replicas: 5,8,1 Isr: 5,8,1
> > Topic: topicTRACE Partition: 1 Leader: 6 Replicas: 6,1,2 Isr: 6,1,2
> > Topic: topicTRACE Partition: 2 Leader: 7 Replicas: 7,2,3 Isr: 3,7,2
> > Topic: topicTRACE Partition: 3 Leader: 8 Replicas: 8,3,4 Isr: 3,4,8
> > Topic: topicTRACE Partition: 4 Leader: 1 Replicas: 1,4,5 Isr: 1,5,4
> > Topic: topicTRACE Partition: 5 Leader: 2 Replicas: 2,5,6 Isr: 5,6,2
> > Topic: topicTRACE Partition: 6 Leader: 3 Replicas: 3,6,7 Isr: 3,6,7
> > Topic: topicTRACE Partition: 7 Leader: 4 Replicas: 4,7,8 Isr: 4,7,8
> > Topic: topicTRACE Partition: 8 Leader: 5 Replicas: 5,1,2 Isr: 5,1,2
> > Topic: topicTRACE Partition: 9 Leader: 6 Replicas: 6,2,3 Isr: 3,6,2
> > Topic: topicTRACE Partition: 10 Leader: 7 Replicas: 7,3,4 Isr: 3,4,7
> > Topic: topicTRACE Partition: 11 Leader: 8 Replicas: 8,4,5 Isr: 4,5,8
> >
> >
> > The problem is that the ISR of one of my topics is not being updated, and
> > preferred replica election keeps failing. More specifically, broker 1 is
> > not being added back to the ISR of topicDEBUG's partitions. The log of
> > broker 1 looks completely normal and shows no errors.
> >
> > Is this expected? What do I have to do to get the ISR updated?
> >
> >
> > Thanks in advance.
> >
> > --
> > Sincerely,
> > Bongyeon Kim
> >
> > Java Developer & Engineer
> > Seoul, Korea
> > Mobile: +82-10-9369-1314
> > Email: bongyeon...@gmail.com
> > Twitter: http://twitter.com/tigerby
> > Facebook: http://facebook.com/tigerby
> > Wiki: http://tigerby.com
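For anyone hitting the same symptom, here is a minimal Python sketch. It is a hypothetical helper, not part of Kafka or its tools. It scans pasted `kafka-topics --describe` output like the listing quoted above and reports two things:

- which replicas are missing from each partition's ISR, grouped by broker;
- which partitions have a preferred replica that is neither the leader nor in the ISR. This is the condition the state-change error above complains about.

It assumes the preferred replica is the first broker in the Replicas list, which matches the log (Replicas 1,8,2 gives preferred replica 1 for [topicDEBUG,5]). It also assumes the 0.8.1.1 line format shown in this thread, and it cannot tell whether a broker is alive.

```python
import re
from collections import defaultdict

# Assumed line format, taken from the 0.8.1.1 `kafka-topics --describe` output
# quoted in this thread; other Kafka versions may print it differently.
PARTITION_LINE = re.compile(
    r"Topic:\s*(?P<topic>\S+)\s+Partition:\s*(?P<part>\d+)\s+"
    r"Leader:\s*(?P<leader>-?\d+)\s+Replicas:\s*(?P<replicas>[\d,]+)\s+"
    r"Isr:\s*(?P<isr>[\d,]*)"
)


def parse_describe(text):
    """Yield (topic, partition, leader, replicas, isr) per partition line.

    Topic header lines ("PartitionCount:...") do not match and are skipped;
    leading email quote markers such as "> > " are simply not part of a match.
    """
    for m in PARTITION_LINE.finditer(text):
        replicas = [int(b) for b in m.group("replicas").split(",") if b]
        isr = {int(b) for b in m.group("isr").split(",") if b}
        yield m.group("topic"), int(m.group("part")), int(m.group("leader")), replicas, isr


def summarize(text):
    """Return (missing, blocked) for the partitions found in `text`.

    missing: broker id -> list of (topic, partition) whose ISR lacks that broker.
    blocked: (topic, partition, preferred, leader) where the preferred replica
             (assumed: first entry in Replicas) is not the leader and not in
             the ISR. Broker liveness cannot be checked from this output.
    """
    missing = defaultdict(list)
    blocked = []
    for topic, part, leader, replicas, isr in parse_describe(text):
        for broker in replicas:
            if broker not in isr:
                missing[broker].append((topic, part))
        preferred = replicas[0]
        if preferred != leader and preferred not in isr:
            blocked.append((topic, part, preferred, leader))
    return dict(missing), blocked


# A few partition lines copied from the topicDEBUG listing above.
SAMPLE = """\
> > Topic: topicDEBUG Partition: 4 Leader: 8 Replicas: 8,7,1 Isr: 7,8
> > Topic: topicDEBUG Partition: 5 Leader: 8 Replicas: 1,8,2 Isr: 8,2
> > Topic: topicDEBUG Partition: 11 Leader: 7 Replicas: 7,8,1 Isr: 7,8,1
> > Topic: topicDEBUG Partition: 13 Leader: 3 Replicas: 1,2,3 Isr: 3,2
"""

if __name__ == "__main__":
    missing, blocked = summarize(SAMPLE)
    for broker, parts in sorted(missing.items()):
        print(f"broker {broker} missing from ISR of {parts}")
    for topic, part, preferred, leader in blocked:
        print(f"[{topic},{part}] preferred replica {preferred} not in ISR (leader {leader})")
```

Applied to the full topicDEBUG listing quoted above, it would report broker 1 as the only replica missing from any ISR (partitions 4, 5, 6, 12 and 13). It would flag partitions 5 and 13 as ones where a preferred-replica election cannot move leadership back, which is consistent with the [topicDEBUG,5] error in the log. Pasting the quoted text directly works because the regex ignores the `> >` markers.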