Hi guys,

today I observed some very strange behavior of the auto leader rebalance feature after using the reassign partitions tool. For some reason, only the first two of my six brokers are now used as leaders.
Example:

    # ./kafka-topics.sh --zookeeper xxx --describe --topic Search
    Topic:Search  PartitionCount:10  ReplicationFactor:3  Configs:
        Topic: Search  Partition: 0  Leader: 1  Replicas: 1,3,5  Isr: 5,3,1
        Topic: Search  Partition: 1  Leader: 2  Replicas: 2,4,6  Isr: 6,4,2
        Topic: Search  Partition: 2  Leader: 1  Replicas: 1,3,5  Isr: 5,3,1
        Topic: Search  Partition: 3  Leader: 2  Replicas: 2,4,6  Isr: 2,6,4
        Topic: Search  Partition: 4  Leader: 1  Replicas: 1,3,5  Isr: 3,5,1
        Topic: Search  Partition: 5  Leader: 2  Replicas: 2,4,6  Isr: 4,2,6
        Topic: Search  Partition: 6  Leader: 1  Replicas: 1,3,5  Isr: 5,3,1
        Topic: Search  Partition: 7  Leader: 2  Replicas: 2,4,6  Isr: 6,2,4
        Topic: Search  Partition: 8  Leader: 1  Replicas: 1,3,5  Isr: 5,3,1
        Topic: Search  Partition: 9  Leader: 2  Replicas: 2,4,6  Isr: 6,2,4

Prior to the partition reassignment it looked like this (for that topic, multiple topics were updated with one partition reassignment call):

    Topic:Search  PartitionCount:10  ReplicationFactor:3  Configs:
        Topic: Search  Partition: 0  Leader: 5  Replicas: 1,3,5  Isr: 5,3,1
        Topic: Search  Partition: 1  Leader: 6  Replicas: 2,4,6  Isr: 6,4,2
        Topic: Search  Partition: 2  Leader: 1  Replicas: 1,3,5  Isr: 1,5,3
        Topic: Search  Partition: 3  Leader: 2  Replicas: 2,4,6  Isr: 2,6,4
        Topic: Search  Partition: 4  Leader: 3  Replicas: 1,3,5  Isr: 1,3,5
        Topic: Search  Partition: 5  Leader: 4  Replicas: 2,4,6  Isr: 4,2,6
        Topic: Search  Partition: 6  Leader: 5  Replicas: 1,3,5  Isr: 5,1,3
        Topic: Search  Partition: 7  Leader: 6  Replicas: 2,4,6  Isr: 6,2,4
        Topic: Search  Partition: 8  Leader: 1  Replicas: 1,3,5  Isr: 5,1,3
        Topic: Search  Partition: 9  Leader: 2  Replicas: 2,4,6  Isr: 6,2,4

And I would expect to see a similar behavior now.

But even if I manually shut down broker 1 and thus force a new leader election, the situation only changes temporarily:

    Topic:Search  PartitionCount:10  ReplicationFactor:3  Configs:
        Topic: Search  Partition: 0  Leader: 5  Replicas: 1,3,5  Isr: 5,3
        Topic: Search  Partition: 1  Leader: 2  Replicas: 2,4,6  Isr: 6,4,2
        Topic: Search  Partition: 2  Leader: 5  Replicas: 1,3,5  Isr: 5,3
        Topic: Search  Partition: 3  Leader: 2  Replicas: 2,4,6  Isr: 2,6,4
        Topic: Search  Partition: 4  Leader: 3  Replicas: 1,3,5  Isr: 3,5
        Topic: Search  Partition: 5  Leader: 2  Replicas: 2,4,6  Isr: 4,2,6
        Topic: Search  Partition: 6  Leader: 5  Replicas: 1,3,5  Isr: 5,3
        Topic: Search  Partition: 7  Leader: 2  Replicas: 2,4,6  Isr: 6,2,4
        Topic: Search  Partition: 8  Leader: 5  Replicas: 1,3,5  Isr: 5,3
        Topic: Search  Partition: 9  Leader: 2  Replicas: 2,4,6  Isr: 6,2,4

As soon as I then start broker 1 again, I see the same picture as in the beginning (only broker 1 and 2 being leaders for any of my partitions). Even if I wait an hour, the picture still looks the same.

If I stop both broker 1 and broker 2, I see broker 5 and 6 getting most of the leader roles in the cluster (together they are then the leaders for 51 of my 70 partitions), so even then it looks bad. Once I start broker 1 and 2 again, they will take over the leader roles for all partitions again.

Any ideas?

Configuration excerpt:

    auto.leader.rebalance.enable=true
    leader.imbalance.check.interval.seconds=300
    leader.imbalance.per.broker.percentage=10
    unclean.leader.election.enable=false
    default.replication.factor=3
    num.partitions=10
    ...

I am using Kafka 0.8.2.1 on RHEL6.6 boxes with 7 topics with 10 partitions each, 6 brokers and 3 zookeeper servers.

Greetings
Valentin
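
For anyone who wants to compare leader distributions like the ones above without counting by hand, here is a minimal sketch in Python. It parses partition lines of `kafka-topics.sh --describe` output and tallies, per broker, how many partitions it currently leads and how many list it first in Replicas (the first entry of the replica list is the preferred replica, which is what the auto leader rebalance moves leadership back to). The function name `tally` and the regular expression are illustrative assumptions, not part of any Kafka tooling, and the pattern assumes the 0.8.x output layout shown in this post.

```python
import re
from collections import Counter

# Assumed layout of one partition entry in the describe output (0.8.x style).
PARTITION_RE = re.compile(
    r"Partition:\s*(\d+)\s+Leader:\s*(-?\d+)\s+Replicas:\s*([\d,]+)"
)

def tally(describe_output):
    """Return (leaders, preferred): per-broker counts of current leaders
    and of first-listed (preferred) replicas. Hypothetical helper."""
    leaders = Counter()
    preferred = Counter()
    for match in PARTITION_RE.finditer(describe_output):
        leader = int(match.group(2))
        replicas = [int(b) for b in match.group(3).split(",") if b]
        leaders[leader] += 1
        preferred[replicas[0]] += 1
    return leaders, preferred

# First four partitions of the state prior to the reassignment, from this post.
SAMPLE = """
Topic: Search Partition: 0 Leader: 5 Replicas: 1,3,5 Isr: 5,3,1
Topic: Search Partition: 1 Leader: 6 Replicas: 2,4,6 Isr: 6,4,2
Topic: Search Partition: 2 Leader: 1 Replicas: 1,3,5 Isr: 1,5,3
Topic: Search Partition: 3 Leader: 2 Replicas: 2,4,6 Isr: 2,6,4
"""

if __name__ == "__main__":
    leaders, preferred = tally(SAMPLE)
    print("current leaders per broker:   ", dict(leaders))
    print("preferred replicas per broker:", dict(preferred))
```

Feeding it the full describe output for all topics would show whether the preferred replicas are concentrated on a few brokers, which may be worth checking before looking at the rebalance settings.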