Hi guys,

Today I observed some very strange behavior of the auto leader rebalance
feature after I used the reassign partitions tool.
For some reason, only the first two of my six brokers are now used as
leaders.

Example:
# ./kafka-topics.sh --zookeeper xxx --describe --topic Search
Topic:Search    PartitionCount:10       ReplicationFactor:3     Configs:
        Topic: Search   Partition: 0    Leader: 1       Replicas: 1,3,5 Isr: 5,3,1
        Topic: Search   Partition: 1    Leader: 2       Replicas: 2,4,6 Isr: 6,4,2
        Topic: Search   Partition: 2    Leader: 1       Replicas: 1,3,5 Isr: 5,3,1
        Topic: Search   Partition: 3    Leader: 2       Replicas: 2,4,6 Isr: 2,6,4
        Topic: Search   Partition: 4    Leader: 1       Replicas: 1,3,5 Isr: 3,5,1
        Topic: Search   Partition: 5    Leader: 2       Replicas: 2,4,6 Isr: 4,2,6
        Topic: Search   Partition: 6    Leader: 1       Replicas: 1,3,5 Isr: 5,3,1
        Topic: Search   Partition: 7    Leader: 2       Replicas: 2,4,6 Isr: 6,2,4
        Topic: Search   Partition: 8    Leader: 1       Replicas: 1,3,5 Isr: 5,3,1
        Topic: Search   Partition: 9    Leader: 2       Replicas: 2,4,6 Isr: 6,2,4
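
To make the skew easier to see, here is a minimal sketch that tallies leaders per broker from `--describe` output. It assumes the row layout shown in this mail; the names `ROW`, `SAMPLE` and `tally` are made up for illustration and are not part of any Kafka tool. It also counts the first entry of each Replicas list, since Kafka treats that entry as the preferred replica for the partition.

```python
import re
from collections import Counter

# Rows copied from the describe output in this mail (Isr column omitted).
SAMPLE = """\
Topic: Search   Partition: 0    Leader: 1       Replicas: 1,3,5
Topic: Search   Partition: 1    Leader: 2       Replicas: 2,4,6
Topic: Search   Partition: 2    Leader: 1       Replicas: 1,3,5
Topic: Search   Partition: 3    Leader: 2       Replicas: 2,4,6
Topic: Search   Partition: 4    Leader: 1       Replicas: 1,3,5
Topic: Search   Partition: 5    Leader: 2       Replicas: 2,4,6
Topic: Search   Partition: 6    Leader: 1       Replicas: 1,3,5
Topic: Search   Partition: 7    Leader: 2       Replicas: 2,4,6
Topic: Search   Partition: 8    Leader: 1       Replicas: 1,3,5
Topic: Search   Partition: 9    Leader: 2       Replicas: 2,4,6
"""

# Hypothetical pattern for one partition row; tolerant of extra whitespace.
ROW = re.compile(r"Partition:\s*(\d+)\s+Leader:\s*(-?\d+)\s+Replicas:\s*([\d,]+)")


def tally(describe_output):
    """Return (leaders per broker, first-listed replicas per broker)."""
    leaders = Counter()
    first_replicas = Counter()
    for _partition, leader, replicas in ROW.findall(describe_output):
        leaders[int(leader)] += 1
        first_replicas[int(replicas.split(",")[0])] += 1
    return leaders, first_replicas


if __name__ == "__main__":
    leaders, first_replicas = tally(SAMPLE)
    print("leaders per broker:       ", dict(leaders))
    print("first replicas per broker:", dict(first_replicas))
```

For the rows above, both tallies come out as five partitions each for brokers 1 and 2 and none for brokers 3 to 6.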


Prior to the partition reassignment it looked like this for that topic
(multiple topics were updated with a single partition reassignment call):
Topic:Search    PartitionCount:10       ReplicationFactor:3     Configs:
        Topic: Search   Partition: 0    Leader: 5       Replicas: 1,3,5 Isr: 5,3,1
        Topic: Search   Partition: 1    Leader: 6       Replicas: 2,4,6 Isr: 6,4,2
        Topic: Search   Partition: 2    Leader: 1       Replicas: 1,3,5 Isr: 1,5,3
        Topic: Search   Partition: 3    Leader: 2       Replicas: 2,4,6 Isr: 2,6,4
        Topic: Search   Partition: 4    Leader: 3       Replicas: 1,3,5 Isr: 1,3,5
        Topic: Search   Partition: 5    Leader: 4       Replicas: 2,4,6 Isr: 4,2,6
        Topic: Search   Partition: 6    Leader: 5       Replicas: 1,3,5 Isr: 5,1,3
        Topic: Search   Partition: 7    Leader: 6       Replicas: 2,4,6 Isr: 6,2,4
        Topic: Search   Partition: 8    Leader: 1       Replicas: 1,3,5 Isr: 5,1,3
        Topic: Search   Partition: 9    Leader: 2       Replicas: 2,4,6 Isr: 6,2,4

I would expect to see a similar distribution now.
But even if I manually shut down broker 1 and thus force a new leader
election, the situation only changes temporarily:
Topic:Search    PartitionCount:10       ReplicationFactor:3     Configs:
        Topic: Search   Partition: 0    Leader: 5       Replicas: 1,3,5 Isr: 5,3
        Topic: Search   Partition: 1    Leader: 2       Replicas: 2,4,6 Isr: 6,4,2
        Topic: Search   Partition: 2    Leader: 5       Replicas: 1,3,5 Isr: 5,3
        Topic: Search   Partition: 3    Leader: 2       Replicas: 2,4,6 Isr: 2,6,4
        Topic: Search   Partition: 4    Leader: 3       Replicas: 1,3,5 Isr: 3,5
        Topic: Search   Partition: 5    Leader: 2       Replicas: 2,4,6 Isr: 4,2,6
        Topic: Search   Partition: 6    Leader: 5       Replicas: 1,3,5 Isr: 5,3
        Topic: Search   Partition: 7    Leader: 2       Replicas: 2,4,6 Isr: 6,2,4
        Topic: Search   Partition: 8    Leader: 5       Replicas: 1,3,5 Isr: 5,3
        Topic: Search   Partition: 9    Leader: 2       Replicas: 2,4,6 Isr: 6,2,4

As soon as I start broker 1 again, I see the same picture as in the
beginning (only brokers 1 and 2 are leaders for any of my partitions).
Even if I wait an hour, the picture still looks the same.
If I stop both broker 1 and broker 2, brokers 5 and 6 get most of the
leader roles in the cluster (together they are then the leaders for
51 of my 70 partitions), so even then it looks bad. Once I start brokers 1
and 2 again, they take over the leader roles for all partitions again.

Any ideas?

Configuration excerpt:
auto.leader.rebalance.enable=true
leader.imbalance.check.interval.seconds=300
leader.imbalance.per.broker.percentage=10
unclean.leader.election.enable=false
default.replication.factor=3
num.partitions=10
...
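
A rough sketch of how I read these settings, not the controller's actual code: it assumes the periodic check computes, for each broker, the percentage of partitions that list it as first (preferred) replica but are currently led by another broker, and compares that against leader.imbalance.per.broker.percentage. The function name and the hard-coded partition lists (taken from the Search topic output in this mail) are illustrative only.

```python
from collections import defaultdict


def imbalance_by_broker(partitions):
    """partitions: iterable of (leader, replicas); replicas[0] is the preferred replica.

    Returns {broker: percent of its preferred partitions it does NOT lead}.
    """
    preferred_total = defaultdict(int)
    not_led = defaultdict(int)
    for leader, replicas in partitions:
        preferred = replicas[0]
        preferred_total[preferred] += 1
        if leader != preferred:
            not_led[preferred] += 1
    return {b: 100.0 * not_led[b] / preferred_total[b] for b in preferred_total}


THRESHOLD_PERCENT = 10  # leader.imbalance.per.broker.percentage from the excerpt

# Search topic after the reassignment: leaders alternate 1,2 for partitions 0..9.
current = [(1, [1, 3, 5]), (2, [2, 4, 6])] * 5

# Search topic before the reassignment: leaders 5,6,1,2,3,4,5,6,1,2.
before_leaders = [5, 6, 1, 2, 3, 4, 5, 6, 1, 2]
before = [
    (leader, [1, 3, 5] if p % 2 == 0 else [2, 4, 6])
    for p, leader in enumerate(before_leaders)
]

if __name__ == "__main__":
    for label, state in (("current", current), ("before", before)):
        for broker, pct in sorted(imbalance_by_broker(state).items()):
            over = pct > THRESHOLD_PERCENT
            print(f"{label}: broker {broker} imbalance {pct:.0f}% over threshold: {over}")
```

Under that assumption, the current state gives 0% for brokers 1 and 2, so nothing would exceed the 10% threshold, while the earlier state gives 60% for both. Brokers 3 to 6 never appear first in any Replicas list of this topic, so they do not show up in the result at all.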

I am using Kafka 0.8.2.1 on RHEL 6.6 boxes, with 7 topics of 10 partitions
each, 6 brokers and 3 ZooKeeper servers.

Greetings
Valentin
