Hello Wes,

The documentation here is a bit misleading indeed:
http://kafka.apache.org/documentation.html#brokerconfigs

In Kafka, each partition has a replica list {A, B, C, ...}, and the first replica in that list is the preferred leader of the partition. When it is not the leader (for example, A went down and B became the leader), the replica list is still {A, B, C, ...}, but A is now an offline replica and B is the new leader. Even after A comes back, it stays a follower, so this is a case of "imbalance".

"leader.imbalance.per.broker.percentage" kicks in when the percentage of these imbalance cases is higher than the threshold. The ratio is computed per broker: of the partitions whose preferred replica is that broker, the fraction currently led by some other broker. It is not compared against the cluster-wide average leader count.

In your case the imbalance will be well above 10% for the two brokers you shut down, but since those two brokers are not back, the rebalance logic, although triggered, cannot do anything (you can check the controller log for entries like "Starting preferred replica leader election for ..." to verify). When you bring those two brokers back online, I think the auto leader rebalance will move the leaders back to them.

Guozhang

On Tue, Nov 18, 2014 at 8:48 AM, Wes Chow <w...@chartbeat.com> wrote:

> I'm trying to understand the config options for auto-rebalancing. This is
> what we have in /etc/kafka/server.properties for all the nodes:
>
> auto.leader.rebalance.enable=true
> leader.imbalance.per.broker.percentage=10
> leader.imbalance.check.interval.seconds=300
>
> We have 10 nodes for this topic, which has 512 partitions. They were evenly
> balanced before I started my experiment. I shut down two of the nodes, and
> the number of leaders per node is now:
>
> 75  10
> 68   3
> 57   4
> 67   5
> 57   6
> 68   7
> 63   8
> 57   9
>
> The first column is the number of leaders, and the second column is the node #.
> You can see that nodes 1 and 2 have no leaders, since they're down. It's
> been about half an hour since I did this and the balancing hasn't changed.
>
> The documentation on the config option is very ambiguous.
> My interpretation is that it says if any particular node has 10% more leaders,
> then auto-rebalance kicks in. If that means 10% more than the average, then
> node #10 has 75 partitions, and the average is 64, so that's a 17%
> difference.
>
> So I think I'm misunderstanding either what auto-rebalance is supposed to
> do or the condition that's supposed to trigger it. Any clues?
>
> Thanks,
> Wes
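A rough sketch of the per-broker check described in the reply, written as a simplified Python model rather than Kafka's actual controller code. The function names (`check_leader_imbalance`, `round_robin_replicas`), the round-robin assignment, and the replication factor of 3 are assumptions for illustration only; the real assignment in Wes's cluster isn't given, so the leader counts here will not match his table. It shows why taking two brokers down pushes their ratio to 100% (the check fires) while nothing can move until they return.

```python
from collections import defaultdict


def check_leader_imbalance(replicas, leaders, live_brokers, imbalance_pct=10):
    """Simplified model (hypothetical, not Kafka's code) of the per-broker check.

    replicas:     {partition: [broker, ...]}; the first broker is the preferred replica
    leaders:      {partition: broker currently leading that partition}
    live_brokers: set of brokers that are currently up
    Returns (ratios, elections): ratios maps each broker to its imbalance ratio;
    elections maps each broker over the threshold to the partitions that would
    be moved back to it (empty if that broker is down).
    """
    preferred = defaultdict(list)  # broker -> partitions it is the preferred leader for
    for partition, replica_list in replicas.items():
        preferred[replica_list[0]].append(partition)

    ratios, elections = {}, {}
    for broker, partitions in preferred.items():
        # Partitions that "belong" to this broker but are led by another broker.
        not_led = [p for p in partitions if leaders[p] != broker]
        ratios[broker] = len(not_led) / len(partitions)
        if ratios[broker] > imbalance_pct / 100:
            # The check fires, but leadership can only move back to a live broker.
            elections[broker] = not_led if broker in live_brokers else []
    return ratios, elections


def round_robin_replicas(num_partitions, brokers, replication_factor=3):
    """Hypothetical evenly spread assignment: partition p starts at brokers[p % n]."""
    n = len(brokers)
    return {p: [brokers[(p + i) % n] for i in range(replication_factor)]
            for p in range(num_partitions)}


brokers = list(range(1, 11))  # 10 brokers, 512 partitions, as in Wes's setup
replicas = round_robin_replicas(512, brokers)

# Brokers 1 and 2 go down; each partition fails over to its first live replica.
live = set(brokers) - {1, 2}
leaders = {p: next(b for b in rs if b in live) for p, rs in replicas.items()}
ratios, elections = check_leader_imbalance(replicas, leaders, live)
print(ratios[1], ratios[2], elections)  # 1.0 1.0 {1: [], 2: []}

# Brokers 1 and 2 come back; leaders haven't moved yet, but now they can.
ratios, elections = check_leader_imbalance(replicas, leaders, set(brokers))
print(sorted(elections), len(elections[1]))  # [1, 2] 52
```

The point of the model is the denominator: each broker is measured only against the partitions it prefers to lead, so node 10 carrying 75 leaders does not count as imbalance at all, while the downed brokers sit at 100%.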