Hi Gwen,

sure, the following commands were executed:

    ./kafka-reassign-partitions.sh --zookeeper XXX --reassignment-json-file ~/partition_redist.json --execute
    ./kafka-reassign-partitions.sh --zookeeper XXX --reassignment-json-file ~/partition_redist.json --verify

The contents of partition_redist.json are:

    {
      "partitions": [
        { "topic": "T1", "partition": 0, "replicas": [1,3,5] },
        { "topic": "T1", "partition": 1, "replicas": [2,4,6] },
        { "topic": "T1", "partition": 2, "replicas": [1,3,5] },
        { "topic": "T1", "partition": 3, "replicas": [2,4,6] },
        { "topic": "T1", "partition": 4, "replicas": [1,3,5] },
        { "topic": "T1", "partition": 5, "replicas": [2,4,6] },
        { "topic": "T1", "partition": 6, "replicas": [1,3,5] },
        { "topic": "T1", "partition": 7, "replicas": [2,4,6] },
        { "topic": "T1", "partition": 8, "replicas": [1,3,5] },
        { "topic": "T1", "partition": 9, "replicas": [2,4,6] },
        { "topic": "T2", "partition": 0, "replicas": [1,3,5] },
        { "topic": "T2", "partition": 1, "replicas": [2,4,6] },
        { "topic": "T2", "partition": 2, "replicas": [1,3,5] },
        { "topic": "T2", "partition": 3, "replicas": [2,4,6] },
        { "topic": "T2", "partition": 4, "replicas": [1,3,5] },
        { "topic": "T2", "partition": 5, "replicas": [2,4,6] },
        { "topic": "T2", "partition": 6, "replicas": [1,3,5] },
        { "topic": "T2", "partition": 7, "replicas": [2,4,6] },
        { "topic": "T2", "partition": 8, "replicas": [1,3,5] },
        { "topic": "T2", "partition": 9, "replicas": [2,4,6] },
        { "topic": "T3", "partition": 0, "replicas": [1,3,5] },
        { "topic": "T3", "partition": 1, "replicas": [2,4,6] },
        { "topic": "T3", "partition": 2, "replicas": [1,3,5] },
        { "topic": "T3", "partition": 3, "replicas": [2,4,6] },
        { "topic": "T3", "partition": 4, "replicas": [1,3,5] },
        { "topic": "T3", "partition": 5, "replicas": [2,4,6] },
        { "topic": "T3", "partition": 6, "replicas": [1,3,5] },
        { "topic": "T3", "partition": 7, "replicas": [2,4,6] },
        { "topic": "T3", "partition": 8, "replicas": [1,3,5] },
        { "topic": "T3", "partition": 9, "replicas": [2,4,6] },
        { "topic": "T4", "partition": 0, "replicas": [1,3,5] },
        { "topic": "T4", "partition": 1, "replicas": [2,4,6] },
        { "topic": "T4", "partition": 2, "replicas": [1,3,5] },
        { "topic": "T4", "partition": 3, "replicas": [2,4,6] },
        { "topic": "T4", "partition": 4, "replicas": [1,3,5] },
        { "topic": "T4", "partition": 5, "replicas": [2,4,6] },
        { "topic": "T4", "partition": 6, "replicas": [1,3,5] },
        { "topic": "T4", "partition": 7, "replicas": [2,4,6] },
        { "topic": "T4", "partition": 8, "replicas": [1,3,5] },
        { "topic": "T4", "partition": 9, "replicas": [2,4,6] },
        { "topic": "T5", "partition": 0, "replicas": [1,3,5] },
        { "topic": "T5", "partition": 1, "replicas": [2,4,6] },
        { "topic": "T5", "partition": 2, "replicas": [1,3,5] },
        { "topic": "T5", "partition": 3, "replicas": [2,4,6] },
        { "topic": "T5", "partition": 4, "replicas": [1,3,5] },
        { "topic": "T5", "partition": 5, "replicas": [2,4,6] },
        { "topic": "T5", "partition": 6, "replicas": [1,3,5] },
        { "topic": "T5", "partition": 7, "replicas": [2,4,6] },
        { "topic": "T5", "partition": 8, "replicas": [1,3,5] },
        { "topic": "T5", "partition": 9, "replicas": [2,4,6] },
        { "topic": "T6", "partition": 0, "replicas": [1,3,5] },
        { "topic": "T6", "partition": 1, "replicas": [2,4,6] },
        { "topic": "T6", "partition": 2, "replicas": [1,3,5] },
        { "topic": "T6", "partition": 3, "replicas": [2,4,6] },
        { "topic": "T6", "partition": 4, "replicas": [1,3,5] },
        { "topic": "T6", "partition": 5, "replicas": [2,4,6] },
        { "topic": "T6", "partition": 6, "replicas": [1,3,5] },
        { "topic": "T6", "partition": 7, "replicas": [2,4,6] },
        { "topic": "T6", "partition": 8, "replicas": [1,3,5] },
        { "topic": "T6", "partition": 9, "replicas": [2,4,6] },
        { "topic": "Search", "partition": 0, "replicas": [1,3,5] },
        { "topic": "Search", "partition": 1, "replicas": [2,4,6] },
        { "topic": "Search", "partition": 2, "replicas": [1,3,5] },
        { "topic": "Search", "partition": 3, "replicas": [2,4,6] },
        { "topic": "Search", "partition": 4, "replicas": [1,3,5] },
        { "topic": "Search", "partition": 5, "replicas": [2,4,6] },
        { "topic": "Search", "partition": 6, "replicas": [1,3,5] },
        { "topic": "Search", "partition": 7, "replicas": [2,4,6] },
        { "topic": "Search", "partition": 8, "replicas": [1,3,5] },
        { "topic": "Search", "partition": 9, "replicas": [2,4,6] }
      ],
      "version": 1
    }

Greetings
Valentin

-----Original Message-----
From: Gwen Shapira <gshap...@cloudera.com>
Reply-To: "users@kafka.apache.org" <users@kafka.apache.org>
Date: Monday, 15 June 2015 18:31
To: "users@kafka.apache.org" <users@kafka.apache.org>
Subject: Re: Broken auto leader rebalance after using reassign partitions tool

Can you share the command you ran for partition reassignment? (and the JSON)

On Mon, Jun 15, 2015 at 8:41 AM, Valentin <kafka-9999...@sblk.de> wrote:

Hi guys,

today I observed some very strange behavior of the auto leader rebalance feature after I used the reassign partitions tool. For some reason, only the first two of my six brokers are now used as leaders.

Example:

    # ./kafka-topics.sh --zookeeper xxx --describe --topic Search
    Topic:Search  PartitionCount:10  ReplicationFactor:3  Configs:
        Topic: Search  Partition: 0  Leader: 1  Replicas: 1,3,5  Isr: 5,3,1
        Topic: Search  Partition: 1  Leader: 2  Replicas: 2,4,6  Isr: 6,4,2
        Topic: Search  Partition: 2  Leader: 1  Replicas: 1,3,5  Isr: 5,3,1
        Topic: Search  Partition: 3  Leader: 2  Replicas: 2,4,6  Isr: 2,6,4
        Topic: Search  Partition: 4  Leader: 1  Replicas: 1,3,5  Isr: 3,5,1
        Topic: Search  Partition: 5  Leader: 2  Replicas: 2,4,6  Isr: 4,2,6
        Topic: Search  Partition: 6  Leader: 1  Replicas: 1,3,5  Isr: 5,3,1
        Topic: Search  Partition: 7  Leader: 2  Replicas: 2,4,6  Isr: 6,2,4
        Topic: Search  Partition: 8  Leader: 1  Replicas: 1,3,5  Isr: 5,3,1
        Topic: Search  Partition: 9  Leader: 2  Replicas: 2,4,6  Isr: 6,2,4

Prior to the partition reassignment it looked like this (for that topic; multiple topics were updated with one partition reassignment call):

    Topic:Search  PartitionCount:10  ReplicationFactor:3  Configs:
        Topic: Search  Partition: 0  Leader: 5  Replicas: 1,3,5  Isr: 5,3,1
        Topic: Search  Partition: 1  Leader: 6  Replicas: 2,4,6  Isr: 6,4,2
        Topic: Search  Partition: 2  Leader: 1  Replicas: 1,3,5  Isr: 1,5,3
        Topic: Search  Partition: 3  Leader: 2  Replicas: 2,4,6  Isr: 2,6,4
        Topic: Search  Partition: 4  Leader: 3  Replicas: 1,3,5  Isr: 1,3,5
        Topic: Search  Partition: 5  Leader: 4  Replicas: 2,4,6  Isr: 4,2,6
        Topic: Search  Partition: 6  Leader: 5  Replicas: 1,3,5  Isr: 5,1,3
        Topic: Search  Partition: 7  Leader: 6  Replicas: 2,4,6  Isr: 6,2,4
        Topic: Search  Partition: 8  Leader: 1  Replicas: 1,3,5  Isr: 5,1,3
        Topic: Search  Partition: 9  Leader: 2  Replicas: 2,4,6  Isr: 6,2,4

And I would expect to see similar behavior now.

But even if I manually shut down broker 1 and thus force a new leader election, the situation only changes temporarily:

    Topic:Search  PartitionCount:10  ReplicationFactor:3  Configs:
        Topic: Search  Partition: 0  Leader: 5  Replicas: 1,3,5  Isr: 5,3
        Topic: Search  Partition: 1  Leader: 2  Replicas: 2,4,6  Isr: 6,4,2
        Topic: Search  Partition: 2  Leader: 5  Replicas: 1,3,5  Isr: 5,3
        Topic: Search  Partition: 3  Leader: 2  Replicas: 2,4,6  Isr: 2,6,4
        Topic: Search  Partition: 4  Leader: 3  Replicas: 1,3,5  Isr: 3,5
        Topic: Search  Partition: 5  Leader: 2  Replicas: 2,4,6  Isr: 4,2,6
        Topic: Search  Partition: 6  Leader: 5  Replicas: 1,3,5  Isr: 5,3
        Topic: Search  Partition: 7  Leader: 2  Replicas: 2,4,6  Isr: 6,2,4
        Topic: Search  Partition: 8  Leader: 5  Replicas: 1,3,5  Isr: 5,3
        Topic: Search  Partition: 9  Leader: 2  Replicas: 2,4,6  Isr: 6,2,4

As soon as I start broker 1 again, I see the same picture as in the beginning (only brokers 1 and 2 being leaders for any of my partitions). Even if I wait an hour, the picture still looks the same.

If I stop both broker 1 and broker 2, I see brokers 5 and 6 getting most of the leader roles in the cluster (together they are then the leaders for 51 of my 70 partitions), so even then it looks bad. Once I start brokers 1 and 2 again, they take over the leader roles for all partitions again.

Any ideas?

Configuration excerpt:

    auto.leader.rebalance.enable=true
    leader.imbalance.check.interval.seconds=300
    leader.imbalance.per.broker.percentage=10
    unclean.leader.election.enable=false
    default.replication.factor=3
    num.partitions=10
    ...

I am using Kafka 0.8.2.1 on RHEL 6.6 boxes with 7 topics of 10 partitions each, 6 brokers and 3 ZooKeeper servers.

Greetings
Valentin
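
The leader distribution described above can be checked against the replica lists with a small script. This is only a minimal sketch, not Kafka code. It assumes that Kafka treats the first broker in each "replicas" list as the preferred leader, and that the automatic rebalance compares, per broker, the fraction of its preferred partitions it does not currently lead against leader.imbalance.per.broker.percentage. The helper names preferred_leader_counts and imbalance_ratio are hypothetical, and the assignment is rebuilt in code to match the pattern in partition_redist.json rather than read from the file.

```python
from collections import Counter

# Rebuild the assignment pattern from partition_redist.json:
# even partitions -> [1,3,5], odd partitions -> [2,4,6], for all 7 topics.
topics = ["T1", "T2", "T3", "T4", "T5", "T6", "Search"]
assignment = {
    "version": 1,
    "partitions": [
        {"topic": t, "partition": p,
         "replicas": [1, 3, 5] if p % 2 == 0 else [2, 4, 6]}
        for t in topics for p in range(10)
    ],
}


def preferred_leader_counts(assignment):
    """Count, per broker, how many partitions list it FIRST in 'replicas'.

    Assumption: the first replica is the preferred leader.
    """
    return Counter(p["replicas"][0] for p in assignment["partitions"])


def imbalance_ratio(broker, preferred, current):
    """Fraction of partitions preferring `broker` that it does not lead.

    `preferred` and `current` map (topic, partition) -> broker id.
    Assumption: this mirrors the per-broker check behind
    leader.imbalance.per.broker.percentage.
    """
    mine = [tp for tp, b in preferred.items() if b == broker]
    if not mine:
        return 0.0
    not_led = sum(1 for tp in mine if current[tp] != broker)
    return not_led / len(mine)


counts = preferred_leader_counts(assignment)
print(dict(counts))  # {1: 35, 2: 35}

# Current leaders of topic "Search" after the reassignment (from --describe).
search_leaders = [1, 2, 1, 2, 1, 2, 1, 2, 1, 2]
current = {("Search", p): leader for p, leader in enumerate(search_leaders)}
preferred = {
    (p["topic"], p["partition"]): p["replicas"][0]
    for p in assignment["partitions"] if p["topic"] == "Search"
}

threshold = 10 / 100  # leader.imbalance.per.broker.percentage=10
for broker in (1, 2):
    ratio = imbalance_ratio(broker, preferred, current)
    print(broker, ratio, ratio > threshold)
```

Under these assumptions, every replica list starts with broker 1 or 2, so only those two brokers are ever preferred leaders, and the current state of "Search" shows an imbalance of 0.0 for both, which is below the 10% threshold. Rotating the order of the brokers within each replica list (for example [3,5,1] and [5,1,3] alongside [1,3,5]) would spread the first position across all six brokers.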