Hi Gwen,

Sure, the following commands were executed:
./kafka-reassign-partitions.sh --zookeeper XXX --reassignment-json-file ~/partition_redist.json --execute
./kafka-reassign-partitions.sh --zookeeper XXX --reassignment-json-file ~/partition_redist.json --verify

The contents of partition_redist.json are:
{
  "partitions":
  [
    { "topic": "T1", "partition": 0, "replicas": [1,3,5] },
    { "topic": "T1", "partition": 1, "replicas": [2,4,6] },
    { "topic": "T1", "partition": 2, "replicas": [1,3,5] },
    { "topic": "T1", "partition": 3, "replicas": [2,4,6] },
    { "topic": "T1", "partition": 4, "replicas": [1,3,5] },
    { "topic": "T1", "partition": 5, "replicas": [2,4,6] },
    { "topic": "T1", "partition": 6, "replicas": [1,3,5] },
    { "topic": "T1", "partition": 7, "replicas": [2,4,6] },
    { "topic": "T1", "partition": 8, "replicas": [1,3,5] },
    { "topic": "T1", "partition": 9, "replicas": [2,4,6] },

    { "topic": "T2", "partition": 0, "replicas": [1,3,5] },
    { "topic": "T2", "partition": 1, "replicas": [2,4,6] },
    { "topic": "T2", "partition": 2, "replicas": [1,3,5] },
    { "topic": "T2", "partition": 3, "replicas": [2,4,6] },
    { "topic": "T2", "partition": 4, "replicas": [1,3,5] },
    { "topic": "T2", "partition": 5, "replicas": [2,4,6] },
    { "topic": "T2", "partition": 6, "replicas": [1,3,5] },
    { "topic": "T2", "partition": 7, "replicas": [2,4,6] },
    { "topic": "T2", "partition": 8, "replicas": [1,3,5] },
    { "topic": "T2", "partition": 9, "replicas": [2,4,6] },

    { "topic": "T3", "partition": 0, "replicas": [1,3,5] },
    { "topic": "T3", "partition": 1, "replicas": [2,4,6] },
    { "topic": "T3", "partition": 2, "replicas": [1,3,5] },
    { "topic": "T3", "partition": 3, "replicas": [2,4,6] },
    { "topic": "T3", "partition": 4, "replicas": [1,3,5] },
    { "topic": "T3", "partition": 5, "replicas": [2,4,6] },
    { "topic": "T3", "partition": 6, "replicas": [1,3,5] },
    { "topic": "T3", "partition": 7, "replicas": [2,4,6] },
    { "topic": "T3", "partition": 8, "replicas": [1,3,5] },
    { "topic": "T3", "partition": 9, "replicas": [2,4,6] },

    { "topic": "T4", "partition": 0, "replicas": [1,3,5] },
    { "topic": "T4", "partition": 1, "replicas": [2,4,6] },
    { "topic": "T4", "partition": 2, "replicas": [1,3,5] },
    { "topic": "T4", "partition": 3, "replicas": [2,4,6] },
    { "topic": "T4", "partition": 4, "replicas": [1,3,5] },
    { "topic": "T4", "partition": 5, "replicas": [2,4,6] },
    { "topic": "T4", "partition": 6, "replicas": [1,3,5] },
    { "topic": "T4", "partition": 7, "replicas": [2,4,6] },
    { "topic": "T4", "partition": 8, "replicas": [1,3,5] },
    { "topic": "T4", "partition": 9, "replicas": [2,4,6] },

    { "topic": "T5", "partition": 0, "replicas": [1,3,5] },
    { "topic": "T5", "partition": 1, "replicas": [2,4,6] },
    { "topic": "T5", "partition": 2, "replicas": [1,3,5] },
    { "topic": "T5", "partition": 3, "replicas": [2,4,6] },
    { "topic": "T5", "partition": 4, "replicas": [1,3,5] },
    { "topic": "T5", "partition": 5, "replicas": [2,4,6] },
    { "topic": "T5", "partition": 6, "replicas": [1,3,5] },
    { "topic": "T5", "partition": 7, "replicas": [2,4,6] },
    { "topic": "T5", "partition": 8, "replicas": [1,3,5] },
    { "topic": "T5", "partition": 9, "replicas": [2,4,6] },

    { "topic": "T6", "partition": 0, "replicas": [1,3,5] },
    { "topic": "T6", "partition": 1, "replicas": [2,4,6] },
    { "topic": "T6", "partition": 2, "replicas": [1,3,5] },
    { "topic": "T6", "partition": 3, "replicas": [2,4,6] },
    { "topic": "T6", "partition": 4, "replicas": [1,3,5] },
    { "topic": "T6", "partition": 5, "replicas": [2,4,6] },
    { "topic": "T6", "partition": 6, "replicas": [1,3,5] },
    { "topic": "T6", "partition": 7, "replicas": [2,4,6] },
    { "topic": "T6", "partition": 8, "replicas": [1,3,5] },
    { "topic": "T6", "partition": 9, "replicas": [2,4,6] },

    { "topic": "Search", "partition": 0, "replicas": [1,3,5] },
    { "topic": "Search", "partition": 1, "replicas": [2,4,6] },
    { "topic": "Search", "partition": 2, "replicas": [1,3,5] },
    { "topic": "Search", "partition": 3, "replicas": [2,4,6] },
    { "topic": "Search", "partition": 4, "replicas": [1,3,5] },
    { "topic": "Search", "partition": 5, "replicas": [2,4,6] },
    { "topic": "Search", "partition": 6, "replicas": [1,3,5] },
    { "topic": "Search", "partition": 7, "replicas": [2,4,6] },
    { "topic": "Search", "partition": 8, "replicas": [1,3,5] },
    { "topic": "Search", "partition": 9, "replicas": [2,4,6] }
  ],
  "version": 1
}
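
A quick way to inspect a reassignment file like this one is to tally which broker is listed first in each replica list. The sketch below assumes that Kafka treats the first replica in the list as the "preferred" leader; the function name is made up for illustration, and the data is rebuilt in code to match the pattern of the file above rather than read from disk.

```python
from collections import Counter


def preferred_leader_counts(reassignment: dict) -> Counter:
    """Count how often each broker id is listed first in a replica list.

    Assumption: the first replica in each list is the preferred leader.
    """
    return Counter(entry["replicas"][0] for entry in reassignment["partitions"])


# Same pattern as the file above: 7 topics x 10 partitions,
# even partitions on [1,3,5], odd partitions on [2,4,6].
topics = ["T1", "T2", "T3", "T4", "T5", "T6", "Search"]
reassignment = {
    "version": 1,
    "partitions": [
        {"topic": t, "partition": p, "replicas": [1, 3, 5] if p % 2 == 0 else [2, 4, 6]}
        for t in topics
        for p in range(10)
    ],
}

counts = preferred_leader_counts(reassignment)
print(dict(counts))  # {1: 35, 2: 35}
```

To check the real file instead, the dict could be loaded with json.load() from ~/partition_redist.json.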

Greetings
Valentin

-----Original Message-----
From: Gwen Shapira <gshap...@cloudera.com>
Reply-To: "users@kafka.apache.org" <users@kafka.apache.org>
Date: Monday, 15 June 2015 18:31
To: "users@kafka.apache.org" <users@kafka.apache.org>
Subject: Re: Broken auto leader rebalance after using reassign partitions tool

Can you share the command you ran for partition reassignment? (and the
JSON)


On Mon, Jun 15, 2015 at 8:41 AM, Valentin <kafka-9999...@sblk.de> wrote:

Hi guys,

Today I observed very strange behavior of the auto leader rebalance
feature after I used the reassign partitions tool.
For some reason, only the first two of my six brokers are now used as
leaders.

Example:
# ./kafka-topics.sh --zookeeper xxx --describe --topic Search
Topic:Search    PartitionCount:10       ReplicationFactor:3     Configs:
         Topic: Search   Partition: 0    Leader: 1       Replicas: 1,3,5 Isr: 5,3,1
         Topic: Search   Partition: 1    Leader: 2       Replicas: 2,4,6 Isr: 6,4,2
         Topic: Search   Partition: 2    Leader: 1       Replicas: 1,3,5 Isr: 5,3,1
         Topic: Search   Partition: 3    Leader: 2       Replicas: 2,4,6 Isr: 2,6,4
         Topic: Search   Partition: 4    Leader: 1       Replicas: 1,3,5 Isr: 3,5,1
         Topic: Search   Partition: 5    Leader: 2       Replicas: 2,4,6 Isr: 4,2,6
         Topic: Search   Partition: 6    Leader: 1       Replicas: 1,3,5 Isr: 5,3,1
         Topic: Search   Partition: 7    Leader: 2       Replicas: 2,4,6 Isr: 6,2,4
         Topic: Search   Partition: 8    Leader: 1       Replicas: 1,3,5 Isr: 5,3,1
         Topic: Search   Partition: 9    Leader: 2       Replicas: 2,4,6 Isr: 6,2,4


Prior to the partition reassignment it looked like this (shown for that
topic only; multiple topics were updated in a single reassignment call):
Topic:Search    PartitionCount:10       ReplicationFactor:3     Configs:
         Topic: Search   Partition: 0    Leader: 5       Replicas: 1,3,5 Isr: 5,3,1
         Topic: Search   Partition: 1    Leader: 6       Replicas: 2,4,6 Isr: 6,4,2
         Topic: Search   Partition: 2    Leader: 1       Replicas: 1,3,5 Isr: 1,5,3
         Topic: Search   Partition: 3    Leader: 2       Replicas: 2,4,6 Isr: 2,6,4
         Topic: Search   Partition: 4    Leader: 3       Replicas: 1,3,5 Isr: 1,3,5
         Topic: Search   Partition: 5    Leader: 4       Replicas: 2,4,6 Isr: 4,2,6
         Topic: Search   Partition: 6    Leader: 5       Replicas: 1,3,5 Isr: 5,1,3
         Topic: Search   Partition: 7    Leader: 6       Replicas: 2,4,6 Isr: 6,2,4
         Topic: Search   Partition: 8    Leader: 1       Replicas: 1,3,5 Isr: 5,1,3
         Topic: Search   Partition: 9    Leader: 2       Replicas: 2,4,6 Isr: 6,2,4

I would expect to see similar behavior now.
But even if I manually shut down broker 1 and thus force a new leader
election, the situation only changes temporarily:
Topic:Search    PartitionCount:10       ReplicationFactor:3     Configs:
         Topic: Search   Partition: 0    Leader: 5       Replicas: 1,3,5 Isr: 5,3
         Topic: Search   Partition: 1    Leader: 2       Replicas: 2,4,6 Isr: 6,4,2
         Topic: Search   Partition: 2    Leader: 5       Replicas: 1,3,5 Isr: 5,3
         Topic: Search   Partition: 3    Leader: 2       Replicas: 2,4,6 Isr: 2,6,4
         Topic: Search   Partition: 4    Leader: 3       Replicas: 1,3,5 Isr: 3,5
         Topic: Search   Partition: 5    Leader: 2       Replicas: 2,4,6 Isr: 4,2,6
         Topic: Search   Partition: 6    Leader: 5       Replicas: 1,3,5 Isr: 5,3
         Topic: Search   Partition: 7    Leader: 2       Replicas: 2,4,6 Isr: 6,2,4
         Topic: Search   Partition: 8    Leader: 5       Replicas: 1,3,5 Isr: 5,3
         Topic: Search   Partition: 9    Leader: 2       Replicas: 2,4,6 Isr: 6,2,4

As soon as I start broker 1 again, I see the same picture as in the
beginning (only brokers 1 and 2 are leaders for any of my partitions).
Even if I wait an hour, the picture still looks the same.
If I stop both broker 1 and broker 2, I see brokers 5 and 6 getting most
of the leader roles in the cluster (together they are then the leaders for
51 of my 70 partitions), so even then it looks bad. Once I start brokers 1
and 2 again, they take over the leader roles for all partitions again.

Any ideas?

Configuration excerpt:
auto.leader.rebalance.enable=true
leader.imbalance.check.interval.seconds=300
leader.imbalance.per.broker.percentage=10
unclean.leader.election.enable=false
default.replication.factor=3
num.partitions=10
...
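
For orientation, here is a rough sketch of the per-broker check that leader.imbalance.per.broker.percentage is generally described as controlling: for each broker, take the partitions where it is the first-listed (preferred) replica and compute the fraction it is not currently leading. This is an illustration in Python under that assumption, not Kafka's actual code; the function name is hypothetical and the data is copied from the describe output for topic Search.

```python
from collections import defaultdict


def imbalance_ratios(partitions):
    """partitions: iterable of (replicas, leader) tuples.

    Returns {broker_id: fraction of partitions where the broker is the
    first-listed replica but not the current leader}.
    """
    preferred_total = defaultdict(int)
    not_leading = defaultdict(int)
    for replicas, leader in partitions:
        preferred = replicas[0]  # assumption: first replica is preferred
        preferred_total[preferred] += 1
        if leader != preferred:
            not_leading[preferred] += 1
    return {b: not_leading[b] / preferred_total[b] for b in preferred_total}


# Topic "Search" after the reassignment: leaders alternate 1, 2.
after = [([1, 3, 5], 1), ([2, 4, 6], 2)] * 5

# Topic "Search" while broker 1 was shut down.
broker1_down = [
    ([1, 3, 5], 5), ([2, 4, 6], 2), ([1, 3, 5], 5), ([2, 4, 6], 2),
    ([1, 3, 5], 3), ([2, 4, 6], 2), ([1, 3, 5], 5), ([2, 4, 6], 2),
    ([1, 3, 5], 5), ([2, 4, 6], 2),
]

threshold = 10 / 100  # leader.imbalance.per.broker.percentage=10

print(imbalance_ratios(after))         # {1: 0.0, 2: 0.0}
print(imbalance_ratios(broker1_down))  # {1: 1.0, 2: 0.0}
```

Under this reading, the state with brokers 1 and 2 leading everything scores 0% imbalance for this topic, so it would never exceed the 10% threshold.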

I am using Kafka 0.8.2.1 on RHEL6.6 boxes with 7 topics with 10 partitions
each, 6 brokers and 3 zookeeper servers.
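
With those numbers (7 topics, 10 partitions each, two replica groups of three brokers), one possible way to spread the first-listed replica across all six brokers is to rotate each replica list. The sketch below is only an illustration: the function name is hypothetical, it assumes the first replica is the preferred leader, and it assumes the reassignment tool accepts a change that only reorders replicas. The replica sets themselves stay the same.

```python
import json
from collections import Counter


def rotated_assignment(topics, num_partitions, replica_groups):
    """Build a reassignment dict whose first replica rotates per partition."""
    partitions = []
    for t_idx, topic in enumerate(topics):
        for p in range(num_partitions):
            group = replica_groups[p % len(replica_groups)]
            # Shift the starting replica so each broker in the group takes a
            # turn at the front; the topic index offsets the rotation so the
            # leftover partitions do not always land on the same broker.
            shift = (p // len(replica_groups) + t_idx) % len(group)
            partitions.append(
                {"topic": topic, "partition": p, "replicas": group[shift:] + group[:shift]}
            )
    return {"version": 1, "partitions": partitions}


topics = ["T1", "T2", "T3", "T4", "T5", "T6", "Search"]
plan = rotated_assignment(topics, 10, [[1, 3, 5], [2, 4, 6]])

first_replicas = Counter(e["replicas"][0] for e in plan["partitions"])
print(dict(sorted(first_replicas.items())))  # {1: 12, 2: 12, 3: 12, 4: 12, 5: 11, 6: 11}
print(json.dumps(plan["partitions"][:2]))
```

The resulting dict could be written out with json.dump() and passed to the reassignment tool in place of partition_redist.json.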

Greetings
Valentin

