I’ve had this happen a few times when a partition reassignment was underway and 
one of the brokers that is a destination for the reassignment became unhealthy. 
This essentially stalls the reassignment indefinitely. The partition with 10 
instead of 5 replicas was undergoing a reassignment in which all of its replicas 
were being moved to new brokers. While that is in progress, the partition will 
necessarily report 2x the desired number of replicas for a short time, until 
the old replicas are removed.
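To make the doubling concrete, here is a small illustration (the broker IDs are hypothetical, not taken from the cluster below): during a full reassignment, the reported replica set is effectively the union of the old and the target assignments until the old replicas are dropped.

```python
# Illustration only: during a reassignment, the effective replica set for a
# partition is the union of the old and new assignments until the old
# replicas are removed. Broker IDs here are made up for the example.
old_replicas = [55, 56, 57, 59, 60]    # original RF=5 assignment
new_replicas = [92, 93, 95, 96, 102]   # target RF=5 assignment

# While the move is in flight, the partition reports all of them:
in_flight = old_replicas + [b for b in new_replicas if b not in old_replicas]
print(len(in_flight))  # 10 replicas reported for an RF=5 partition
```

Once the reassignment completes (or is cancelled), the extra replicas disappear and the count returns to the configured replication factor.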

The solution is to identify the faulty broker and make it healthy again. In 
this case, it looks like broker 59 is at fault since it is the only one not in 
the ISR set on the under-replicated partitions.
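As a quick sanity check, that "in Replicas but not in ISR" diagnosis can be automated with a few lines of Python. This is just a sketch over the replica/ISR pairs copied from the under-replicated partitions quoted later in this thread; intersecting across all of them isolates the broker that is out of sync everywhere:

```python
# Replica set and ISR for each under-replicated partition, copied from the
# listing quoted below in this thread.
partitions = {
    5: ([56, 93, 57, 102, 92, 96, 128, 59, 95, 55],
        [56, 93, 57, 102, 92, 96, 128, 95, 55]),
    6: ([56, 93, 57, 60, 97, 96, 129, 59, 103, 95],
        [56, 93, 57, 60, 97, 96, 129, 103, 95]),
    7: ([57, 60, 97, 96, 59, 130, 95, 104, 62, 100],
        [57, 60, 97, 96, 130, 95, 104, 62, 100]),
    8: ([101, 60, 65, 97, 96, 105, 59, 131, 62, 100],
        [101, 60, 65, 97, 96, 105, 131, 62]),
}

# For each partition, find brokers in the replica set but not in the ISR,
# then intersect: a broker missing from the ISR on every under-replicated
# partition is the prime suspect.
missing = [set(replicas) - set(isr) for replicas, isr in partitions.values()]
culprits = set.intersection(*missing)
print(sorted(culprits))  # [59]
```

Partition 8 also has broker 100 lagging, but only broker 59 is missing from the ISR on every under-replicated partition, which is why it is the prime suspect.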

-- Peter

> On Feb 12, 2020, at 8:10 PM, SenthilKumar K <senthilec...@gmail.com> wrote:
> 
> We are also facing a similar issue in our Kafka cluster.
> 
> Kafka Version: 2.2.0
> RF: 5
> 
> Partition | Latest Offset | Leader | Replicas | In Sync Replicas |
> Preferred Leader? | Under Replicated?
> 0 121 <http://198.18.134.22:9000/clusters/IAD/brokers/121> (121,50,51,52,53)
> (52,121,53,50,51) true false
> 1 122 <http://198.18.134.22:9000/clusters/IAD/brokers/122> (122,51,52,53,54)
> (52,53,54,51,122) true false
> 2 123 <http://198.18.134.22:9000/clusters/IAD/brokers/123> (123,52,53,54,55)
> (52,53,54,123,55) true false
> 3 125 <http://198.18.134.22:9000/clusters/IAD/brokers/125> (125,53,54,55,56)
> (56,125,53,54,55) true false
> 4 127 <http://198.18.134.22:9000/clusters/IAD/brokers/127> (127,54,55,56,57)
> (56,57,54,127,55) true false
> 5 56 <http://198.18.134.22:9000/clusters/IAD/brokers/56>
> (56,93,57,102,92,96,128,59,95,55) (56,93,57,102,92,96,128,95,55) true true
> 6 56 <http://198.18.134.22:9000/clusters/IAD/brokers/56>
> (56,93,57,60,97,96,129,59,103,95) (56,93,57,60,97,96,129,103,95) true true
> 7 57 <http://198.18.134.22:9000/clusters/IAD/brokers/57>
> (57,60,97,96,59,130,95,104,62,100) (57,60,97,96,130,95,104,62,100) true true
> 8 101 <http://198.18.134.22:9000/clusters/IAD/brokers/101>
> (101,60,65,97,96,105,59,131,62,100) (101,60,65,97,96,105,131,62) true true
> 
> Have a look at partition 5 above: it has 10 replicas in total, but the
> replication factor is set to 5. The under-replicated percentage for this
> topic is 26, and the partition reassignment has been stuck for more than
> 24 hours.
> 
> 
> --Senthil
>> On Thu, Feb 13, 2020 at 6:32 AM Madhuri Khattar (mkhattar)
>> <mkhat...@cisco.com.invalid> wrote:
>> Hi, today there was a network glitch which caused some of my topics to
>> have more replicas than the replication factor:
>> $ ./kafka-topics --zookeeper sjc-ddzk-01 --topic foodee --describe
>> Topic:foodee    PartitionCount:6    ReplicationFactor:2    Configs:retention.ms=345600000,segment.ms=345600000,max.message.bytes=2000000
>>     Topic: foodee: 0    Leader: 0    Replicas: 0,1      Isr: 0,1
>>     Topic: foodee: 1    Leader: 1    Replicas: 1,2      Isr: 1,2
>>     Topic: foodee: 2    Leader: 1    Replicas: 2,3,1    Isr: 1,2
>>     Topic: foodee: 3    Leader: 1    Replicas: 3,4,1    Isr: 1,4
>>     Topic: foodee: 4    Leader: 0    Replicas: 0,4      Isr: 0,4
>>     Topic: foodee: 5    Leader: 0    Replicas: 0,2      Isr: 0,2
>> This has happened for many topics.
>> As a result I have a lot of under-replicated partitions, and partitions
>> and leaders are not evenly distributed.
>> Reassignment is also getting stuck.