In cases like this, where the situation isn’t self-repairing, I stop the at-fault broker and delete the affected topic-partition directory (or directories) from the filesystem before starting the broker again. Are you using local storage on your brokers?
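In case it helps, a rough sketch of that procedure on the affected broker (the service name and data path below are only examples; check log.dirs in your server.properties for the real path, and substitute whichever topic-partitions are stuck, e.g. foodee-2 and foodee-3 on broker 3 in the output quoted below):

$ systemctl stop kafka                     # or however the broker is managed
$ rm -rf /var/lib/kafka/data/foodee-2 \
         /var/lib/kafka/data/foodee-3      # the stuck topic-partition directories under log.dirs
$ systemctl start kafka

Once restarted, the broker has no local copy of those partitions, so it re-fetches them from the current leaders and should rejoin the ISR, which in turn lets a stalled reassignment complete.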
-- Peter

> On Feb 12, 2020, at 8:33 PM, Madhuri Khattar (mkhattar)
> <mkhat...@cisco.com.invalid> wrote:
>
> In my case I figured it was broker 3 and rebooted it after deleting
> /admin/reassign-partitions, since it was stuck there. However, I am not
> having any luck. I still see 400+ under-replicated partitions. I replaced
> broker 3 as well, but there is still no change after almost 7-8 hours.
>
> Madhuri Khattar
> SR ENGINEER.IT ENGINEERING
> mkhat...@cisco.com
>
> -----Original Message-----
> From: Peter Bukowinski <pmb...@gmail.com>
> Sent: Wednesday, February 12, 2020 8:27 PM
> To: users@kafka.apache.org
> Cc: senthilec...@apache.org
> Subject: Re: Replicas more than replication-factor
>
> I’ve had this happen a few times when a partition reassignment was underway
> and one of the brokers that was a destination for the reassignment became
> unhealthy. This essentially stalls the reassignment indefinitely. The
> partition with 10 instead of 5 replicas was undergoing a reassignment in
> which all the replicas were being moved to new brokers. When that occurs,
> there will necessarily be 2x the number of desired replicas for a short
> time before the old replicas are removed.
>
> The solution is to identify the faulty broker and make it healthy again.
> In this case, it looks like broker 59 is at fault, since it is the only
> one not in the ISR set on the under-replicated partitions.
>
> -- Peter
>
>> On Feb 12, 2020, at 8:10 PM, SenthilKumar K <senthilec...@gmail.com> wrote:
>>
>> We are also facing a similar issue in our Kafka cluster.
>>
>> Kafka Version: 2.2.0
>> RF: 5
>>
>> Partition  Leader  Replicas                             In Sync Replicas                  Preferred Leader?  Under Replicated?
>> 0          121     (121,50,51,52,53)                    (52,121,53,50,51)                 true               false
>> 1          122     (122,51,52,53,54)                    (52,53,54,51,122)                 true               false
>> 2          123     (123,52,53,54,55)                    (52,53,54,123,55)                 true               false
>> 3          125     (125,53,54,55,56)                    (56,125,53,54,55)                 true               false
>> 4          127     (127,54,55,56,57)                    (56,57,54,127,55)                 true               false
>> 5          56      (56,93,57,102,92,96,128,59,95,55)    (56,93,57,102,92,96,128,95,55)    true               true
>> 6          56      (56,93,57,60,97,96,129,59,103,95)    (56,93,57,60,97,96,129,103,95)    true               true
>> 7          57      (57,60,97,96,59,130,95,104,62,100)   (57,60,97,96,130,95,104,62,100)   true               true
>> 8          101     (101,60,65,97,96,105,59,131,62,100)  (101,60,65,97,96,105,131,62)      true               true
>>
>> Have a look at partition 5 above: it has 10 replicas in total even though
>> the replication factor is set to 5. The under-replicated percentage for
>> this topic is 26%, and the partition reassignment has been stuck for more
>> than 24 hours.
>>
>> --Senthil
>>
>>> On Thu, Feb 13, 2020 at 6:32 AM Madhuri Khattar (mkhattar)
>>> <mkhat...@cisco.com.invalid> wrote:
>>>
>>> Hi. Today there was a network glitch which caused some of my topics to
>>> end up with more replicas than the replication factor:
>>>
>>> $ ./kafka-topics --zookeeper sjc-ddzk-01 --topic foodee --describe
>>> Topic:foodee  PartitionCount:6  ReplicationFactor:2
>>> Configs:retention.ms=345600000,segment.ms=345600000,max.message.bytes=2000000
>>>   Topic: foodee: 0  Leader: 0  Replicas: 0,1    Isr: 0,1
>>>   Topic: foodee: 1  Leader: 1  Replicas: 1,2    Isr: 1,2
>>>   Topic: foodee: 2  Leader: 1  Replicas: 2,3,1  Isr: 1,2
>>>   Topic: foodee: 3  Leader: 1  Replicas: 3,4,1  Isr: 1,4
>>>   Topic: foodee: 4  Leader: 0  Replicas: 0,4    Isr: 0,4
>>>   Topic: foodee: 5  Leader: 0  Replicas: 0,2    Isr: 0,2
>>>
>>> This has happened for many topics. As a result I have a lot of
>>> under-replicated partitions, and partitions and leaders are not evenly
>>> distributed. Reassignment is also getting stuck.
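For reference, the two checks mentioned in this thread look roughly like this (the ZooKeeper connect string is a placeholder; script names follow the ./kafka-topics style used above, and the stock Apache tarball adds a .sh suffix):

# list every partition whose ISR is smaller than its replica set
$ ./kafka-topics --zookeeper zk1:2181 --describe --under-replicated-partitions

# remove a stuck reassignment request from ZooKeeper
$ ./zookeeper-shell zk1:2181 delete /admin/reassign-partitions

Keep in mind that deleting the znode does not clear the reassignment from the controller's in-memory state; the controller usually needs to be restarted (or a controller re-election forced) before it forgets the pending reassignment.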