Yes, the filesystem is local. However, when I added the new broker, it had a completely clean disk.
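
For reference, the stop-and-delete procedure Peter suggests in the thread below might look roughly like this on a broker with local storage. This is only a sketch; the service name, log directory, and topic-partition names are placeholders and should be taken from server.properties and the --describe output:

    # Stop the at-fault broker (assumes a systemd-managed Kafka service).
    sudo systemctl stop kafka

    # Remove the stuck topic-partition directories from the broker's log.dirs
    # (the path here is a placeholder; check log.dirs in server.properties first).
    rm -rf /var/kafka-logs/foodee-2 /var/kafka-logs/foodee-3

    # Restart the broker; it re-replicates those partitions from the current leaders.
    sudo systemctl start kafka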

-----Original Message-----
From: Peter Bukowinski <pmb...@gmail.com>
Sent: Wednesday, February 12, 2020 9:00 PM
To: users@kafka.apache.org
Subject: Re: Replicas more than replication-factor

In cases like this, where the situation isn't self-repairing, I stop the at-fault broker and delete the topic-partition directory or directories from the filesystem before starting the broker again. Are you using local storage on your brokers?

-- Peter

> On Feb 12, 2020, at 8:33 PM, Madhuri Khattar (mkhattar) <mkhat...@cisco.com.invalid> wrote:
>
> In my case I figured it was broker 3 and rebooted it after deleting /admin/reassign_partitions, since the reassignment was stuck there. However, I am not having any luck: I still see 400+ under-replicated partitions. I replaced broker 3 as well, but there is still no change after almost 7-8 hours now.
>
> -----Original Message-----
> From: Peter Bukowinski <pmb...@gmail.com>
> Sent: Wednesday, February 12, 2020 8:27 PM
> To: users@kafka.apache.org
> Cc: senthilec...@apache.org
> Subject: Re: Replicas more than replication-factor
>
> I've had this happen a few times when a partition reassignment was underway and one of the brokers that was a destination for the reassignment became unhealthy. This essentially stalls the reassignment indefinitely. The partition with 10 instead of 5 replicas was undergoing a reassignment in which all the replicas were being moved to new brokers. When that occurs, there will necessarily be 2x the desired number of replicas for a short time, before the old replicas are removed.
>
> The solution is to identify the faulty broker and make it healthy again. In this case, it looks like broker 59 is at fault, since it is the only one not in the ISR set on the under-replicated partitions.
>
> -- Peter
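
To confirm which broker is missing from the ISR and whether a reassignment is still pending, something like the following can help (a sketch; the ZooKeeper host is a placeholder, and the znode check applies to ZooKeeper-based clusters like the ones in this thread):

    # List only the partitions whose ISR is smaller than their replica list.
    bin/kafka-topics.sh --zookeeper zk-host:2181 --describe --under-replicated-partitions

    # Check whether a reassignment is still registered; this znode exists only
    # while a reassignment is in progress (or stuck).
    bin/zookeeper-shell.sh zk-host:2181 get /admin/reassign_partitions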

>> On Feb 12, 2020, at 8:10 PM, SenthilKumar K <senthilec...@gmail.com> wrote:
>>
>> We are also facing a similar issue in our Kafka cluster.
>>
>> Kafka Version: 2.2.0
>> RF: 5
>>
>> Partition  Leader  Replicas                             In Sync Replicas                    Preferred Leader?  Under Replicated?
>> 0          121     (121,50,51,52,53)                    (52,121,53,50,51)                   true               false
>> 1          122     (122,51,52,53,54)                    (52,53,54,51,122)                   true               false
>> 2          123     (123,52,53,54,55)                    (52,53,54,123,55)                   true               false
>> 3          125     (125,53,54,55,56)                    (56,125,53,54,55)                   true               false
>> 4          127     (127,54,55,56,57)                    (56,57,54,127,55)                   true               false
>> 5          56      (56,93,57,102,92,96,128,59,95,55)    (56,93,57,102,92,96,128,95,55)      true               true
>> 6          56      (56,93,57,60,97,96,129,59,103,95)    (56,93,57,60,97,96,129,103,95)      true               true
>> 7          57      (57,60,97,96,59,130,95,104,62,100)   (57,60,97,96,130,95,104,62,100)     true               true
>> 8          101     (101,60,65,97,96,105,59,131,62,100)  (101,60,65,97,96,105,131,62)        true               true
>>
>> Have a look at partition 5 above: it has 10 replicas in total, but the replication factor is set to 5. Under-replicated % is 26 for the same topic, and the partition reassignment has been stuck for more than 24 hours.
>>
>> --Senthil
>>
>>> On Thu, Feb 13, 2020 at 6:32 AM Madhuri Khattar (mkhattar) <mkhat...@cisco.com.invalid> wrote:
>>>
>>> Hi,
>>> Today there was a network glitch which caused some of my topics to have more replicas than the replication factor:
>>>
>>> $ ./kafka-topics --zookeeper sjc-ddzk-01 --topic foodee --describe
>>> Topic:foodee  PartitionCount:6  ReplicationFactor:2  Configs:retention.ms=345600000,segment.ms=345600000,max.message.bytes=2000000
>>>     Topic: foodee  Partition: 0  Leader: 0  Replicas: 0,1    Isr: 0,1
>>>     Topic: foodee  Partition: 1  Leader: 1  Replicas: 1,2    Isr: 1,2
>>>     Topic: foodee  Partition: 2  Leader: 1  Replicas: 2,3,1  Isr: 1,2
>>>     Topic: foodee  Partition: 3  Leader: 1  Replicas: 3,4,1  Isr: 1,4
>>>     Topic: foodee  Partition: 4  Leader: 0  Replicas: 0,4    Isr: 0,4
>>>     Topic: foodee  Partition: 5  Leader: 0  Replicas: 0,2    Isr: 0,2
>>>
>>> This has happened for many topics. As a result I have a lot of under-replicated partitions, and partitions and leaders are not evenly distributed. Reassignment is also getting stuck.
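
Once the stalled reassignment has been cleared (by making the faulty broker healthy again, or by removing the /admin/reassign_partitions znode as mentioned earlier in the thread), one way to bring a partition back to its intended replica count is to submit an explicit reassignment. This is only a sketch; the topic, partitions, and target brokers below are illustrative and must be adjusted to the desired end state:

    # reassign.json: pin each over-replicated partition back to exactly two replicas, e.g.
    # {"version":1,"partitions":[
    #   {"topic":"foodee","partition":2,"replicas":[2,3]},
    #   {"topic":"foodee","partition":3,"replicas":[3,4]}
    # ]}

    bin/kafka-reassign-partitions.sh --zookeeper sjc-ddzk-01:2181 \
        --reassignment-json-file reassign.json --execute

    # Check progress and completion later:
    bin/kafka-reassign-partitions.sh --zookeeper sjc-ddzk-01:2181 \
        --reassignment-json-file reassign.json --verify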