In cases like this, where the situation isn’t self-repairing, I stop the at-fault broker and delete the affected topic-partition directory (or directories) from the filesystem before starting the broker again. Are you using local storage on your brokers?
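In case it helps, a rough sketch of that procedure on the affected broker (the service name and data path below are only examples; check log.dirs in your server.properties for the real path, and substitute whichever topic-partitions are stuck, e.g. foodee-2 and foodee-3 on broker 3 in the output quoted below):

$ systemctl stop kafka                     # or however the broker is managed
$ rm -rf /var/lib/kafka/data/foodee-2 \
         /var/lib/kafka/data/foodee-3      # the stuck topic-partition directories under log.dirs
$ systemctl start kafka

Once restarted, the broker has no local copy of those partitions, so it re-fetches them from the current leaders and should rejoin the ISR, which in turn lets a stalled reassignment complete.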
-- Peter

> On Feb 12, 2020, at 8:33 PM, Madhuri Khattar (mkhattar)
> <mkhat...@cisco.com.invalid> wrote:
>
> In my case I figured it was broker 3 and rebooted it after deleting
> /admin/reassign-partitions, since it was stuck there. However, I am not
> having any luck. I still see 400+ under-replicated partitions. I replaced
> broker 3 as well, but there is still no change after almost 7-8 hours.
>
> Madhuri Khattar
> SR ENGINEER.IT ENGINEERING
> mkhat...@cisco.com
>
> -----Original Message-----
> From: Peter Bukowinski <pmb...@gmail.com>
> Sent: Wednesday, February 12, 2020 8:27 PM
> To: users@kafka.apache.org
> Cc: senthilec...@apache.org
> Subject: Re: Replicas more than replication-factor
>
> I’ve had this happen a few times when a partition reassignment was underway
> and one of the brokers that was a destination for the reassignment became
> unhealthy. This essentially stalls the reassignment indefinitely. The
> partition with 10 instead of 5 replicas was undergoing a reassignment in
> which all the replicas were being moved to new brokers. When that occurs,
> there will necessarily be 2x the number of desired replicas for a short
> time before the old replicas are removed.
>
> The solution is to identify the faulty broker and make it healthy again.
> In this case, it looks like broker 59 is at fault, since it is the only
> one not in the ISR set on the under-replicated partitions.
>
> -- Peter
>
>> On Feb 12, 2020, at 8:10 PM, SenthilKumar K <senthilec...@gmail.com> wrote:
>>
>> We are also facing a similar issue in our Kafka cluster.
>>
>> Kafka Version: 2.2.0
>> RF: 5
>>
>> Partition  Leader  Replicas                             In Sync Replicas                  Preferred Leader?  Under Replicated?
>> 0          121     (121,50,51,52,53)                    (52,121,53,50,51)                 true               false
>> 1          122     (122,51,52,53,54)                    (52,53,54,51,122)                 true               false
>> 2          123     (123,52,53,54,55)                    (52,53,54,123,55)                 true               false
>> 3          125     (125,53,54,55,56)                    (56,125,53,54,55)                 true               false
>> 4          127     (127,54,55,56,57)                    (56,57,54,127,55)                 true               false
>> 5          56      (56,93,57,102,92,96,128,59,95,55)    (56,93,57,102,92,96,128,95,55)    true               true
>> 6          56      (56,93,57,60,97,96,129,59,103,95)    (56,93,57,60,97,96,129,103,95)    true               true
>> 7          57      (57,60,97,96,59,130,95,104,62,100)   (57,60,97,96,130,95,104,62,100)   true               true
>> 8          101     (101,60,65,97,96,105,59,131,62,100)  (101,60,65,97,96,105,131,62)      true               true
>>
>> Have a look at partition 5 above: it has 10 replicas in total even though
>> the replication factor is set to 5. The under-replicated percentage for
>> this topic is 26%, and the partition reassignment has been stuck for more
>> than 24 hours.
>>
>> --Senthil
>>
>>> On Thu, Feb 13, 2020 at 6:32 AM Madhuri Khattar (mkhattar)
>>> <mkhat...@cisco.com.invalid> wrote:
>>>
>>> Hi. Today there was a network glitch which caused some of my topics to
>>> end up with more replicas than the replication factor:
>>>
>>> $ ./kafka-topics --zookeeper sjc-ddzk-01 --topic foodee --describe
>>> Topic:foodee  PartitionCount:6  ReplicationFactor:2
>>> Configs:retention.ms=345600000,segment.ms=345600000,max.message.bytes=2000000
>>>   Topic: foodee: 0  Leader: 0  Replicas: 0,1    Isr: 0,1
>>>   Topic: foodee: 1  Leader: 1  Replicas: 1,2    Isr: 1,2
>>>   Topic: foodee: 2  Leader: 1  Replicas: 2,3,1  Isr: 1,2
>>>   Topic: foodee: 3  Leader: 1  Replicas: 3,4,1  Isr: 1,4
>>>   Topic: foodee: 4  Leader: 0  Replicas: 0,4    Isr: 0,4
>>>   Topic: foodee: 5  Leader: 0  Replicas: 0,2    Isr: 0,2
>>>
>>> This has happened for many topics. As a result I have a lot of
>>> under-replicated partitions, and partitions and leaders are not evenly
>>> distributed. Reassignment is also getting stuck.
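For reference, the two checks mentioned in this thread look roughly like this (the ZooKeeper connect string is a placeholder; script names follow the ./kafka-topics style used above, and the stock Apache tarball adds a .sh suffix):

# list every partition whose ISR is smaller than its replica set
$ ./kafka-topics --zookeeper zk1:2181 --describe --under-replicated-partitions

# remove a stuck reassignment request from ZooKeeper
$ ./zookeeper-shell zk1:2181 delete /admin/reassign-partitions

Keep in mind that deleting the znode does not clear the reassignment from the controller's in-memory state; the controller usually needs to be restarted (or a controller re-election forced) before it forgets the pending reassignment.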