Hi Kamal,

Thank you for the information. It's great to hear that this is being
actively developed. I'm curious whether you (or anyone else on the list)
are aware of improvements that could reduce the chance of hitting this
failure mode? One improvement I found is KIP-497
<https://cwiki.apache.org/confluence/display/KAFKA/KIP-497%3A+Add+inter-broker+API+to+alter+ISR>,
which sounds like it could mitigate this failure mode: leaders can only
shrink the ISR through the controller rather than by writing to ZK
directly, so a leader that is partitioned from the other brokers can no
longer shrink the ISR on its own. Are there any other improvements in this
realm (besides KRaft) that would be worthwhile for us to test with before
KIP-966 has fully rolled out?
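
If I'm reading the KIP correctly, the AlterIsr API shipped in Kafka 2.7,
and brokers only switch to it once the inter-broker protocol version is
bumped, so after upgrading from 2.6.1 we would presumably also need
something like the following in server.properties (the exact IBP value is
my reading of the KIP, so please correct me if it's wrong):

  # Assumes all brokers are already upgraded to 2.7 or later.
  # Leaders use the controller-mediated AlterIsr path (KIP-497) only
  # once the IBP is at least 2.7-IV2; on older IBPs they keep writing
  # ISR shrinks to ZooKeeper directly.
  inter.broker.protocol.version=2.7-IV2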

Thanks,
Sabit

On Mon, Aug 12, 2024 at 2:25 AM Kamal Chandraprakash <
kamal.chandraprak...@gmail.com> wrote:

> Hi Sabit,
>
> Thanks for reporting the issue! This is the "last replica standing"
> problem, and it is being fixed in KIP-966. You can go through the blog
> below to understand it in detail:
>
> https://jack-vanlightly.com/blog/2023/8/17/kafka-kip-966-fixing-the-last-replica-standing-issue#:~:text=Rule%20number%20one%20of%20leader,behind%20they%20also%20get%20removed
>
> On Mon, Aug 12, 2024 at 2:03 AM Sabit Nepal <gta0...@gmail.com> wrote:
>
> > Hello,
> >
> > We experienced a network partition in our Kafka cluster which left one
> > broker unable to be reached by the other brokers, though it could
> > still reach our ZooKeeper cluster. When this occurred, a number of
> > topic-partitions shrank their ISR to just the impaired broker itself,
> > halting progress on those partitions. As we had to take the broker
> > instance offline and provision a replacement, the partitions were
> > unavailable until the replacement instance came back up and resumed
> > acting as the broker.
> >
> > However, reviewing our broker and producer settings, I'm not sure why
> > it was possible for the leader to accept writes that could not be
> > replicated to the followers. Our topics use min.insync.replicas=2 and
> > our producers use acks=all. In this scenario, with the changes not
> > being replicated to the other followers, I'd expect the records to
> > have failed to be written. We are, however, on an older version of
> > Kafka (2.6.1), so I'm curious whether later versions have improved the
> > behavior here?
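> >
> > In case it helps, a minimal sketch of our producer setup (the
> > bootstrap servers and serializers here are placeholders; the timeout
> > values are the ones listed under "Other configurations" below):
> >
> > import java.util.Properties;
> > import org.apache.kafka.clients.producer.KafkaProducer;
> > import org.apache.kafka.clients.producer.ProducerConfig;
> > import org.apache.kafka.common.serialization.StringSerializer;
> >
> > Properties props = new Properties();
> > props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092");
> > // acks=all waits for every replica in the *current* ISR;
> > // min.insync.replicas=2 is enforced via the topic config
> > props.put(ProducerConfig.ACKS_CONFIG, "all");
> > props.put(ProducerConfig.REQUEST_TIMEOUT_MS_CONFIG, "8500");
> > props.put(ProducerConfig.LINGER_MS_CONFIG, "10");
> > props.put(ProducerConfig.DELIVERY_TIMEOUT_MS_CONFIG, "38510");
> > props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
> >         StringSerializer.class.getName());
> > props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
> >         StringSerializer.class.getName());
> >
> > KafkaProducer<String, String> producer = new KafkaProducer<>(props);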
> >
> > Some relevant logs:
> >
> > [Partition my-topic-one-120 broker=7] Shrinking ISR from 7,8,9 to 7.
> > Leader: (highWatermark: 82153383556, endOffset: 82153383565). Out of
> > sync replicas: (brokerId: 8, endOffset: 82153383556) (brokerId: 9,
> > endOffset: 82153383561).
> > [Partition my-topic-one-120 broker=7] ISR updated to [7] and zkVersion
> > updated to [1367]
> > [ReplicaFetcher replicaId=9, leaderId=7, fetcherId=5] Error in
> > response for fetch request (type=FetchRequest, replicaId=9,
> > maxWait=500, minBytes=1, maxBytes=10485760,
> > fetchData={my-topic-one-120=(fetchOffset=75987953095,
> > logStartOffset=75983970457, maxBytes=1048576,
> > currentLeaderEpoch=Optional[772]),
> > my-topic-one-84=(fetchOffset=87734453342,
> > logStartOffset=87730882175, maxBytes=1048576,
> > currentLeaderEpoch=Optional[776]),
> > my-topic-one-108=(fetchOffset=72037212609,
> > logStartOffset=72034727231, maxBytes=1048576,
> > currentLeaderEpoch=Optional[776]),
> > my-topic-one-72=(fetchOffset=83006080094,
> > logStartOffset=83002240584, maxBytes=1048576,
> > currentLeaderEpoch=Optional[768]),
> > my-topic-one-96=(fetchOffset=79250375295,
> > logStartOffset=79246320254, maxBytes=1048576,
> > currentLeaderEpoch=Optional[763])}, isolationLevel=READ_UNCOMMITTED,
> > toForget=, metadata=(sessionId=965270777, epoch=725379656), rackId=)
> > [Controller id=13 epoch=611] Controller 13 epoch 611 failed to change
> > state for partition my-topic-one-120 from OnlinePartition to
> > OnlinePartition
> > kafka.common.StateChangeFailedException: Failed to elect leader for
> > partition my-topic-one-120 under strategy
> > ControlledShutdownPartitionLeaderElectionStrategy
> > (later)
> > kafka.common.StateChangeFailedException: Failed to elect leader for
> > partition my-topic-one-120 under strategy
> > OfflinePartitionLeaderElectionStrategy(false)
> >
> > Configuration for this topic:
> >
> > Topic: my-topic-one  PartitionCount: 250  ReplicationFactor: 3
> > Configs: min.insync.replicas=2,segment.bytes=536870912,
> > retention.ms=1800000,unclean.leader.election.enable=false
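> >
> > (For reference, the topic descriptions here come from the stock
> > kafka-topics tool, something like:
> >
> >   bin/kafka-topics.sh --describe --topic my-topic-one \
> >       --bootstrap-server broker1:9092
> >
> > with broker1:9092 standing in for our bootstrap servers.)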
> >
> > Outside of this topic, we also had a topic with a replication factor
> > of 5 impacted, as well as the __consumer_offsets topic, which we set
> > to an RF of 5.
> >
> > [Partition my-topic-two-204 broker=7] Shrinking ISR from 10,9,7,11,8
> > to 7. Leader: (highWatermark: 86218167, endOffset: 86218170). Out of
> > sync replicas: (brokerId: 10, endOffset: 86218167) (brokerId: 9,
> > endOffset: 86218167) (brokerId: 11, endOffset: 86218167) (brokerId: 8,
> > endOffset: 86218167).
> >
> > Configuration:
> >
> > Topic: my-topic-two  PartitionCount: 500  ReplicationFactor: 5
> > Configs: min.insync.replicas=2,segment.jitter.ms=3600000,
> > cleanup.policy=compact,segment.bytes=1048576,
> > max.compaction.lag.ms=9000000,min.compaction.lag.ms=4500000,
> > unclean.leader.election.enable=false,delete.retention.ms=86400000,
> > segment.ms=21600000
> >
> > [Partition __consumer_offsets-18 broker=7] Shrinking ISR from
> > 10,9,7,11,8 to 7. Leader: (highWatermark: 4387657484, endOffset:
> > 4387657485). Out of sync replicas: (brokerId: 9, endOffset:
> > 4387657484) (brokerId: 8, endOffset: 4387657484) (brokerId: 10,
> > endOffset: 4387657484) (brokerId: 11, endOffset: 4387657484).
> >
> > Configuration:
> >
> > Topic: __consumer_offsets  PartitionCount: 50  ReplicationFactor: 5
> > Configs: compression.type=producer,min.insync.replicas=2,
> > cleanup.policy=compact,segment.bytes=104857600,
> > unclean.leader.election.enable=false
> >
> > Other configurations:
> >
> > zookeeper.connection.timeout.ms=6000
> > replica.lag.time.max.ms=8000
> > zookeeper.session.timeout.ms=6000
> > Producer request.timeout.ms=8500
> > Producer linger.ms=10
> > Producer delivery.timeout.ms=38510
> >
> > I saw a similar issue described in KAFKA-8702
> > <https://issues.apache.org/jira/browse/KAFKA-8702>; however, I did not
> > see a resolution there. Any help with this would be appreciated, thank
> > you!