Hi Gwen,

Thanks for the reply. I just realized that *replica.lag.time.max.ms* and
*replica.lag.max.messages* only come into play when the leader checks
whether it should shrink the ISR, so for this scenario it is safe.
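
To make sure I understand it, here is a rough Java sketch of how I picture
that periodic check on the leader; the names (IsrShrinkSketch,
maybeShrinkIsr, lastCaughtUpTimeMs) are made up by me and the real logic is
in the broker's Scala code, so please treat it as an illustration only:

    // Illustration only -- hypothetical names, not the real broker (Scala) code.
    // The point: lagging replicas are dropped from the ISR only when this
    // periodic check runs on the leader, not at produce time.
    import java.util.HashSet;
    import java.util.Set;

    class IsrShrinkSketch {

        static class Replica {
            int brokerId;
            long logEndOffset;       // follower's LEO as seen by the leader
            long lastCaughtUpTimeMs; // last time the follower was caught up
        }

        long replicaLagTimeMaxMs = 10_000L;  // replica.lag.time.max.ms
        long replicaLagMaxMessages = 4_000L; // replica.lag.max.messages

        // Called from a timer on the leader; returns the brokers that stay in the ISR.
        Set<Integer> maybeShrinkIsr(Set<Replica> isr, long leaderLeo, long nowMs) {
            Set<Integer> stillInSync = new HashSet<>();
            for (Replica r : isr) {
                boolean stuck = nowMs - r.lastCaughtUpTimeMs > replicaLagTimeMaxMs;
                boolean tooFarBehind = leaderLeo - r.logEndOffset > replicaLagMaxMessages;
                if (!stuck && !tooFarBehind) {
                    stillInSync.add(r.brokerId);
                }
            }
            return stillInSync; // the leader then persists the shrunk ISR
        }
    }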

Then I think what I ran into is caused by the follower's HW always being <=
the leader's, just like Guozhang said here
<http://mail-archives.apache.org/mod_mbox/kafka-users/201407.mbox/%3CCAHwHRrVYMTbWjNsQ%3D2sWfenM5mM7dU_iCCEcDq20AoriDKSTQA%40mail.gmail.com%3E>,
that is, something like the steps below (from the code I think the LEO is
the offset of the next message, i.e. one ahead of the last appended
message's offset; please correct me if I'm wrong. I also put a small Java
sketch right after the steps):

   1. LEO at {0,1,2} = {101,101,101}, HW at {0,1,2} = {101,101,101}
   2. Leader 0 receives a message with acks=all, writes it to its local log,
   updates its LEO to 102, and puts the request into the producer request
   purgatory
   3. LEO at {0,1,2} = {102,101,101}, HW at {0,1,2} = {101,101,101}
   4. Followers 1 and 2 send fetch requests at offset 101; leader 0 sends
   the fetch responses back; followers 1 and 2 receive them and update their
   LEOs
   5. LEO at {0,1,2} = {102,102,102}, HW at {0,1,2} = {101,101,101}
   6. Followers 1 and 2 send fetch requests at offset 102; leader 0 updates
   its HW, puts the requests into the fetch request purgatory, and sends the
   ack to the producer from step 2
   7. LEO at {0,1,2} = {102,102,102}, HW at {0,1,2} = {102,101,101}
   8. After some time the fetch requests' max wait expires; leader 0 sends
   empty fetch responses back with the new HW 102, and followers 1 and 2
   update their HWs
   9. LEO at {0,1,2} = {102,102,102}, HW at {0,1,2} = {102,102,102}
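
To convince myself of that one-fetch-round delay, here is a toy model of
steps 3 to 9 in Java; it is not the broker code, just the numbers above
replayed with the update rules as I understand them (leader HW = min of all
ISR LEOs, follower HW = min of leader HW and its own LEO), so correct me if
those rules are wrong:

    // Toy replay of steps 3-9 above -- not broker code.
    import java.util.Arrays;

    class HighWatermarkSketch {
        public static void main(String[] args) {
            long[] leo = {102, 101, 101}; // step 3: only leader 0 has the new message
            long[] hw  = {101, 101, 101};

            // Steps 4-5: followers fetch at offset 101 and append the message.
            leo[1] = 102;
            leo[2] = 102;

            // Steps 6-7: followers fetch at offset 102; every ISR LEO is 102, so the
            // leader advances its HW and acks the acks=all produce request.
            hw[0] = Arrays.stream(leo).min().getAsLong(); // 102
            System.out.println("leader HW=" + hw[0] + ", follower HWs=" + hw[1] + "/" + hw[2]);

            // Steps 8-9: only the next (possibly empty) fetch response carries HW 102,
            // and each follower then takes min(leader HW, its own LEO).
            hw[1] = Math.min(hw[0], leo[1]);
            hw[2] = Math.min(hw[0], leo[2]);
            System.out.println("after the next fetch round, follower HWs=" + hw[1] + "/" + hw[2]);
        }
    }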

If broker 0 goes down between step 6 and step 8, broker 1 is elected as the
new leader, and then message 101 is lost because broker 1's HW is 101...
Even if broker 0 comes back, it will become a follower and truncate its log
to the new leader's HW 101, so that message is gone forever, right?
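
Just to make that window concrete, here is the same toy model continued
from step 7, with the truncation behaviour exactly as I described it above
(again only an illustration, not the actual broker logic):

    // Continues the toy model from step 7 above -- illustration only.
    class LeaderChangeWindowSketch {
        public static void main(String[] args) {
            // After step 7: leader 0 has already acked the acks=all request for offset 101.
            long[] leo = {102, 102, 102};
            long[] hw  = {102, 101, 101};

            // Broker 0 dies before step 8, so the followers never learn HW 102.
            int newLeader = 1;
            System.out.println("new leader " + newLeader + " has HW " + hw[newLeader]
                + " although offset 101 was already acked to the producer");

            // When broker 0 comes back as a follower it truncates to the new leader's
            // HW (as described above), dropping the acked message at offset 101.
            leo[0] = hw[newLeader];
            System.out.println("broker 0 truncates its log to " + leo[0]
                + ", so offset 101 is gone");
        }
    }
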
So what we can do is still do whatever we can to avoid leader transitions,
right?

Thanks.

2015-09-14 23:56 GMT+08:00 Gwen Shapira <g...@confluent.io>:

> acks = all should prevent this scenario:
>
> If broker 0 is still in ISR, the produce request for 101 will not be
> "acked" (because 0 is in ISR and not available for acking), and the
> producer will retry it until all ISR acks.
>
> If broker 0 dropped off ISR, it will not be able to rejoin until it has all
> the latest messages, including 101.
>
> So if you use the safe settings you should be safe in this scenario.
>
> Gwen
>
>
> On Sat, Sep 12, 2015 at 3:15 AM, Zhao Weinan <zhaow...@gmail.com> wrote:
>
> > Hi group,
> >
> > I've been through some data loss on a kafka cluster; one case may have
> > been caused by the auto preferred leader election.
> >
> > Here is the situation: 3 brokers = {0,1,2}, 1 partition with 3 replicas
> > on 0/1/2, all in sync while 0 is leader and 1 is controller, current
> > offset is 100.
> >
> > And here is my hypothesis:
> > a. Leader 0 is temporarily gone due to an unstable connection with ZK
> > b. Controller 1 finds that 0 has gone and runs an election which makes 1
> > (in ISR) the leader
> > c. A producer sends 1 message to the new leader 1, so the offset is now 101
> > d. Old leader 0 comes back to the cluster (*STILL IN ISR* because the lag
> > does not exceed *replica.lag.time.max.ms* and *replica.lag.max.messages*)
> > e. Coincidentally, controller 1 starts checkAndTriggerPartitionRebalance,
> > decides 0 is more preferred, and makes 0 the leader again
> > f. Broker 1 becomes a follower, finds its HW to be 100, and truncates to
> > 100, which loses the newest message.
> >
> > In this situation, even the most reliable settings (broker side:
> > unclean.leader.election.enable=false, min.insync.replicas=2; producer
> > side: acks=all) are useless. Am I correct, or is this just paranoia?
> >
> > Below are some real logs from production.
> > *In controller.log:*
> >
> > > *// broker 6 temporarily gone*
> > >
> > [2015-09-09 15:24:42,206] INFO [BrokerChangeListener on Controller 3]:
> > > Newly added brokers: , deleted brokers: 6, all live brokers:
> > 0,5,1,2,7,3,4
> > > (kafka.controller.ReplicaStateMachine$BrokerChangeListener)
> > > [2015-09-09 15:24:42,461] INFO [Controller 3]: Broker failure callback
> > for
> > > 6 (kafka.controller.KafkaController)
> > > [2015-09-09 15:24:42,464] INFO [Controller 3]: Removed ArrayBuffer()
> from
> > > list of shutting down brokers. (kafka.controller.KafkaController)
> > > [2015-09-09 15:24:42,466] INFO [Partition state machine on Controller
> 3]:
> > > Invoking state change to OfflinePartition for partitions
> > > [SOME_TOPIC_NAME,1] (kafka.controller.PartitionStateMachine)
> > >
> > > *// elect 3, which is in the ISR, to be the leader*
> > > [2015-09-09 15:24:43,182] DEBUG [OfflinePartitionLeaderSelector]: Some
> > > broker in ISR is alive for [SOME_TOPIC_NAME,1]. Select 3 from ISR 3,4
> to
> > be
> > > the leader. (kafka.controller.OfflinePartitionLeaderSelector)
> > > [2015-09-09 15:24:43,182] INFO [OfflinePartitionLeaderSelector]:
> Selected
> > > new leader and ISR {"leader":3,"leader_epoch":45,"isr":[3,4]} for
> offline
> > > partition [SOME_TOPIC_NAME,1]
> > > (kafka.controller.OfflinePartitionLeaderSelector)
> > > [2015-09-09 15:24:43,928] DEBUG [Controller 3]: Removing replica 6 from
> > > ISR 3,4 for partition [SOME_TOPIC_NAME,1].
> > > (kafka.controller.KafkaController)
> > > [2015-09-09 15:24:43,929] WARN [Controller 3]: Cannot remove replica 6
> > > from ISR of partition [SOME_TOPIC_NAME,1] since it is not in the ISR.
> > > Leader = 3 ; ISR = List(3, 4) (kafka.controller.KafkaController)
> > >
> > > *// broker 6 back*
> > >
> > [2015-09-09 15:24:44,575] INFO [BrokerChangeListener on Controller 3]:
> > > Newly added brokers: 6, deleted brokers: , all live brokers:
> > > 0,5,1,6,2,7,3,4
> > (kafka.controller.ReplicaStateMachine$BrokerChangeListener)
> > >
> > > *// broker 6 is elected as leader by auto preferred leader election*
> > >
> > [2015-09-09 15:24:50,939] INFO [Controller 3]: Starting preferred replica
> > > leader election for partitions [SOME_TOPIC_NAME,1]
> > > (kafka.controller.KafkaController)
> > > [2015-09-09 15:24:50,945] INFO
> [PreferredReplicaPartitionLeaderSelector]:
> > > Current leader 3 for partition [SOME_TOPIC_NAME,1] is not the preferred
> > > replica. Trigerring preferred replica leader election
> > > (kafka.controller.PreferredReplicaPartitionLeaderSelector)
> > >
> > >
> > *And in server.log:*
> >
> > > *// broker 3 truncating, losing data*
> >
> > 2015-09-09 15:24:50,964] INFO Truncating log SOME_TOPIC_NAME-1 to offset
> > > 420549. (kafka.log.Log)
> > >
> >
>
