I agree with Dong; we should see if it's possible to change the default
behavior so that as soon as min.insync.replicas brokers respond, the
leader acknowledges the message back to the client without waiting for
the additional brokers in the in-sync replica list to respond. (I
actually thought it already worked this way.)
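
For concreteness, here's a rough sketch of the broker-side check I have
in mind (made-up names, not actual Kafka internals):

    import java.util.Map;

    // Assumed behavior: complete an acks=-1 produce request once
    // min.insync.replicas replicas (leader included) have the record,
    // rather than waiting on every member of the current ISR.
    static boolean canAcknowledge(long requiredOffset,
                                  Map<Integer, Long> logEndOffsets, // brokerId -> LEO
                                  int minInsyncReplicas) {
        long caughtUp = logEndOffsets.values().stream()
                .filter(leo -> leo >= requiredOffset)
                .count();
        return caughtUp >= minInsyncReplicas;
    }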

As you implied in the KIP though, changing this default introduces a weird
state where an in-sync follower broker is not guaranteed to have an
acknowledged message...

So at a minimum, the leadership failover algorithm would need to be sure to
pick the most up-to-date follower... I thought it already did this?
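
If the KIP does go the LEO-based election route, I'd expect something
roughly like this (a minimal sketch with made-up names, not the real
controller code; as I understand it, today's controller just takes the
first live replica, in assignment order, that is in the ISR -- not the
one with the largest LEO):

    import java.util.*;

    // Pick the live ISR member with the largest log end offset (LEO).
    // Assumes logEndOffsets has an entry for every ISR member.
    static Optional<Integer> electLeader(List<Integer> isr,
                                         Set<Integer> liveBrokers,
                                         Map<Integer, Long> logEndOffsets) {
        return isr.stream()
                .filter(liveBrokers::contains)
                .max(Comparator.comparingLong(logEndOffsets::get));
    }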

But if multiple brokers fail in quick succession, a broker that was in
the ISR could become the leader without ever having received the
message. For example, with replicas {A, B, C} all in the ISR and
min.insync.replicas=2: leader A and follower B acknowledge a write, A
and B then both fail, and C (still in the ISR, but missing the record)
gets elected leader. That violates the current promises of
unclean.leader.election.enable=false... so changing the default might
not be a tenable solution.

What also jumped out at me in the KIP was the goal of reducing p999
while setting the replica lag time (replica.lag.time.max.ms) to 10
seconds(!!)... I understand the desire to minimize frequent ISR
shrink/expansion, as I face this same issue at my day job. But what
you're essentially trying to do here is create an additional
replication state that sits in between acks=1 and acks=all to paper
over the root problem of ISR shrink/expansion...
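
For reference, the existing knobs only give us the two endpoints this
KIP is trying to bridge (real configs, illustrative values):

    import java.util.Properties;

    Properties producerProps = new Properties();
    // acks=1: leader-only ack -> lowest latency, weakest durability.
    // acks=all (alias of -1): the leader waits on the full current ISR.
    producerProps.put("acks", "all");

    // Topic-level min.insync.replicas=2 is only a floor: with acks=all,
    // writes fail if the ISR shrinks below 2. It does NOT let the leader
    // ack as soon as 2 replicas have the record while the ISR is larger.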

I'm just wary of shipping more features (and more operational
confusion) if they only address the symptom rather than the root cause.
For example, my day job's problem is that we run a very high number of
low-traffic partitions per broker, so replica fetch requests have to
touch many partitions before they fill. Solving that requires changing
our architecture + making the replication protocol more efficient
(KIP-227).

On Tue, Jan 23, 2018 at 10:31 PM, Dong Lin <lindon...@gmail.com> wrote:

> Hey Litao,
>
> Thanks for the KIP. I have one quick comment before you provide more detail
> on how to select the leader with the largest LEO.
>
> Do you think it would make sense to change the default behavior of acks=-1,
> such that the broker will acknowledge the message once it has been
> replicated to min.insync.replicas brokers? This would allow us to keep the
> same durability guarantee and improve produce request latency without
> adding a new config.
>
> Thanks,
> Dong
>
> On Tue, Jan 23, 2018 at 8:38 PM, Litao Deng <denglitaoch...@gmail.com>
> wrote:
>
> > Hey folks. I would like to add a feature to support quorum-based
> > acknowledgment for producer requests. We have been running a
> > modified version of Kafka on our testing cluster for weeks, and the
> > improvement in P999 is significant, with very stable latency.
> > Additionally, I have a proposal to achieve similar data durability
> > to the min.insync.replicas-based acknowledgment through LEO-based
> > leader election.
> >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-250+Add+Support+for+Quorum-based+Producer+Acknowledge
> >
>



-- 

*Jeff Widman*
jeffwidman.com | 740-WIDMAN-J (943-6265)
<><