Sorry folks, I just realized I didn't use the correct thread format for the
discussion, so I started this new one and copied all of the responses from
the old one.

@Dong
It makes sense to just use min.insync.replicas instead of introducing a new
config, and we must make this change together with the new LEO-based leader
election.
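
For anyone following along, the behavior being discussed is the existing one:
with acks=all (-1), the leader waits for every replica currently in the ISR,
and min.insync.replicas only sets the floor below which the produce request
is rejected. Below is a minimal, illustrative Java snippet of today's
configuration; nothing here is new from the KIP, and the topic name is made
up:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;

public class AcksAllExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringSerializer");
        // acks=all: the leader acknowledges only after every replica in the
        // current ISR has the message, not just min.insync.replicas of them.
        props.put(ProducerConfig.ACKS_CONFIG, "all");

        // On the broker/topic side (not set here), min.insync.replicas=2 only
        // means the produce request fails if the ISR shrinks below 2; it does
        // not let the leader acknowledge early. Acknowledging once
        // min.insync.replicas replicas have the message is exactly the change
        // being discussed in this thread.
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("test-topic", "key", "value"));
        }
    }
}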

@Xi
I thought about embedding the LEO information into the ControllerContext but
didn't find a way to do it. Using RPCs will make the leader election take
longer, but elections should happen only in very rare cases (broker failure,
controlled shutdown, preferred leader election, and partition reassignment).

@Jeff
The current leader election picks the first replica from the AR list that
exists in both the live broker set and the ISR (a rough sketch of this is
included below). I agree with you that changing the current/default behavior
would cause a lot of confusion, and that's the reason the title is "Add
Support ...". This way, we don't break any current promises and we provide a
separate option for our users.
In terms of KIP-250, I feel it is more like the "Semisynchronous Replication"
in the MySQL world, and yes, it is something between acks=1 and
acks=insync.replicas. Additionally, I feel KIP-250 and KIP-227 are two
orthogonal improvements: KIP-227 improves the replication protocol (like the
introduction of parallel replication in MySQL), while KIP-250 is an
enhancement to the replication architecture (sync, semi-sync, and async).
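
To make the comparison concrete, here is a rough sketch of the two selection
strategies. This is illustrative only and not the actual controller code; the
method names and the leoByReplica map are made up, and how the controller
would obtain the LEOs (an enriched ControllerContext vs. extra RPCs) is
exactly the open question above:

import java.util.*;

public class LeaderElectionSketch {

    // Today's behavior: the first replica in AR order that is both alive
    // and in the ISR becomes the new leader.
    static Optional<Integer> electLeaderCurrent(List<Integer> assignedReplicas,
                                                Set<Integer> liveBrokers,
                                                Set<Integer> isr) {
        return assignedReplicas.stream()
                .filter(r -> liveBrokers.contains(r) && isr.contains(r))
                .findFirst();
    }

    // LEO-based variant discussed in this thread: among the live ISR members,
    // prefer the replica with the largest log end offset, so that a
    // quorum-acknowledged message survives the failover.
    static Optional<Integer> electLeaderByLeo(List<Integer> assignedReplicas,
                                              Set<Integer> liveBrokers,
                                              Set<Integer> isr,
                                              Map<Integer, Long> leoByReplica) {
        return assignedReplicas.stream()
                .filter(r -> liveBrokers.contains(r) && isr.contains(r))
                .max(Comparator.comparingLong(
                        (Integer r) -> leoByReplica.getOrDefault(r, -1L)));
    }

    public static void main(String[] args) {
        List<Integer> ar = Arrays.asList(1, 2, 3);
        Set<Integer> live = new HashSet<>(Arrays.asList(2, 3));
        Set<Integer> isr = new HashSet<>(Arrays.asList(1, 2, 3));
        Map<Integer, Long> leo = new HashMap<>();
        leo.put(2, 100L);
        leo.put(3, 120L);

        System.out.println(electLeaderCurrent(ar, live, isr));    // Optional[2]
        System.out.println(electLeaderByLeo(ar, live, isr, leo)); // Optional[3]
    }
}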


Dong Lin

> Thanks for the KIP. I have one quick comment before you provide more detail
> on how to select the leader with the largest LEO.
> Do you think it would make sense to change the default behavior of acks=-1,
> such that the broker will acknowledge the message once it has been
> replicated to min.insync.replicas brokers? This would allow us to keep the
> same durability guarantee and improve produce request latency without
> adding a new config.


Hu Xi

> Currently, since it holds the assigned replicas (AR) for all partitions,
> the controller is able to elect new leaders by selecting the first replica
> of the AR that occurs in both the live replica set and the ISR. If we
> switch to the LEO-based strategy, the controller context might need to be
> enriched or augmented to store those values. If the LEOs are retrieved in
> real time, several rounds of RPCs are unavoidable, which seems to violate
> the original intention of this KIP.


Jeff Widman

> I agree with Dong, we should see if it's possible to change the default
> behavior so that as soon as min.insync.replicas brokers respond, the broker
> acknowledges the message back to the client without waiting for the
> additional brokers in the in-sync replica list to respond. (I actually
> thought it already worked this way.)
> As you implied in the KIP though, changing this default introduces a weird
> state where an in-sync follower broker is not guaranteed to have a
> message...
> So at a minimum, the leadership failover algorithm would need to be sure to
> pick the most up-to-date follower... I thought it already did this?
> But if multiple brokers fail in quick succession, then a broker that was in
> the ISR could become the leader without ever receiving the message...
> violating the current promises of unclean.leader.election.enable=False...
> so changing the default might not be a tenable solution.
> What also jumped out at me in the KIP was the goal of reducing p999 with
> the replica lag time set to 10 seconds (!!)... I understand the desire to
> minimize frequent ISR shrink/expansion, as I face the same issue at my day
> job. But what you're essentially trying to do here is create an additional
> replication state in between acks=1 and acks=ISR to paper over the root
> problem of ISR shrink/expansion...
> I'm just wary of shipping more features (and more operational confusion) if
> they only address the symptom rather than the root cause. For example, the
> problem at my day job is that we run a very high number of low-traffic
> partitions per broker, so fetch requests hit many partitions before they
> fill. Solving that requires changing our architecture and making the
> replication protocol more efficient (KIP-227).


On Tue, Jan 23, 2018 at 10:02 PM, Litao Deng <litao.d...@airbnb.com> wrote:

> Hey folks. I would like to add a feature to support quorum-based
> acknowledgment for producer requests. We have been running a modified
> version of Kafka on our testing cluster for weeks, and the improvement in
> P999 is significant, with very stable latency. Additionally, I have a
> proposal to achieve data durability similar to the insync.replicas-based
> acknowledgment through LEO-based leader election.
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-250+Add+Support+for+Quorum-based+Producer+Acknowledge
>
