Sorry folks, just realized I didn't use the correct thread format for the discussion. I started this new one and copied all of the responses from the old one.
@Dong It makes sense to reuse min.insync.replicas instead of introducing a new config, and we must make this change together with the new LEO-based leader election.

@Xi I thought about embedding the LEO information in the ControllerContext but didn't find a clean way to do it. Using an RPC would make the leader election take longer, but this should only happen in very rare cases (broker failure, controlled shutdown, preferred leader election, and partition reassignment).

@Jeff The current leader election picks the first replica in the assigned replicas (AR) list that is in both the live broker set and the ISR. I agree that changing the current/default behavior would cause a lot of confusion, and that is why the title is "Add Support ...": we wouldn't break any current promises, and we would provide a separate option for our users.

In terms of KIP-250, I feel it is closer to "Semisynchronous Replication" in the MySQL world, and yes, it is something between acks=1 and acks=all (waiting on the full ISR); a toy comparison of the two acknowledgment conditions is sketched at the bottom of this mail. Additionally, I feel KIP-250 and KIP-227 are two orthogonal improvements: KIP-227 improves the replication protocol (like the introduction of parallel replication in MySQL), while KIP-250 is an enhancement to the replication architecture (sync, semi-sync, and async).
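To make the leader election comparison concrete, here is a rough sketch of the two selection rules (illustrative Java only; the real controller logic lives in Scala, and all names below are made up):

    import java.util.*;

    class LeaderElectionSketch {
        // Current rule: the first replica in assignment (AR) order that is
        // both alive and in the ISR becomes the new leader.
        static Optional<Integer> currentRule(List<Integer> assignedReplicas,
                                             Set<Integer> liveBrokers,
                                             Set<Integer> isr) {
            return assignedReplicas.stream()
                    .filter(liveBrokers::contains)
                    .filter(isr::contains)
                    .findFirst();
        }

        // LEO-based rule discussed in this thread: among the live ISR members,
        // pick the replica with the largest log end offset, so an acknowledged
        // message is not lost just because the first eligible AR replica
        // happened to miss it.
        static Optional<Integer> leoBasedRule(List<Integer> assignedReplicas,
                                              Set<Integer> liveBrokers,
                                              Set<Integer> isr,
                                              Map<Integer, Long> logEndOffset) {
            return assignedReplicas.stream()
                    .filter(liveBrokers::contains)
                    .filter(isr::contains)
                    .max(Comparator.comparingLong(r -> logEndOffset.getOrDefault(r, -1L)));
        }
    }

The open question from Xi's reply is where the controller would get logEndOffset from without extra RPC round trips during the election.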
Dong Lin
> Thanks for the KIP. I have one quick comment before you provide more detail
> on how to select the leader with the largest LEO.
> Do you think it would make sense to change the default behavior of acks=-1,
> such that the broker will acknowledge the message once the message has been
> replicated to min.insync.replicas brokers? This would allow us to keep the
> same durability guarantee and improve produce request latency without having
> a new config.

Hu Xi
> Currently, with holding the assigned replicas (AR) for all partitions, the
> controller is now able to elect new leaders by selecting the first replica
> of AR which occurs in both the live replica set and the ISR. If switching to
> the LEO-based strategy, the controller context might need to be enriched or
> augmented to store those values. If retrieving those LEOs in real time,
> several rounds of RPCs are unavoidable, which seems to violate the original
> intention of this KIP.

Jeff Widman
> I agree with Dong, we should see if it's possible to change the default
> behavior so that as soon as min.insync.replicas brokers respond, then the
> broker acknowledges the message back to the client without waiting for
> additional brokers who are in the in-sync replica list to respond. (I
> actually thought it already worked this way.)
> As you implied in the KIP though, changing this default introduces a weird
> state where an in-sync follower broker is not guaranteed to have a
> message...
> So at a minimum, the leadership failover algorithm would need to be sure to
> pick the most up-to-date follower... I thought it already did this?
> But if multiple brokers fail in quick succession, then a broker that was in
> the ISR could become a leader without ever receiving the message...
> violating the current promises of unclean.leader.election.enable=False...
> so changing the default might not be a tenable solution.
> What also jumped out at me in the KIP was the goal of reducing p999 when
> setting replica lag time at 10 seconds (!!)... I understand the desire to
> minimize frequent ISR shrink/expansion, as I face this same issue at my day
> job.
> But what you're essentially trying to do here is create an additional
> replication state that is in between acks=1 and acks=ISR to paper over a
> root problem of ISR shrink/expansion...
> I'm just wary of shipping more features (and more operational confusion) if
> it's only addressing the symptom rather than the root cause. For example, my
> day job's problem is that we run a very high number of low-traffic
> partitions-per-broker, so the fetch requests hit many partitions before they
> fill. Solving that requires changing our architecture + making the
> replication protocol more efficient (KIP-227).

On Tue, Jan 23, 2018 at 10:02 PM, Litao Deng <litao.d...@airbnb.com> wrote:

> Hey folks. I would like to add a feature to support quorum-based
> acknowledgment for the producer request. We have been running a modified
> version of Kafka on our testing cluster for weeks, and the improvement in
> P999 is significant, with very stable latency. Additionally, I have a
> proposal to achieve similar data durability as with the
> insync.replicas-based acknowledgment through LEO-based leader election.
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-250+Add+Support+for+Quorum-based+Producer+Acknowledge
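For completeness, here is a toy illustration (not broker code; the helper names are made up) of the two acknowledgment conditions being compared in this thread, assuming replicasWithMessage counts the leader plus the followers that have fetched the message:

    class AckConditionSketch {
        // Today's acks=-1 (all): the leader acknowledges once every member of
        // the current ISR has the message, and rejects the produce request
        // outright if the ISR has shrunk below min.insync.replicas.
        static boolean ackWithCurrentSemantics(int replicasWithMessage, int isrSize,
                                               int minInsyncReplicas) {
            return isrSize >= minInsyncReplicas && replicasWithMessage >= isrSize;
        }

        // Direction discussed in this thread: acknowledge as soon as
        // min.insync.replicas members (leader included) have the message, even
        // when the ISR is larger than that.
        static boolean ackWithQuorumStyleSemantics(int replicasWithMessage, int isrSize,
                                                   int minInsyncReplicas) {
            return isrSize >= minInsyncReplicas && replicasWithMessage >= minInsyncReplicas;
        }
    }

The gap Jeff points out is exactly the case where the replicas counted by the second condition are not the ones that survive a failure, which is why the proposal ties this to LEO-based leader election.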