Makes sense. Thanks for the explanation.

-------- Original message --------
From: Anna Povzner <a...@confluent.io>
Date: 4/6/18 5:38 PM (GMT-08:00)
To: dev@kafka.apache.org
Subject: Re: [DISCUSS] KIP-279: Fix log divergence between leader and follower after fast leader fail over

Hi Ted,
I updated the Rejected Alternatives section with a more thorough description of the alternatives and the reasoning for choosing the solution we proposed.

While it is clearer why the second alternative guarantees one roundtrip for the clean leader election case, the proposed solution also guarantees it. This follows from the fact that we cannot have more than one back-to-back leader change due to preferred leader election where the leader is not pushed out of the ISR. That means the follower will have at most one leader epoch unknown to the new leader, so the leader will be able to respond with the epoch that the follower knows about in the first response.

For the unclean leader election case, the second alternative reduces the number of roundtrips, but only in rare cases: we need at least 3 fast leader changes to see the advantage. Approximate calculation:
- Proposed solution: (N+1)/2 roundtrips for N fast leader changes in the worst case (it could take fewer roundtrips for the same number of leader changes).
- Alternative solution: at most 2 roundtrips (except in super rare cases, where we may want to limit the size of the OffsetForLeaderEpoch request).

This comes at the cost of a bigger change to the OffsetForLeaderEpoch request, a larger OffsetForLeaderEpoch request size on average, and the additional complexity of deciding how long the sequence should be for subsequent OffsetForLeaderEpoch requests and of handling the edge/contrived cases where the sequence may become too long.

So I think the main trade-off here is improving the efficiency of a broker becoming a follower in rare cases of unclean leader election with at least 3 fast leader changes vs. lower complexity in the common case. The proposed solution in the KIP opts for lower complexity. Please let me know if you have any concerns or suggestions.
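To make the multi-round exchange concrete, here is a simplified Python model of the proposed OffsetForLeaderEpoch handling, as we understand the KIP's outline: the leader answers with the largest epoch it has that is <= the requested epoch, plus that epoch's end offset; a follower that knows that epoch truncates to the smaller of the two end offsets, otherwise it retries with its own largest earlier epoch. ReplicaLog, leader_response and truncation_offset are hypothetical names, not Kafka's code, and the undefined-epoch edge cases are deliberately not modeled. Check the KIP page for the exact rules.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ReplicaLog:
    """Hypothetical toy replica: a leader epoch cache plus a log end offset (LEO).

    `epochs` holds (epoch, start_offset) pairs sorted by epoch.
    """
    epochs: List[Tuple[int, int]]
    log_end_offset: int

    def has_epoch(self, epoch: int) -> bool:
        return any(e == epoch for e, _ in self.epochs)

    def end_offset_for(self, epoch: int) -> int:
        # An epoch ends where the next epoch starts; the latest epoch ends at the LEO.
        for i, (e, _) in enumerate(self.epochs):
            if e == epoch:
                if i + 1 < len(self.epochs):
                    return self.epochs[i + 1][1]
                return self.log_end_offset
        raise KeyError(epoch)

    def largest_epoch_at_most(self, epoch: int) -> Optional[int]:
        candidates = [e for e, _ in self.epochs if e <= epoch]
        return max(candidates) if candidates else None

    def largest_epoch_below(self, epoch: int) -> Optional[int]:
        candidates = [e for e, _ in self.epochs if e < epoch]
        return max(candidates) if candidates else None


def leader_response(leader: ReplicaLog, requested_epoch: int) -> Tuple[int, int]:
    """Leader side: largest known epoch <= requested_epoch and that epoch's end offset."""
    epoch = leader.largest_epoch_at_most(requested_epoch)
    if epoch is None:
        raise ValueError("undefined epoch: not modeled in this sketch")
    return epoch, leader.end_offset_for(epoch)


def truncation_offset(follower: ReplicaLog, leader: ReplicaLog) -> Tuple[int, int]:
    """Follower side: return (offset to truncate to, number of round trips)."""
    requested = follower.epochs[-1][0]  # start with the follower's latest epoch
    round_trips = 0
    while True:
        round_trips += 1
        epoch, leader_end = leader_response(leader, requested)
        if follower.has_epoch(epoch):
            # Both logs agree up to the smaller end offset of the common epoch.
            return min(leader_end, follower.end_offset_for(epoch)), round_trips
        # The follower never saw this epoch: retry with its largest earlier epoch.
        # `requested` strictly decreases, so the loop terminates.
        lower = follower.largest_epoch_below(epoch)
        if lower is None:
            raise ValueError("no common epoch: not modeled in this sketch")
        requested = lower


if __name__ == "__main__":
    # Unclean case: the follower led epochs 1 and 3, the new leader led 2 and now 4.
    leader = ReplicaLog([(0, 0), (2, 5), (4, 8)], log_end_offset=8)
    follower = ReplicaLog([(0, 0), (1, 4), (3, 6)], log_end_offset=9)
    print(truncation_offset(follower, leader))  # (4, 2)

    # A single epoch unknown to the follower resolves in one round trip.
    leader2 = ReplicaLog([(0, 0), (1, 5)], log_end_offset=10)
    follower2 = ReplicaLog([(0, 0)], log_end_offset=7)
    print(truncation_offset(follower2, leader2))  # (5, 1)
```

In the unclean example, the first response names epoch 2, which the follower never saw, so it retries with epoch 1, gets epoch 0 back, and truncates to offset 4 after two round trips. Each extra epoch unknown to the other side can cost another round trip, which is the cost the second alternative avoids by sending the whole {leader_epoch, end_offset} sequence up front.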
Thanks,
Anna

On Thu, Apr 5, 2018 at 1:33 PM, Ted Yu <yuzhih...@gmail.com> wrote:

> For the second alternative which was rejected (The follower sends all
> sequences of {leader_epoch, end_offset})
>
> bq. also increases the size of OffsetForLeaderEpoch request by at least
> 64bit
>
> Though the size increases, the number of roundtrips is reduced meaningfully
> which would increase the robustness of the solution.
>
> Please expand the reasoning for unclean leader election for this
> alternative.
>
> Thanks
>
> On Thu, Apr 5, 2018 at 12:17 PM, Anna Povzner <a...@confluent.io> wrote:
>
> > Hi,
> >
> > I just created KIP-279 to fix edge cases of log divergence for both clean
> > and unclean leader election configs.
> >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-279%3A+Fix+log+divergence+between+leader+and+follower+after+fast+leader+fail+over
> >
> > The KIP is basically a follow up to KIP-101, and proposes a slight
> > extension to the replication protocol to fix edge cases where logs can
> > diverge due to fast leader fail over.
> >
> > Feedback and suggestions are welcome!
> >
> > Thanks,
> >
> > Anna