Hi Alexandre,

104: I am not aware of any proposals around the up-to-date replica election. It could be a good area to dig deeper into.
105: Yes, it has almost no performance impact.
Thanks,
Calvin

On Thu, Feb 9, 2023 at 1:26 AM Alexandre Dupriez <alexandre.dupr...@gmail.com> wrote:

> Hi, Calvin,
>
> Many thanks for the replies.
>
> 103. Thanks for confirming the testing strategy. Exercising the right
> sequence of requests is a good idea.
>
> 104. Out of scope here, but the KIP reminds us that currently, the
> controller does not attempt to detect data loss on the latest replica
> in the ISR and to choose the most up-to-date replica as a leader. Is
> this an area where the community wants to invest in building a
> potential solution?
>
> 105. I assume any performance impact of the change in Fetch requests
> is marginal, if noticeable at all.
>
> Thanks,
> Alexandre
>
> On Wed, Feb 8, 2023 at 11:57 PM Calvin Liu <ca...@confluent.io.invalid> wrote:
> >
> > For Jun:
> > --- Since this KIP changes the inter-broker protocol, should we bump up
> > the metadata version (the equivalent of IBP) during the upgrade?
> > Yes, we can.
> >
> > For Artem:
> > --- Could you elaborate on the behavior during rolls in the
> > Compatibility section?
> > We can update the metadata version (IBP) to make sure the brokers have
> > the latest code.
> > Also, during the IBP upgrade, the leader can simply skip setting the
> > broker epoch, or set it to -1, for brokers that have not started using
> > the new APIs. The controller can bypass the check for such brokers so
> > that the cluster can still change the ISR during the upgrade.
> >
> > --- Also for compatibility it's probably going to be easier to just add
> > a new array of epochs in addition to the existing array of broker ids,
> > instead of removing one field and adding another.
> > We can do it either way; I just prefer replacing it with a new field
> > for better readability.
> >
> > --- The KIP mentions that we would explicitly do something special in
> > ZK mode in order to not implement new functionality.
> > It can be as simple as skipping setting the broker epoch if it is in ZK
> > mode.
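[Editor's note: the mid-upgrade handling Calvin describes (the leader sends -1 as the epoch for brokers that have not adopted the new API, and the controller bypasses the epoch check for those entries) can be sketched as follows. This is an illustrative sketch only; the names `validate_alter_partition` and `UNKNOWN_BROKER_EPOCH` are assumptions, not the actual Kafka controller code.]

```python
# Hypothetical sketch of the controller-side epoch check with the
# rolling-upgrade bypass described in the thread. Names are illustrative.

UNKNOWN_BROKER_EPOCH = -1  # sentinel sent for brokers not yet on the new API

def validate_alter_partition(new_isr_with_epochs, registered_epochs):
    """Reject the request if any ISR entry carries a stale broker epoch.

    new_isr_with_epochs: list of (broker_id, broker_epoch) from the request.
    registered_epochs: broker_id -> latest epoch known to the controller.
    """
    for broker_id, broker_epoch in new_isr_with_epochs:
        if broker_epoch == UNKNOWN_BROKER_EPOCH:
            # Broker has not started using the new API yet (mid-upgrade):
            # bypass the check so the ISR can still be changed.
            continue
        if registered_epochs.get(broker_id) != broker_epoch:
            return "INELIGIBLE_REPLICA"  # stale epoch -> reject
    return "NONE"
```

The -1 sentinel preserves availability during the roll: ISR changes still go through for not-yet-upgraded brokers, and the stricter check only applies once a broker starts reporting real epochs.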
> >
> > For Alexandre:
> > 100: Yes, you are right, it does not require a controller movement. I
> > removed the controller movement part when quoting the scenario from
> > KAFKA-14139.
> > 101: In the current code, you can't remove the last replica in the ISR.
> > But any damage to this replica will result in data loss.
> > 102: Yeah, I understand your concern. It would definitely be great to
> > have a fix in ZK, but considering that KAFKA-14139 is a rare case, the
> > ROE is higher if we apply the fix to KRaft mode only.
> > 103: I found that it is much easier to repro the bug in KRaft mode by
> > feeding the controller a given sequence of events/requests. So we may
> > just need a unit test to cover the case.
> >
> > For David:
> > 01:
> > Can you help explain why the follower can have a stale broker epoch? Is
> > it because the registration request can be delayed? The broker will not
> > start fetching before the registration succeeds.
> > On the other hand, if the follower fetches with a stale broker epoch,
> > is it good enough to ask the leader to hold off on including this
> > follower until its fetch is consistent with the metadata cache?
> > 02: Ack
> > 03: I think NewIsrWithEpochs is not ignorable. For the deprecated
> > field NewIsr, what is the right way to make it ignorable? Just mark it
> > ignorable in this change?
> > 04: Ack
> > 05: Ack
> > 06: Ack
> > 07: Yes, will add it.
> >
> >
> > On Wed, Feb 8, 2023 at 6:52 AM David Jacot <dja...@confluent.io.invalid>
> > wrote:
> >
> > > Hi Calvin,
> > >
> > > Thanks for the KIP! The overall approach looks reasonable to me. I
> > > have a few questions/comments:
> > >
> > > 01. I wonder if the leader should also verify the broker epochs based
> > > on its metadata cache before sending the AlterPartition request to
> > > the controller. Imagine the case where a follower not in the ISR
> > > would keep fetching with a stale broker epoch but with a valid leader
> > > epoch.
> > > In this case, the leader would try to add it back to the ISR when the
> > > follower catches up, but the controller would reject it because its
> > > broker epoch is stale. Until this condition resolves itself, the
> > > leader can't add any other followers back to the ISR because the ISR
> > > validation is all-or-nothing at the moment. Letting the leader check
> > > before sending the AlterPartition request would mitigate this. This
> > > condition must be rare but could happen. We did this in KIP-841.
> > >
> > > 02. In AlterPartitionRequest/AlterPartitionResponse, you need to
> > > update the `validVersions` to `0-3`.
> > >
> > > 03. Personally, I like the `NewIsrWithEpochs` field in
> > > the AlterPartitionRequest. We can handle the backward compatibility
> > > in the request builder. Basically, we would always populate
> > > NewIsrWithEpochs and the request builder can translate it to NewIsr
> > > if an earlier version is used. Should the field be ignorable?
> > >
> > > 04. Should NewIsrWithEpochs.BrokerEpoch have -1 as default?
> > >
> > > 05. In FetchRequest/FetchResponse, you need to update the
> > > `validVersions` as well.
> > >
> > > 06. It is a little weird to have ReplicaId and BrokerEpoch in the
> > > FetchRequest. I wonder if we should rename ReplicaId to BrokerId
> > > because it is actually the broker id (even the documentation of the
> > > field says so).
> > >
> > > 07. On the followers, the fetch request version is derived from the
> > > metadata version/IBP. As we add a new fetch version, we need to add a
> > > new metadata version/IBP as well.
> > >
> > > 08. Regarding KRaft vs ZK, I don't have a strong opinion. ZK is on
> > > the way out, so not doing it seems fine. If we do this, we could
> > > basically just ignore the broker epoch in ZK and it will keep working
> > > as it does today.
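[Editor's note: the request-builder translation David suggests in 03 (always populate NewIsrWithEpochs, and down-convert to the plain NewIsr array when an older request version is negotiated) can be sketched as below. `build_alter_partition` and the version-3 cutoff are illustrative assumptions standing in for the real `AlterPartitionRequest.Builder`.]

```python
# Illustrative sketch of down-converting the new epoch-carrying ISR field
# to the legacy broker-id array for older AlterPartition versions.

def build_alter_partition(new_isr_with_epochs, target_version):
    """Return the request fields to serialize for the given version.

    new_isr_with_epochs: list of (broker_id, broker_epoch) tuples.
    target_version: negotiated AlterPartition request version.
    """
    if target_version >= 3:  # versions that carry broker epochs
        return {"NewIsrWithEpochs": new_isr_with_epochs}
    # Older versions only know the plain broker-id array; drop the epochs.
    return {"NewIsr": [broker_id for broker_id, _ in new_isr_with_epochs]}
```

Callers always supply the richer form, so only one serialization path in the builder needs to know about the legacy field.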
> > >
> > > Best,
> > > David
> > >
> > > On Wed, Feb 8, 2023 at 3:01 PM Alexandre Dupriez
> > > <alexandre.dupr...@gmail.com> wrote:
> > >
> > > > Hi, Calvin,
> > > >
> > > > Thanks for the KIP and the fast follow-up. A few questions.
> > > >
> > > > 100. The scenario illustrated in the KIP involves a controller
> > > > movement. Is this really required? Cannot the scenario occur with a
> > > > similar stale AlterPartition request and the same controller
> > > > throughout?
> > > >
> > > > 101. In the case where card(ISR) = 1 and the last replica leaves,
> > > > it will be re-elected as the leader upon reconnection. If the
> > > > replica is empty, all data for the partition will be lost. Is this
> > > > a correct understanding of the scenario?
> > > >
> > > > 102. I understand that ZK is going to be unsupported soon. However,
> > > > for as long as it is available to end users, is there any reason
> > > > not to support the fix in ZK mode? Arguably, the implementation of
> > > > the AlterPartition logic is duplicated for both controller types,
> > > > and it may be more work than it is worth if ZK is fully
> > > > decommissioned in the next few months. (Alternatively, is there a
> > > > plan to backport the fix to older minor versions?)
> > > >
> > > > 103. The KIP mentions system tests to be used to simulate the race
> > > > condition. Would it be possible to provide more details about them?
> > > > Do we think it is worth having this scenario exercised in the
> > > > functional tests of the core/server module?
> > > >
> > > > Thanks,
> > > > Alexandre
> > > >
> > > > On Wed, Feb 8, 2023 at 3:31 AM Artem Livshits
> > > > <alivsh...@confluent.io.invalid> wrote:
> > > > >
> > > > > Hi Calvin,
> > > > >
> > > > > Thank you for the KIP.
> > > > > I have a similar question -- we need to support rolling upgrades
> > > > > (when we have some old brokers and some new brokers), so there
> > > > > could be combinations of old leader - new follower, new leader -
> > > > > old follower, new leader - old controller, old leader - new
> > > > > controller. Could you elaborate on the behavior during rolls in
> > > > > the Compatibility section?
> > > > >
> > > > > Also for compatibility it's probably going to be easier to just
> > > > > add a new array of epochs in addition to the existing array of
> > > > > broker ids, instead of removing one field and adding another.
> > > > >
> > > > > The KIP mentions that we would explicitly do something special in
> > > > > ZK mode in order to not implement new functionality. I think it
> > > > > may be easier to implement the functionality for both ZK and
> > > > > KRaft mode than to add code to disable it in ZK mode.
> > > > >
> > > > > -Artem
> > > > >
> > > > > On Tue, Feb 7, 2023 at 4:58 PM Jun Rao <j...@confluent.io.invalid>
> > > > > wrote:
> > > > >
> > > > > > Hi, Calvin,
> > > > > >
> > > > > > Thanks for the KIP. Looks good to me overall.
> > > > > >
> > > > > > Since this KIP changes the inter-broker protocol, should we
> > > > > > bump up the metadata version (the equivalent of IBP) during the
> > > > > > upgrade?
> > > > > >
> > > > > > Jun
> > > > > >
> > > > > >
> > > > > > On Fri, Feb 3, 2023 at 10:55 AM Calvin Liu
> > > > > > <ca...@confluent.io.invalid> wrote:
> > > > > >
> > > > > > > Hi everyone,
> > > > > > > I'd like to discuss the fix for the broker reboot data loss,
> > > > > > > KAFKA-14139
> > > > > > > <https://issues.apache.org/jira/browse/KAFKA-14139>.
> > > > > > > It changes the Fetch and AlterPartition requests to include
> > > > > > > the broker epochs. The controller can then use these epochs
> > > > > > > to help reject stale AlterPartition requests.
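[Editor's note: a toy walk-through of the KAFKA-14139 race the KIP fixes, under invented names: a broker reboots (losing unflushed data) and re-registers with a bumped epoch, and the controller rejects the in-flight AlterPartition that still carries the pre-reboot epoch. This is an illustrative model, not Kafka's actual controller code.]

```python
# Toy model: the controller tracks each broker's current registration epoch
# and rejects ISR changes that reference an outdated epoch.

class Controller:
    def __init__(self):
        self.registered = {}  # broker_id -> current broker epoch

    def register(self, broker_id, epoch):
        # Called when a broker (re-)registers; a reboot bumps its epoch.
        self.registered[broker_id] = epoch

    def alter_partition(self, isr_with_epochs):
        # Accept only if every (broker_id, epoch) matches the registry.
        ok = all(self.registered.get(b) == e for b, e in isr_with_epochs)
        return "NONE" if ok else "INELIGIBLE_REPLICA"

ctl = Controller()
ctl.register(1, 100)
ctl.register(2, 200)

# The leader saw follower 2 catch up while it was on epoch 200; broker 2
# then rebooted, lost unflushed data, and re-registered with epoch 201.
ctl.register(2, 201)

# The in-flight AlterPartition still carries the pre-reboot epoch, so it
# is rejected instead of re-adding a possibly empty replica to the ISR.
print(ctl.alter_partition([(1, 100), (2, 200)]))  # prints "INELIGIBLE_REPLICA"
```

Without the epoch in the request, the controller cannot tell the pre-reboot replica apart from the post-reboot one, which is exactly the data-loss window described in KAFKA-14139.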
> > > > > > > Please take a look. Thanks!
> > > > > > >
> > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-903%3A+Replicas+with+stale+broker+epoch+should+not+be+allowed+to+join+the+ISR