Re: [DISCUSS] KIP-903: Replicas with stale broker epoch should not be allowed to join the ISR

Calvin Liu Mon, 13 Feb 2023 11:53:29 -0800

Hi David.
Thanks for the feedback. I have updated the KIP as you suggested.


On Mon, Feb 13, 2023 at 7:34 AM David Jacot <dja...@confluent.io.invalid>
wrote:

> Hi Calvin,
>
> 01. Should we update the KIP to mention this?
>
> 09. The upgrade part is still not very clear in my opinion. It would be
> good to clearly mention how we are going to handle the upgrade of the
> AlterPartition API and the Fetch API. For the former, the broker uses the
> ApiVersions API to discover the supported versions of the controller so it
> will use the appropriate version based on this. For the later, we use the
> IBP/metadata version to gate the version of the fetch request used.
> Therefore, the version 14 will be used only when the IBP/metadata version
> is upgraded.
>
> Besides those two points, the KIP looks good to me.
>
> Thanks,
> David
>
> On Fri, Feb 10, 2023 at 7:47 PM Calvin Liu <ca...@confluent.io.invalid>
> wrote:
>
> > Hi David,
> > 01. It is a good idea to fence some delayed Fetch requests from the
> reboot
> > followers.
> > 06. Ack
> > 09. Rephrased the sentence to let the ZK controller ignore the broker
> > epoch.
> >
> > Thanks,
> > Calvin.
> >
> > On Fri, Feb 10, 2023 at 6:11 AM David Jacot <dja...@confluent.io.invalid
> >
> > wrote:
> >
> > > Hi Calvin,
> > >
> > > 01. Yeah, this should generally not happen. I was wondering if it could
> > > happen in the case of a partial failure of a broker for instance.
> > > Generally, I think that being defensive on the leader side does not
> hurt
> > > here. I am perhaps too extreme here.
> > >
> > > 03. NewIsr does not need to be ignorable because we will only set it
> when
> > > an old version is used.
> > >
> > > 06. It is a bit weird to have ReplicaId and BrokerId in the final
> > schema. I
> > > would remove ReplicaId. The comment is enough to explain the renaming.
> > >
> > > 09. Could you elaborate on "As the AlterPartition and Fetch requests
> are
> > > shared between ZK and Kraft mode, the related field will keep empty in
> ZK
> > > mode and will not be used."? I would do it the other way around, I
> think
> > > that the followers and the leader should always populare all the fields
> > but
> > > the ZK controller should not take it into account. It will be easier
> like
> > > this vs having to put gates everywhere.
> > >
> > > Best,
> > > David
> > >
> > > On Thu, Feb 9, 2023 at 7:27 PM Calvin Liu <ca...@confluent.io.invalid>
> > > wrote:
> > >
> > > > Hi Alexandre,
> > > > 104: I am not aware of any proposals around the up-to-date replica
> > > > election. It can be a good place to dig deeper.
> > > > 105: Yes, it almost puts nothing on the performance.
> > > >
> > > > Thanks,
> > > > Calvin
> > > >
> > > > On Thu, Feb 9, 2023 at 1:26 AM Alexandre Dupriez <
> > > > alexandre.dupr...@gmail.com> wrote:
> > > >
> > > > > Hi, Calvin,
> > > > >
> > > > > Many thanks for the replies.
> > > > >
> > > > > 103. Thanks for confirming the testing strategy. Exercising the
> right
> > > > > sequence of requests is a good idea.
> > > > >
> > > > > 104. Out of scope here, but the KIP reminds that currently, the
> > > > > controller does not attempt to detect data loss on the latest
> replica
> > > > > in the ISR and to choose the most up-to-date replica as a leader.
> Is
> > > > > this an area where the community wants to invest in building a
> > > > > potential solution?
> > > > >
> > > > > 105. I assume any performance impact of the change in Fetch
> requests
> > > > > is marginal if even noticeable at all.
> > > > >
> > > > >  Thanks,
> > > > > Alexandre
> > > > >
> > > > > Le mer. 8 févr. 2023 à 23:57, Calvin Liu
> <ca...@confluent.io.invalid
> > >
> > > a
> > > > > écrit :
> > > > > >
> > > > > > For Jun:
> > > > > > --- Since this KIP changes the inter-broker protocol, should we
> > bump
> > > up
> > > > > the
> > > > > > metadata version (the equivalent of IBP) during the upgrade?
> > > > > > Yes, we can.
> > > > > >
> > > > > > For Artem:
> > > > > > --- Could you elaborate on the behavior during rolls in the
> > > > Compatibility
> > > > > > section?
> > > > > > We can update the metadata version(IBP) to make sure the brokers
> > can
> > > > have
> > > > > > the latest code.
> > > > > > Also, during the IBP upgrade, the leader can just skip setting
> > broker
> > > > > epoch
> > > > > > or set -1 for the brokers that have not started using the new
> APIs.
> > > The
> > > > > > controller can bypass the check for such brokers so that the
> > cluster
> > > > can
> > > > > > still change ISR during the upgrade.
> > > > > >
> > > > > > --- Also for compatibility it's probably going to be easier to
> just
> > > > add a
> > > > > > new
> > > > > > array of epochs in addition to the existing array of broker ids,
> > > > instead
> > > > > of
> > > > > > removing one field and adding another.
> > > > > > We can do it both ways. Just prefer to replace it with a new
> field
> > > for
> > > > > > better readability.
> > > > > >
> > > > > > --- The KIP mentions that we would explicitly do something
> special
> > in
> > > > ZK
> > > > > > mode
> > > > > > in order to not implement new functionality.
> > > > > > It can be as simple as skipping setting the broker epoch if it is
> > in
> > > ZK
> > > > > > mode.
> > > > > >
> > > > > > For Alexandre:
> > > > > > 100: Yes, you are right, it does not require a controller
> > movement. I
> > > > > > removed the controller movement part when quoting the scenario
> from
> > > the
> > > > > > KAFKA-14139.
> > > > > > 101: In the current code, you can't remove the last replica in
> the
> > > ISR.
> > > > > > But, any damage to this replica will result in data loss.
> > > > > > 102: Yeah, I understand your concern. It is definitely great to
> > have
> > > a
> > > > > fix
> > > > > > in ZK, but considering the KAFKA-14139 is a rare case, it has
> > higher
> > > > ROE
> > > > > > for applying to just Kraft mode.
> > > > > > 103: I found that it is much easier to repro the bug in Kraft
> mode
> > by
> > > > > > feeding the controller with a given sequence of events/requests.
> So
> > > we
> > > > > may
> > > > > > just need a unit test to cover the case.
> > > > > >
> > > > > > For David:
> > > > > > 01:
> > > > > > Can you help explain why the follower can have a stale broker
> > epoch?
> > > Is
> > > > > it
> > > > > > because the registration request has any delay? But the broker
> will
> > > not
> > > > > > start fetching before the registration success.
> > > > > > On the other hand, if the follower fetches with the stale broker
> > > epoch,
> > > > > is
> > > > > > it good enough to ask the leader to hold to include this follower
> > > until
> > > > > the
> > > > > > fetch is consistent with the metadata cache?
> > > > > > 02: Ack
> > > > > > 03: I think NewIsrWithEpochs is not ignorable. For the deprecated
> > > > > > field NewIsr, what is the right way to make it ignorable? Just
> mark
> > > it
> > > > > > ignorable in this change?
> > > > > > 04: Ack
> > > > > > 05: Ack
> > > > > > 06: Ack
> > > > > > 07: Yes, will add it.
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Wed, Feb 8, 2023 at 6:52 AM David Jacot
> > > <dja...@confluent.io.invalid
> > > > >
> > > > > > wrote:
> > > > > >
> > > > > > > Hi Calvin,
> > > > > > >
> > > > > > > Thanks for the KIP! The overall approach looks reasonable to
> me.
> > I
> > > > > have a
> > > > > > > few questions/comments:
> > > > > > >
> > > > > > > 01. I wonder if the leader should also verify the broker epochs
> > > based
> > > > > on
> > > > > > > its metadata cache before sending the AlterPartition request to
> > the
> > > > > > > controller. Imagine the case where a follower not in the ISR
> > would
> > > > keep
> > > > > > > fetching with a stale broker epoch but with a valid leader
> epoch.
> > > In
> > > > > this
> > > > > > > case, the leader would try to add it back to the ISR when the
> > > > follower
> > > > > > > catches up but the controller would reject it because its
> broker
> > > > epoch
> > > > > is
> > > > > > > stale. Until this condition resolves itself, the leader can't
> add
> > > any
> > > > > other
> > > > > > > followers back to the ISR because the ISR validation is all or
> > > > nothing
> > > > > at
> > > > > > > the moment. Letting the leader check before sending the
> > > > AlterPartition
> > > > > > > request would mitigate this. This condition must be rare but
> > could
> > > > > happen.
> > > > > > > We did this in KIP-841.
> > > > > > >
> > > > > > > 02. In AlterPartitionRequest/AlterPartitionResponse, you need
> to
> > > > > update the
> > > > > > > `validVersions` to `0-3`.
> > > > > > >
> > > > > > > 03. Personally, I like the `NewIsrWithEpochs` field in
> > > > > > > the AlterPartitionRequest. We can handle the backward
> > compatibility
> > > > in
> > > > > the
> > > > > > > request builder. Basically, we would always populate
> > > NewIsrWithEpochs
> > > > > and
> > > > > > > the request builder can translate it to NewIsr if an earlier
> > > version
> > > > is
> > > > > > > used. Should the field be ignorable?
> > > > > > >
> > > > > > > 04. Should NewIsrWithEpochs.BrokerEpoch have -1 as default?
> > > > > > >
> > > > > > > 05. In FetchRequest/FetchResponse, you need to update the
> > > > > `validVersions`
> > > > > > > as well.
> > > > > > >
> > > > > > > 06. It is a little weird to have ReplicaId and BrokerEpoch in
> the
> > > > > > > FetchRequest. I wonder if we should rename ReplicaId to
> BrokerId
> > > > > because it
> > > > > > > is actually the broker id (even the documentation of the field
> > says
> > > > > it).
> > > > > > >
> > > > > > > 07. On the followers, the fetch request version is derived from
> > the
> > > > > > > metadata version/IBP. As we add a new fetch version, we need to
> > > add a
> > > > > new
> > > > > > > metadata version/IBP as well.
> > > > > > >
> > > > > > > 08. Regarding KRaft vs ZK, I don't have a strong opinion. ZK is
> > on
> > > > the
> > > > > way
> > > > > > > out so not doing it seems fine. If we do this, we could
> basically
> > > > just
> > > > > > > ignore the broker epoch in ZK and it will keep working as
> today.
> > > > > > >
> > > > > > > Best,
> > > > > > > David
> > > > > > >
> > > > > > > On Wed, Feb 8, 2023 at 3:01 PM Alexandre Dupriez <
> > > > > > > alexandre.dupr...@gmail.com> wrote:
> > > > > > >
> > > > > > > > Hi, Calvin,
> > > > > > > >
> > > > > > > > Thanks for the KIP and fast follow-up. A few questions.
> > > > > > > >
> > > > > > > > 100. The scenario illustrated in the KIP involves a
> controller
> > > > > > > > movement. Is this really required? Cannot the scenario occur
> > > with a
> > > > > > > > similar stale AlterPartition request and the same controller
> > > > > > > > throughout?
> > > > > > > >
> > > > > > > > 101. In the case where card(ISR) = 1 and the last replica
> > leaves,
> > > > it
> > > > > > > > will be re-elected as the leader upon reconnection. If the
> > > replica
> > > > is
> > > > > > > > empty, all data for the partition will be lost. Is this a
> > correct
> > > > > > > > understanding of the scenario?
> > > > > > > >
> > > > > > > > 102. I understand that ZK is going to be unsupported soon.
> > > However
> > > > > for
> > > > > > > > as long as it is available to end users, is there any reason
> > not
> > > to
> > > > > > > > support the fix in ZK mode? Arguably, the implementation for
> > the
> > > > > logic
> > > > > > > > to AlterPartition is duplicated for both controller types,
> and
> > it
> > > > may
> > > > > > > > be more work than is worth if ZK is fully decommissioned in
> the
> > > > next
> > > > > > > > few months. (Alternatively, is there a plan to back port the
> > fix
> > > to
> > > > > > > > older minor versions?).
> > > > > > > >
> > > > > > > > 103. The KIP mentions system tests to be used to simulate the
> > > race
> > > > > > > > condition. Would it be possible to provide more details about
> > it?
> > > > Do
> > > > > > > > we think it worth having this scenario be exercised in the
> > > > functional
> > > > > > > > tests of the core/server module?
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Alexandre
> > > > > > > >
> > > > > > > > Le mer. 8 févr. 2023 à 03:31, Artem Livshits
> > > > > > > > <alivsh...@confluent.io.invalid> a écrit :
> > > > > > > > >
> > > > > > > > > Hi Calvin,
> > > > > > > > >
> > > > > > > > > Thank you for the KIP.  I have a similar question -- we
> need
> > to
> > > > > support
> > > > > > > > > rolling upgrades (when we have some old brokers and some
> new
> > > > > brokers),
> > > > > > > so
> > > > > > > > > there could be combinations of old leader - new follower,
> new
> > > > > leader -
> > > > > > > > old
> > > > > > > > > follower, new leader - old controller, old leader - new
> > > > controller.
> > > > > > > > Could
> > > > > > > > > you elaborate on the behavior during rolls in the
> > Compatibility
> > > > > > > section?
> > > > > > > > >
> > > > > > > > > Also for compatibility it's probably going to be easier to
> > just
> > > > > add a
> > > > > > > new
> > > > > > > > > array of epochs in addition to the existing array of broker
> > > ids,
> > > > > > > instead
> > > > > > > > of
> > > > > > > > > removing one field and adding another.
> > > > > > > > >
> > > > > > > > > The KIP mentions that we would explicitly do something
> > special
> > > in
> > > > > ZK
> > > > > > > mode
> > > > > > > > > in order to not implement new functionality.  I think it
> may
> > be
> > > > > easier
> > > > > > > to
> > > > > > > > > implement functionality for both ZK and KRraft mode than
> > adding
> > > > > code to
> > > > > > > > > disable it in ZK mode.
> > > > > > > > >
> > > > > > > > > -Artem
> > > > > > > > >
> > > > > > > > > On Tue, Feb 7, 2023 at 4:58 PM Jun Rao
> > > <j...@confluent.io.invalid
> > > > >
> > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Hi, Calvin,
> > > > > > > > > >
> > > > > > > > > > Thanks for the KIP. Looks good to me overall.
> > > > > > > > > >
> > > > > > > > > > Since this KIP changes the inter-broker protocol, should
> we
> > > > bump
> > > > > up
> > > > > > > the
> > > > > > > > > > metadata version (the equivalent of IBP) during upgrade?
> > > > > > > > > >
> > > > > > > > > > Jun
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > On Fri, Feb 3, 2023 at 10:55 AM Calvin Liu
> > > > > > > <ca...@confluent.io.invalid
> > > > > > > > >
> > > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > Hi everyone,
> > > > > > > > > > > I'd like to discuss the fix for the broker reboot data
> > loss
> > > > > > > > KAFKA-14139
> > > > > > > > > > > <https://issues.apache.org/jira/browse/KAFKA-14139>.
> > > > > > > > > > > It changes the Fetch and AlterPartition requests to
> > include
> > > > the
> > > > > > > > broker
> > > > > > > > > > > epochs. Then the controller can use these epochs to
> help
> > > > > reject the
> > > > > > > > stale
> > > > > > > > > > > AlterPartition request.
> > > > > > > > > > > Please take a look. Thanks!
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-903%3A+Replicas+with+stale+broker+epoch+should+not+be+allowed+to+join+the+ISR
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > >
> > > >
> > >
> >
>

Re: [DISCUSS] KIP-903: Replicas with stale broker epoch should not be allowed to join the ISR

Reply via email to