Re: [DISCUSS] KIP-1014: Managing Unstable Metadata Versions in Apache Kafka

Artem Livshits Thu, 11 Jan 2024 13:43:33 -0800

Hi Proven,

I'd say that we should do 2 & 2.  For small features that can be done and
stabilized within a short period of time (with one or very few commits),
that's exactly what happens already -- people interested in testing an
in-progress feature can take unstable code from a patch (or a private
branch / fork) with the expectation that the private code could create a
state that is not compatible with anything (or may be completely broken
for that matter -- at the end of the day it's functionality that may not
be fully tested or even fully implemented); once the feature is stable, it
goes to trunk and is fully committed there, and if bugs are found they get
fixed "forward".  The 2 & 2 option pretty much extends this to large
features -- if a feature is above the stable MV, then going above it is
like getting some in-progress code for early testing, with the expectation
that something may not fully work or may leave the system in a
non-upgradable state; promoting a feature into a stable MV would come with
the expectation that the feature gets fully committed and any bugs will be
fixed "forward".
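
To make the 2 & 2 option concrete, here is a minimal Java sketch.  It is a
toy model, not Kafka's MetadataVersion class: the class UnstableMvSketch,
its methods, and the feature names are all hypothetical.  It assumes that
each feature is gated by exactly one MV level, and that promoting a feature
gives it the level just above the current stable one while the remaining
unstable features shift up behind it, so no holes are left.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the "2 & 2" option (hypothetical, not Kafka code).
public class UnstableMvSketch {
    // Features in MV order: the feature at index i is gated by MV level i + 1.
    private final List<String> order = new ArrayList<>();
    // Highest stable MV level; levels above it carry no guarantees.
    private int latestProduction;

    public UnstableMvSketch(List<String> initialOrder, int latestProduction) {
        this.order.addAll(initialOrder);
        this.latestProduction = latestProduction;
    }

    public int latestProduction() {
        return latestProduction;
    }

    public int levelOf(String feature) {
        int idx = order.indexOf(feature);
        if (idx < 0) {
            throw new IllegalArgumentException("unknown feature: " + feature);
        }
        return idx + 1;
    }

    public boolean isStable(String feature) {
        return levelOf(feature) <= latestProduction;
    }

    // Promote a finished feature: it takes the level right above the current
    // stable one, and every other unstable feature is renumbered after it.
    // Stable features keep their levels.
    public void promote(String feature) {
        if (isStable(feature)) {
            throw new IllegalStateException(feature + " is already stable");
        }
        order.remove(feature);
        order.add(latestProduction, feature);
        latestProduction++;
    }

    public static void main(String[] args) {
        UnstableMvSketch mvs = new UnstableMvSketch(
                List.of("featureA", "featureB", "featureC"), 1);
        mvs.promote("featureC");
        for (String f : List.of("featureA", "featureB", "featureC")) {
            System.out.println(f + " level=" + mvs.levelOf(f)
                    + " stable=" + mvs.isStable(f));
        }
    }
}
```

In this sketch featureB silently moves from level 2 to level 3 when
featureC is promoted, which is the kind of change that anyone running an
unstable MV has to expect.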


-Artem

On Thu, Jan 11, 2024 at 10:16 AM Proven Provenzano
<pprovenz...@confluent.io.invalid> wrote:

> We have two approaches here for how we update unstable metadata versions.
>
>    1. The update will only increase MVs of unstable features to a value
>    greater than the new stable feature. The idea is that a specific
>    unstable MV may support some set of features and in the future that set
>    is always a strict subset of the current set. The issue is that moving a
>    feature to make way for a stable feature with a higher MV will leave
>    holes.
>    2. We are free to reorder the MV for any unstable feature. This removes
>    the hole issue, but does make the unstable MVs more muddled. There isn't
>    the same binary state for a MV where a feature is available or there is
>    a hole.
>
>
> We also have two ends of the spectrum as to when we update the stable MV.
>
>    1. We update at release points which reduces the amount of churn of the
>    unstable MVs and makes a stronger correlation between accepted features
>    and stable MVs for a release but means less testing on trunk as a stable
>    MV.
>    2. We update when the developers of a feature think it is done. This
>    leads to features being available for more testing in trunk but forces
>    the next release to include it as stable.
>
>
> I'd like more feedback from others on these two dimensions.
> --Proven
>
>
>
> On Wed, Jan 10, 2024 at 12:16 PM Justine Olshan
> <jols...@confluent.io.invalid> wrote:
>
> > Hmm it seems like Colin and Proven are disagreeing about whether we can
> > swap unstable metadata versions.
> >
> > >  When we reorder, we are always allocating a new MV and we are never
> > reusing an existing MV even if it was also unstable.
> >
> > > Given that this is true, there's no reason to have special rules about
> > what we can and can't do with unstable MVs. We can do anything
> >
> > I don't have a strong preference either way, but I think we should agree
> on
> > one approach.
> > The benefit of reordering and reusing is that we can release features
> that
> > are ready earlier and we have more flexibility. With the approach where
> we
> > always create a new MV, I am concerned with having many "empty" MVs. This
> > would encourage waiting until the release before we decide an incomplete
> > feature is not ready and moving its MV into the future. (The
> > abandoning comment I made earlier -- that is consistent with Proven's
> > approach)
> >
> > I think the only potential issue with reordering is that it could be a
> > bit confusing and *potentially* prone to errors. Note I say potentially
> > because I think it depends on folks' understanding of this new unstable
> > metadata version concept. I echo Federico's comments about making sure
> > the risks are highlighted.
> >
> > Thanks,
> >
> > Justine
> >
> > On Wed, Jan 10, 2024 at 1:16 AM Federico Valeri <fedeval...@gmail.com>
> > wrote:
> >
> > > Hi folks,
> > >
> > > > If you use an unstable MV, you probably won't be able to upgrade your
> > > software. Because whenever something changes, you'll probably get
> > > serialization exceptions being thrown inside the controller. Fatal
> ones.
> > >
> > > Thanks for this clarification. I think this concrete risk should be
> > > highlighted in the KIP and in the "unstable.metadata.versions.enable"
> > > documentation.
> > >
> > > In the test plan, should we also have one system test checking that
> > > "features with a stable MV will never have that MV changed"?
> > >
> > > On Wed, Jan 10, 2024 at 8:16 AM Colin McCabe <cmcc...@apache.org>
> wrote:
> > > >
> > > > On Tue, Jan 9, 2024, at 18:56, Proven Provenzano wrote:
> > > > > Hi folks,
> > > > >
> > > > > Thank you for the questions.
> > > > >
> > > > > Let me clarify about reorder first. The reorder of unstable
> metadata
> > > > > versions should be infrequent.
> > > >
> > > > Why does it need to be infrequent? We should be able to reorder
> > unstable
> > > metadata versions as often as we like. There are no guarantees about
> > > unstable MVs.
> > > >
> > > > > The time you reorder is when a feature that
> > > > > requires a higher metadata version to enable becomes "production
> > > ready" and
> > > > > the features with unstable metadata versions less than the new
> stable
> > > one
> > > > > are moved to metadata versions greater than the new stable feature.
> > > When we
> > > > > reorder, we are always allocating a new MV and we are never reusing
> > an
> > > > > existing MV even if it was also unstable. This way a developer
> > > upgrading
> > > > > their environment with a specific unstable MV might see existing
> > > > > functionality stop working but they won't see new MV dependent
> > > > > functionality magically appear. The feature set for a given
> unstable
> > MV
> > > > > version can only decrease with reordering.
> > > >
> > > > If you use an unstable MV, you probably won't be able to upgrade your
> > > software. Because whenever something changes, you'll probably get
> > > serialization exceptions being thrown inside the controller. Fatal
> ones.
> > > >
> > > > Given that this is true, there's no reason to have special rules
> about
> > > what we can and can't do with unstable MVs. We can do anything.
> > > >
> > > > >
> > > > > How do we define "production ready" and when should we bump
> > > > > LATEST_PRODUCTION? I would like to define it to be the point where
> > the
> > > > > feature is code complete with tests and the KIP for it is approved.
> > > However
> > > > > even with this definition if the feature later develops a major
> issue
> > > it
> > > > > could still block future features until the issue is fixed which is
> > > what we
> > > > > are trying to avoid here. We could be much more formal about this
> and
> > > let
> > > > > the release manager for a release define what is stable for a given
> > > release
> > > > > and then do the bump just after the branch is created on the
> branch.
> > > When
> > > > > an RC candidate is accepted, the bump would be backported. I would
> > > like to
> > > > > hear other ideas here.
> > > > >
> > > >
> > > > Yeah, it's an interesting question. Overall, I think developers
> should
> > > define when a feature is production ready.
> > > >
> > > > The question to ask is, "are you ready to take this feature to
> > > production in your workplace?" I think most developers do have a sense
> of
> > > this. Obviously bugs and mistakes can happen, but I think this standard
> > > would avoid most of the issues that we're trying to avoid by having
> > > unstable MVs in the first place.
> > > >
> > > > ELR is a good example. Nobody would have said that it was production
> > > ready in 3.7 ... hence it belonged (and still belongs) in an unstable
> MV,
> > > until that changes (hopefully soon :) )
> > > >
> > > > best,
> > > > Colin
> > > >
> > > > > --Proven
> > > > >
> > > > > On Tue, Jan 9, 2024 at 3:26 PM Colin McCabe <cmcc...@apache.org>
> > > wrote:
> > > > >
> > > > >> Hi Justine,
> > > > >>
> > > > >> Yes, this is an important point to clarify. Proven can comment
> more,
> > > but
> > > > >> my understanding is that we can do anything to unstable metadata
> > > versions.
> > > > >> Reorder them, delete them, change them in any other way. There are
> > no
> > > > >> stability guarantees. If the current text is unclear let's add
> more
> > > > >> examples of what we can do (which is anything) :)
> > > > >>
> > > > >> best,
> > > > >> Colin
> > > > >>
> > > > >>
> > > > >> On Mon, Jan 8, 2024, at 14:18, Justine Olshan wrote:
> > > > >> > Hey Colin,
> > > > >> >
> > > > >> > I had some offline discussions with Proven previously and it
> seems
> > > like
> > > > >> he
> > > > >> > said something different so I'm glad I brought it up here.
> > > > >> >
> > > > >> > Let's clarify if we are ok with reordering unstable metadata
> > > versions :)
> > > > >> >
> > > > >> > Justine
> > > > >> >
> > > > >> > On Mon, Jan 8, 2024 at 1:56 PM Colin McCabe <cmcc...@apache.org
> >
> > > wrote:
> > > > >> >
> > > > >> >> On Mon, Jan 8, 2024, at 13:19, Justine Olshan wrote:
> > > > >> >> > Hey all,
> > > > >> >> >
> > > > > >> >> > I was wondering how often we plan to update LATEST_PRODUCTION
> > > > > >> >> > metadata version. Is this something we should do as soon as the
> > > > > >> >> > feature is complete or something we do when we are releasing
> > > > > >> >> > Kafka? When is the time we abandon a MV so that other features
> > > > > >> >> > can be unblocked?
> > > > >> >>
> > > > >> >> Hi Justine,
> > > > >> >>
> > > > >> >> Thanks for reviewing.
> > > > >> >>
> > > > >> >> The idea is that you should bump LATEST_PRODUCTION when you
> want
> > to
> > > > >> take a
> > > > >> >> feature to production. That could mean deploying it internally
> > > > >> somewhere to
> > > > >> >> production, or doing an Apache release that lets everyone
> deploy
> > > the
> > > > >> thing
> > > > >> >> to production.
> > > > >> >>
> > > > >> >> Not in production? No need to care about this. Make any changes
> > you
> > > > >> like.
> > > > >> >>
> > > > >> >> As a corollary, we should keep the LATEST_PRODUCTION version as
> > > low as
> > > > >> it
> > > > >> >> can be. If you haven't tested the feature, don't freeze it in
> > > stone yet.
> > > > >> >>
> > > > >> >> >
> > > > >> >> > I am just considering a feature that may end up missing a
> > > release. It
> > > > >> >> seems
> > > > >> >> > like maybe that MV would block future metadata versions until
> > we
> > > > >> decide
> > > > >> >> the
> > > > >> >> > feature won't make the cut. From that point, all "ready"
> > features
> > > > >> should
> > > > >> >> be
> > > > >> >> > able to be released.
> > > > >> >>
> > > > >> >> The intention is the opposite. A feature in an unstable
> metadata
> > > version
> > > > >> >> doesn't block anything. You can always move a feature from one
> > > unstable
> > > > >> >> metadata version to another if the feature starts taking too
> long
> > > to
> > > > >> finish.
> > > > >> >>
> > > > > >> >> > I'm also wondering if the KIP should include some information
> > > > > >> >> > about how a metadata version should be abandoned. Maybe there is
> > > > > >> >> > a specific message to write in the file? So folks who were maybe
> > > > > >> >> > waiting on that version know they can release their feature?
> > > > >> >> >
> > > > >> >> > I am also assuming that we don't shift all the waiting
> metadata
> > > > >> versions
> > > > >> >> > when we abandon a version, but it would be good to clarify
> and
> > > > >> include in
> > > > >> >> > the KIP.
> > > > >> >>
> > > > >> >> I'm not sure what you mean by abandoning a version. We never
> > > abandon a
> > > > >> >> version once it's stable.
> > > > >> >>
> > > > >> >> Unstable versions can change. I wouldn't describe this as
> > > "abandonment",
> > > > >> >> just the MV changing prior to release.
> > > > >> >>
> > > > >> >> In a similar way, the contents of the 3.7 branch will change up
> > > until
> > > > >> >> 3.7.0 is released. Once it gets released, it's never
> unreleased.
> > > We just
> > > > >> >> move on to 3.7.1. Same thing here.
> > > > >> >>
> > > > >> >> best,
> > > > >> >> Colin
> > > > >> >>
> > > > >> >> >
> > > > >> >> > Thanks,
> > > > >> >> >
> > > > >> >> > Justine
> > > > >> >> >
> > > > >> >> > On Mon, Jan 8, 2024 at 12:44 PM Colin McCabe <
> > cmcc...@apache.org
> > > >
> > > > >> wrote:
> > > > >> >> >
> > > > >> >> >> Hi Proven,
> > > > >> >> >>
> > > > >> >> >> Thanks for the KIP. I think there is a need for this
> > > capability, for
> > > > >> >> those
> > > > > >> >> >> of us who deploy from trunk (or branches derived from trunk).
> > > > >> >> >>
> > > > >> >> >> With regard to "unstable.metadata.versions.enable": is this
> > > going to
> > > > >> be
> > > > >> >> a
> > > > >> >> >> documented configuration, or an internal one? I am guessing
> we
> > > want
> > > > >> it
> > > > >> >> to
> > > > >> >> >> be documented, so that users can use it. If we do, we should
> > > probably
> > > > >> >> also
> > > > >> >> >> very prominently warn that THIS WILL BREAK UPGRADES FOR YOUR
> > > CLUSTER.
> > > > >> >> That
> > > > >> >> >> includes logging an ERROR message on startup, etc.
> > > > >> >> >>
> > > > >> >> >> It would be good to document if a release can go out that
> > > contains
> > > > >> >> "future
> > > > >> >> >> MVs" that are unstable. Like can we make a 3.8 release that
> > > contains
> > > > >> >> >> IBP_4_0_IV0 in MetadataVersion.java, as an unstable future
> MV?
> > > > >> >> Personally I
> > > > >> >> >> think the answer should be "yes," but with the usual
> caveats.
> > > When
> > > > >> the
> > > > >> >> >> actual 4.0 comes out, the unstable 4.0 MV that shipped in
> 3.8
> > > > >> probably
> > > > >> >> >> won't work, and you won't be able to upgrade. (It was
> > unstable,
> > > we
> > > > >> told
> > > > >> >> you
> > > > >> >> >> not to use it.)
> > > > >> >> >>
> > > > >> >> >> best,
> > > > >> >> >> Colin
> > > > >> >> >>
> > > > >> >> >>
> > > > >> >> >> On Fri, Jan 5, 2024, at 07:32, Proven Provenzano wrote:
> > > > >> >> >> > Hey folks,
> > > > >> >> >> >
> > > > >> >> >> > I am starting a discussion thread for managing unstable
> > > metadata
> > > > >> >> >> > versions
> > > > >> >> >> > in Apache Kafka.
> > > > >> >> >> >
> > > > >> >> >>
> > > > >> >>
> > > > >>
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1014%3A+Managing+Unstable+Metadata+Versions+in+Apache+Kafka
> > > > >> >> >> >
> > > > >> >> >> > This KIP is actually already implemented in 3.7 with PR
> > > > >> >> >> > https://github.com/apache/kafka/pull/14860.
> > > > >> >> >> > I have created this KIP to explain the motivation and how
> > > managing
> > > > >> >> >> Metadata
> > > > >> >> >> > Versions is expected to work.
> > > > >> >> >> > Comments are greatly appreciated as this process can
> always
> > be
> > > > >> >> improved.
> > > > >> >> >> >
> > > > >> >> >> > --
> > > > >> >> >> > --Proven
> > > > >> >> >>
> > > > >> >>
> > > > >>
> > >
> >
>
