Re: [DISCUSS] KIP-1014: Managing Unstable Metadata Versions in Apache Kafka

Andrew Schofield Fri, 12 Jan 2024 01:21:31 -0800

I also agree with 2 & 2 with reasoning along the same lines as Artem.

Thanks,
Andrew


> On 12 Jan 2024, at 09:15, Federico Valeri <fedeval...@gmail.com> wrote:
> 
> On Thu, Jan 11, 2024 at 10:43 PM Artem Livshits
> <alivsh...@confluent.io.invalid> wrote:
>> 
>> Hi Proven,
>> 
>> I'd say that we should do 2 & 2.  The idea is that for small features that
>> can be done and stabilized within a short period of time (with one or very
>> few commits) that's exactly what happens -- people interested in testing
>> in-progress feature could take unstable code from a patch (or private
>> branch / fork) with the expectation that that private code could create a
>> state that will not be compatible with anything (or may be completely
>> broken for that matter -- at the end of the day it's functionality that
>> may not be fully tested or even fully implemented); and once the feature is
>> stable it goes to trunk and is fully committed there; if bugs are found,
>> they'd get fixed "forward".
> 
> I agree with this reasoning.
> 
>> The 2 & 2 option pretty much extends this to
>> large features -- if a feature is above stable MV, then going above it is
>> like getting some in-progress code for early testing with the expectation
>> that something may not fully work or may leave the system in a non-upgradable state;
> 
> Usually I expect that an early access feature may not fully work, but
> not that it could affect upgrades. I think this is less obvious,
> that's why I asked to document clearly.
> 
>> promoting a feature into a stable MV would come with the expectation that
>> the feature gets fully committed and any bugs will be fixed "forward".
>> 
>> -Artem
>> 
>> On Thu, Jan 11, 2024 at 10:16 AM Proven Provenzano
>> <pprovenz...@confluent.io.invalid> wrote:
>> 
>>> We have two approaches here for how we update unstable metadata versions.
>>> 
>>>   1. The update will only increase MVs of unstable features to a value
>>>   greater than the new stable feature. The idea is that a specific
>>> unstable
>>>   MV may support some set of features and in the future that set is
>>> always a
>>>   strict subset of the current set. The issue is that moving a feature to
>>>   make way for a stable feature with a higher MV will leave holes.
>>>   2. We are free to reorder the MV for any unstable feature. This removes
>>>   the hole issue, but does make the unstable MVs more muddled. There isn't
>>>   the same binary state for a MV where a feature is available or there is
>>> a
>>>   hole.
>>> 
>>> 
>>> We also have two ends of the spectrum as to when we update the stable MV.
>>> 
>>>   1. We update at release points which reduces the amount of churn of the
>>>   unstable MVs and makes a stronger correlation between accepted features
>>> and
>>>   stable MVs for a release but means less testing on trunk as a stable MV.
>>>   2. We update when the developers of a feature think it is done. This
>>>   leads to features being available for more testing in trunk but forces
>>> the
>>>   next release to include it as stable.
>>> 
>>> 
>>> I'd like more feedback from others on these two dimensions.
>>> --Proven
>>> 
>>> 
>>> 
>>> On Wed, Jan 10, 2024 at 12:16 PM Justine Olshan
>>> <jols...@confluent.io.invalid> wrote:
>>> 
>>>> Hmm, it seems like Colin and Proven are disagreeing about whether we can
>>> swap
>>>> unstable metadata versions.
>>>> 
>>>>> When we reorder, we are always allocating a new MV and we are never
>>>> reusing an existing MV even if it was also unstable.
>>>> 
>>>>> Given that this is true, there's no reason to have special rules about
>>>> what we can and can't do with unstable MVs. We can do anything
>>>> 
>>>> I don't have a strong preference either way, but I think we should agree
>>> on
>>>> one approach.
>>>> The benefit of reordering and reusing is that we can release features
>>> that
>>>> are ready earlier and we have more flexibility. With the approach where
>>> we
>>>> always create a new MV, I am concerned with having many "empty" MVs. This
>>>> would encourage waiting until the release before we decide an incomplete
>>>> feature is not ready and moving its MV into the future. (The
>>>> abandoning comment I made earlier -- that is consistent with Proven's
>>>> approach)
>>>> 
>>>> I think the only potential issue with reordering is that it could be a
>>> bit
>>>> confusing and *potentially *prone to errors. Note I say potentially
>>> because
>>>> I think it depends on folks' understanding of this new unstable
>>> metadata
>>>> version concept. I echo Federico's comments about making sure the risks
>>> are
>>>> highlighted.
>>>> 
>>>> Thanks,
>>>> 
>>>> Justine
>>>> 
>>>> On Wed, Jan 10, 2024 at 1:16 AM Federico Valeri <fedeval...@gmail.com>
>>>> wrote:
>>>> 
>>>>> Hi folks,
>>>>> 
>>>>>> If you use an unstable MV, you probably won't be able to upgrade your
>>>>> software. Because whenever something changes, you'll probably get
>>>>> serialization exceptions being thrown inside the controller. Fatal
>>> ones.
>>>>> 
>>>>> Thanks for this clarification. I think this concrete risk should be
>>>>> highlighted in the KIP and in the "unstable.metadata.versions.enable"
>>>>> documentation.
>>>>> 
>>>>> In the test plan, should we also have one system test checking that
>>>>> "features with a stable MV will never have that MV changed"?
>>>>> 
>>>>> On Wed, Jan 10, 2024 at 8:16 AM Colin McCabe <cmcc...@apache.org>
>>> wrote:
>>>>>> 
>>>>>> On Tue, Jan 9, 2024, at 18:56, Proven Provenzano wrote:
>>>>>>> Hi folks,
>>>>>>> 
>>>>>>> Thank you for the questions.
>>>>>>> 
>>>>>>> Let me clarify about reorder first. The reorder of unstable
>>> metadata
>>>>>>> versions should be infrequent.
>>>>>> 
>>>>>> Why does it need to be infrequent? We should be able to reorder
>>>> unstable
>>>>> metadata versions as often as we like. There are no guarantees about
>>>>> unstable MVs.
>>>>>> 
>>>>>>> The time you reorder is when a feature that
>>>>>>> requires a higher metadata version to enable becomes "production
>>>>> ready" and
>>>>>>> the features with unstable metadata versions less than the new
>>> stable
>>>>> one
>>>>>>> are moved to metadata versions greater than the new stable feature.
>>>>> When we
>>>>>>> reorder, we are always allocating a new MV and we are never reusing
>>>> an
>>>>>>> existing MV even if it was also unstable. This way a developer
>>>>> upgrading
>>>>>>> their environment with a specific unstable MV might see existing
>>>>>>> functionality stop working but they won't see new MV dependent
>>>>>>> functionality magically appear. The feature set for a given
>>> unstable
>>>> MV
>>>>>>> version can only decrease with reordering.
>>>>>> 
>>>>>> If you use an unstable MV, you probably won't be able to upgrade your
>>>>> software. Because whenever something changes, you'll probably get
>>>>> serialization exceptions being thrown inside the controller. Fatal
>>> ones.
>>>>>> 
>>>>>> Given that this is true, there's no reason to have special rules
>>> about
>>>>> what we can and can't do with unstable MVs. We can do anything.
>>>>>> 
>>>>>>> 
>>>>>>> How do we define "production ready" and when should we bump
>>>>>>> LATEST_PRODUCTION? I would like to define it to be the point where
>>>> the
>>>>>>> feature is code complete with tests and the KIP for it is approved.
>>>>> However
>>>>>>> even with this definition if the feature later develops a major
>>> issue
>>>>> it
>>>>>>> could still block future features until the issue is fixed which is
>>>>> what we
>>>>>>> are trying to avoid here. We could be much more formal about this
>>> and
>>>>> let
>>>>>>> the release manager for a release define what is stable for a given
>>>>> release
>>>>>>> and then do the bump just after the branch is created on the
>>> branch.
>>>>> When
>>>>>>> an RC candidate is accepted, the bump would be backported. I would
>>>>> like to
>>>>>>> hear other ideas here.
>>>>>>> 
>>>>>> 
>>>>>> Yeah, it's an interesting question. Overall, I think developers
>>> should
>>>>> define when a feature is production ready.
>>>>>> 
>>>>>> The question to ask is, "are you ready to take this feature to
>>>>> production in your workplace?" I think most developers do have a sense
>>> of
>>>>> this. Obviously bugs and mistakes can happen, but I think this standard
>>>>> would avoid most of the issues that we're trying to avoid by having
>>>>> unstable MVs in the first place.
>>>>>> 
>>>>>> ELR is a good example. Nobody would have said that it was production
>>>>> ready in 3.7 ... hence it belonged (and still belongs) in an unstable
>>> MV,
>>>>> until that changes (hopefully soon :) )
>>>>>> 
>>>>>> best,
>>>>>> Colin
>>>>>> 
>>>>>>> --Proven
>>>>>>> 
>>>>>>> On Tue, Jan 9, 2024 at 3:26 PM Colin McCabe <cmcc...@apache.org>
>>>>> wrote:
>>>>>>> 
>>>>>>>> Hi Justine,
>>>>>>>> 
>>>>>>>> Yes, this is an important point to clarify. Proven can comment
>>> more,
>>>>> but
>>>>>>>> my understanding is that we can do anything to unstable metadata
>>>>> versions.
>>>>>>>> Reorder them, delete them, change them in any other way. There are
>>>> no
>>>>>>>> stability guarantees. If the current text is unclear let's add
>>> more
>>>>>>>> examples of what we can do (which is anything) :)
>>>>>>>> 
>>>>>>>> best,
>>>>>>>> Colin
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Mon, Jan 8, 2024, at 14:18, Justine Olshan wrote:
>>>>>>>>> Hey Colin,
>>>>>>>>> 
>>>>>>>>> I had some offline discussions with Proven previously and it
>>> seems
>>>>> like
>>>>>>>> he
>>>>>>>>> said something different so I'm glad I brought it up here.
>>>>>>>>> 
>>>>>>>>> Let's clarify if we are ok with reordering unstable metadata
>>>>> versions :)
>>>>>>>>> 
>>>>>>>>> Justine
>>>>>>>>> 
>>>>>>>>> On Mon, Jan 8, 2024 at 1:56 PM Colin McCabe <cmcc...@apache.org
>>>> 
>>>>> wrote:
>>>>>>>>> 
>>>>>>>>>> On Mon, Jan 8, 2024, at 13:19, Justine Olshan wrote:
>>>>>>>>>>> Hey all,
>>>>>>>>>>> 
>>>>>>>>>>> I was wondering how often we plan to update LATEST_PRODUCTION
>>>>> metadata
>>>>>>>>>>> version. Is this something we should do as soon as the
>>> feature
>>>> is
>>>>>>>>>> complete
>>>>>>>>>>> or something we do when we are releasing Kafka? When is the
>>>> time
>>>>> we
>>>>>>>>>> abandon
>>>>>>>>>>> a MV so that other features can be unblocked?
>>>>>>>>>> 
>>>>>>>>>> Hi Justine,
>>>>>>>>>> 
>>>>>>>>>> Thanks for reviewing.
>>>>>>>>>> 
>>>>>>>>>> The idea is that you should bump LATEST_PRODUCTION when you
>>> want
>>>> to
>>>>>>>> take a
>>>>>>>>>> feature to production. That could mean deploying it internally
>>>>>>>> somewhere to
>>>>>>>>>> production, or doing an Apache release that lets everyone
>>> deploy
>>>>> the
>>>>>>>> thing
>>>>>>>>>> to production.
>>>>>>>>>> 
>>>>>>>>>> Not in production? No need to care about this. Make any changes
>>>> you
>>>>>>>> like.
>>>>>>>>>> 
>>>>>>>>>> As a corollary, we should keep the LATEST_PRODUCTION version as
>>>>> low as
>>>>>>>> it
>>>>>>>>>> can be. If you haven't tested the feature, don't freeze it in
>>>>> stone yet.
>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> I am just considering a feature that may end up missing a
>>>>> release. It
>>>>>>>>>> seems
>>>>>>>>>>> like maybe that MV would block future metadata versions until
>>>> we
>>>>>>>> decide
>>>>>>>>>> the
>>>>>>>>>>> feature won't make the cut. From that point, all "ready"
>>>> features
>>>>>>>> should
>>>>>>>>>> be
>>>>>>>>>>> able to be released.
>>>>>>>>>> 
>>>>>>>>>> The intention is the opposite. A feature in an unstable
>>> metadata
>>>>> version
>>>>>>>>>> doesn't block anything. You can always move a feature from one
>>>>> unstable
>>>>>>>>>> metadata version to another if the feature starts taking too
>>> long
>>>>> to
>>>>>>>> finish.
>>>>>>>>>> 
>>>>>>>>>>> I'm also wondering if the KIP should include some information
>>>>> about
>>>>>>>> how a
>>>>>>>>>>> metadata version should be abandoned. Maybe there is a specific
>>> message
>>>>> to
>>>>>>>> write
>>>>>>>>>> in
>>>>>>>>>>> the file? So folks who were maybe waiting on that version
>>> know
>>>>> they
>>>>>>>> can
>>>>>>>>>>> release their feature?
>>>>>>>>>>> 
>>>>>>>>>>> I am also assuming that we don't shift all the waiting
>>> metadata
>>>>>>>> versions
>>>>>>>>>>> when we abandon a version, but it would be good to clarify
>>> and
>>>>>>>> include in
>>>>>>>>>>> the KIP.
>>>>>>>>>> 
>>>>>>>>>> I'm not sure what you mean by abandoning a version. We never
>>>>> abandon a
>>>>>>>>>> version once it's stable.
>>>>>>>>>> 
>>>>>>>>>> Unstable versions can change. I wouldn't describe this as
>>>>> "abandonment",
>>>>>>>>>> just the MV changing prior to release.
>>>>>>>>>> 
>>>>>>>>>> In a similar way, the contents of the 3.7 branch will change up
>>>>> until
>>>>>>>>>> 3.7.0 is released. Once it gets released, it's never
>>> unreleased.
>>>>> We just
>>>>>>>>>> move on to 3.7.1. Same thing here.
>>>>>>>>>> 
>>>>>>>>>> best,
>>>>>>>>>> Colin
>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> Thanks,
>>>>>>>>>>> 
>>>>>>>>>>> Justine
>>>>>>>>>>> 
>>>>>>>>>>> On Mon, Jan 8, 2024 at 12:44 PM Colin McCabe <
>>>> cmcc...@apache.org
>>>>>> 
>>>>>>>> wrote:
>>>>>>>>>>> 
>>>>>>>>>>>> Hi Proven,
>>>>>>>>>>>> 
>>>>>>>>>>>> Thanks for the KIP. I think there is a need for this
>>>>> capability, for
>>>>>>>>>> those
>>>>>>>>>>>> of us who deploy from trunk (or branches derived from
>>> trunk).
>>>>>>>>>>>> 
>>>>>>>>>>>> With regard to "unstable.metadata.versions.enable": is this
>>>>> going to
>>>>>>>> be
>>>>>>>>>> a
>>>>>>>>>>>> documented configuration, or an internal one? I am guessing
>>> we
>>>>> want
>>>>>>>> it
>>>>>>>>>> to
>>>>>>>>>>>> be documented, so that users can use it. If we do, we should
>>>>> probably
>>>>>>>>>> also
>>>>>>>>>>>> very prominently warn that THIS WILL BREAK UPGRADES FOR YOUR
>>>>> CLUSTER.
>>>>>>>>>> That
>>>>>>>>>>>> includes logging an ERROR message on startup, etc.
>>>>>>>>>>>> 
>>>>>>>>>>>> It would be good to document if a release can go out that
>>>>> contains
>>>>>>>>>> "future
>>>>>>>>>>>> MVs" that are unstable. Like can we make a 3.8 release that
>>>>> contains
>>>>>>>>>>>> IBP_4_0_IV0 in MetadataVersion.java, as an unstable future
>>> MV?
>>>>>>>>>> Personally I
>>>>>>>>>>>> think the answer should be "yes," but with the usual
>>> caveats.
>>>>> When
>>>>>>>> the
>>>>>>>>>>>> actual 4.0 comes out, the unstable 4.0 MV that shipped in
>>> 3.8
>>>>>>>> probably
>>>>>>>>>>>> won't work, and you won't be able to upgrade. (It was
>>>> unstable,
>>>>> we
>>>>>>>> told
>>>>>>>>>> you
>>>>>>>>>>>> not to use it.)
>>>>>>>>>>>> 
>>>>>>>>>>>> best,
>>>>>>>>>>>> Colin
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> On Fri, Jan 5, 2024, at 07:32, Proven Provenzano wrote:
>>>>>>>>>>>>> Hey folks,
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I am starting a discussion thread for managing unstable
>>>>> metadata
>>>>>>>>>>>>> versions
>>>>>>>>>>>>> in Apache Kafka.
>>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>> 
>>>>> 
>>>> 
>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1014%3A+Managing+Unstable+Metadata+Versions+in+Apache+Kafka
>>>>>>>>>>>>> 
>>>>>>>>>>>>> This KIP is actually already implemented in 3.7 with PR
>>>>>>>>>>>>> https://github.com/apache/kafka/pull/14860.
>>>>>>>>>>>>> I have created this KIP to explain the motivation and how
>>>>> managing
>>>>>>>>>>>> Metadata
>>>>>>>>>>>>> Versions is expected to work.
>>>>>>>>>>>>> Comments are greatly appreciated as this process can
>>> always
>>>> be
>>>>>>>>>> improved.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> --
>>>>>>>>>>>>> --Proven
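
For readers following the thread, the gating behaviour discussed above (stable MVs at or below LATEST_PRODUCTION are always allowed; anything above it requires "unstable.metadata.versions.enable" and should come with a loud warning) might be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the PR: the enum SketchMetadataVersion, its MV_1..MV_4 constants, the class MetadataVersionGate and its validate method are all hypothetical names made up for this example. Only LATEST_PRODUCTION and the config name come from the discussion.

```java
// Hypothetical, simplified stand-in for Kafka's MetadataVersion enum.
// The constants and feature labels are illustrative, not real Kafka MVs.
enum SketchMetadataVersion {
    MV_1("stable feature A"),
    MV_2("stable feature B"),
    MV_3("in-progress feature C"),  // above LATEST_PRODUCTION, so unstable
    MV_4("in-progress feature D");  // above LATEST_PRODUCTION, so unstable

    // Highest MV with stability guarantees (name taken from the thread).
    static final SketchMetadataVersion LATEST_PRODUCTION = MV_2;

    final String description;

    SketchMetadataVersion(String description) {
        this.description = description;
    }

    // An MV is treated as stable if it is at or below LATEST_PRODUCTION.
    boolean isProduction() {
        return this.compareTo(LATEST_PRODUCTION) <= 0;
    }
}

// Hypothetical helper showing how a requested MV could be checked.
final class MetadataVersionGate {
    private MetadataVersionGate() { }

    // unstableEnabled models the "unstable.metadata.versions.enable" setting.
    static SketchMetadataVersion validate(SketchMetadataVersion requested,
                                          boolean unstableEnabled) {
        if (requested.isProduction()) {
            return requested;  // stable MVs never need the flag
        }
        if (!unstableEnabled) {
            throw new IllegalArgumentException(
                "Metadata version " + requested + " is unstable; set "
                + "unstable.metadata.versions.enable=true to use it");
        }
        // Assumed behaviour: warn loudly, as suggested in the thread.
        System.err.println("ERROR: using unstable metadata version " + requested
            + " (" + requested.description + "). THIS WILL BREAK UPGRADES FOR YOUR CLUSTER.");
        return requested;
    }
}
```

With this sketch, MetadataVersionGate.validate(SketchMetadataVersion.MV_3, false) throws, while the same call with true returns MV_3 after printing the warning. Under the "2 & 2" position, the constants above LATEST_PRODUCTION could be reordered or redefined freely between builds, which is why the warning is worded so strongly.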
