Hi Proven,

I'd say that we should do 2 & 2: reorder unstable MVs freely, and bump the stable MV when the developers of a feature think it is done.

This is exactly what already happens for small features that can be finished and stabilized quickly (in one or very few commits). People interested in testing an in-progress feature can take unstable code from a patch (or a private branch or fork), with the expectation that this private code could create state that is not compatible with anything, or could be completely broken; at the end of the day, it is functionality that may not be fully tested or even fully implemented. Once the feature is stable, it goes to trunk and is fully committed there, and any bugs found later get fixed "forward".

The 2 & 2 option pretty much extends this to large features. If a feature is above the stable MV, then going above the stable MV is like taking in-progress code for early testing, with the expectation that something may not fully work or may leave the system in a non-upgradable state. Promoting a feature into a stable MV comes with the expectation that the feature is fully committed and any bugs will be fixed "forward".
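
To make the 2 & 2 combination concrete, here is a minimal Java sketch. It is not Kafka's MetadataVersion.java: the class UnstableMvSketch, its method names, and the simplification that each MV level gates exactly one feature are all assumptions made for illustration. It shows a feature being promoted as soon as its developers call it done, with the remaining unstable features reordered above the new stable level.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch only; this is NOT Kafka's MetadataVersion.java.
// Assumption: each MV level gates exactly one feature.
final class UnstableMvSketch {
    // features.get(i) is the feature introduced at MV level i + 1.
    private final List<String> features = new ArrayList<>();
    // Levels 1..latestProduction are stable and frozen; higher levels are unstable.
    private int latestProduction = 0;

    // Register an in-progress feature at the next free (unstable) level.
    void addUnstable(String feature) {
        features.add(feature);
    }

    int latestProduction() {
        return latestProduction;
    }

    // Current MV level of a feature; for unstable features this may change.
    int levelOf(String feature) {
        int index = features.indexOf(feature);
        if (index < 0) {
            throw new IllegalArgumentException("unknown feature: " + feature);
        }
        return index + 1;
    }

    // Promote a feature once its developers consider it done. It takes the
    // first level above the stable range; unstable features that were below
    // it shift up by one. Stable levels are never touched.
    void promote(String feature) {
        int index = levelOf(feature) - 1;
        if (index < latestProduction) {
            throw new IllegalStateException(feature + " is already stable");
        }
        features.remove(index);                  // List.remove(int index)
        features.add(latestProduction, feature); // insert just above the stable range
        latestProduction++;
    }

    // Unstable levels are usable only when explicitly enabled.
    boolean isUsable(int level, boolean unstableEnabled) {
        if (level < 1 || level > features.size()) {
            return false;
        }
        return level <= latestProduction || unstableEnabled;
    }

    public static void main(String[] args) {
        UnstableMvSketch mvs = new UnstableMvSketch();
        mvs.addUnstable("featureA");
        mvs.addUnstable("featureB");
        mvs.addUnstable("featureC");
        mvs.promote("featureB"); // featureB is done first
        for (String f : List.of("featureA", "featureB", "featureC")) {
            System.out.println(f + " -> MV level " + mvs.levelOf(f));
        }
        System.out.println("latest production level: " + mvs.latestProduction());
    }
}
```

In this sketch, unstable level 2 meant featureB before the promotion and featureA after it. That is the "muddled" cost of free reordering, accepted here because unstable levels carry no compatibility guarantee.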

-Artem

On Thu, Jan 11, 2024 at 10:16 AM Proven Provenzano <pprovenz...@confluent.io.invalid> wrote:

> We have two approaches here for how we update unstable metadata versions.
>
>    1. The update will only increase MVs of unstable features to a value
>    greater than the new stable feature. The idea is that a specific
>    unstable MV may support some set of features and in the future that
>    set is always a strict subset of the current set. The issue is that
>    moving a feature to make way for a stable feature with a higher MV
>    will leave holes.
>    2. We are free to reorder the MV for any unstable feature. This removes
>    the hole issue, but does make the unstable MVs more muddled. There
>    isn't the same binary state for a MV where a feature is available or
>    there is a hole.
>
> We also have two ends of the spectrum as to when we update the stable MV.
>
>    1. We update at release points which reduces the amount of churn of the
>    unstable MVs and makes a stronger correlation between accepted features
>    and stable MVs for a release but means less testing on trunk as a
>    stable MV.
>    2. We update when the developers of a feature think it is done. This
>    leads to features being available for more testing in trunk but forces
>    the next release to include it as stable.
>
> I'd like more feedback from others on these two dimensions.
> --Proven
>
> On Wed, Jan 10, 2024 at 12:16 PM Justine Olshan
> <jols...@confluent.io.invalid> wrote:
>
> > Hmm it seems like Colin and Proven are disagreeing with whether we can
> > swap unstable metadata versions.
> >
> > > When we reorder, we are always allocating a new MV and we are never
> > > reusing an existing MV even if it was also unstable.
> >
> > > Given that this is true, there's no reason to have special rules about
> > > what we can and can't do with unstable MVs. We can do anything
> >
> > I don't have a strong preference either way, but I think we should agree
> > on one approach.
> > The benefit of reordering and reusing is that we can release features
> > that are ready earlier and we have more flexibility. With the approach
> > where we always create a new MV, I am concerned with having many "empty"
> > MVs. This would encourage waiting until the release before we decide an
> > incomplete feature is not ready and moving its MV into the future. (The
> > abandoning comment I made earlier -- that is consistent with Proven's
> > approach)
> >
> > I think the only potential issue with reordering is that it could be a
> > bit confusing and *potentially* prone to errors. Note I say potentially
> > because I think it depends on folks' understanding with this new unstable
> > metadata version concept. I echo Federico's comments about making sure
> > the risks are highlighted.
> >
> > Thanks,
> >
> > Justine
> >
> > On Wed, Jan 10, 2024 at 1:16 AM Federico Valeri <fedeval...@gmail.com>
> > wrote:
> >
> > > Hi folks,
> > >
> > > > If you use an unstable MV, you probably won't be able to upgrade
> > > > your software. Because whenever something changes, you'll probably
> > > > get serialization exceptions being thrown inside the controller.
> > > > Fatal ones.
> > >
> > > Thanks for this clarification. I think this concrete risk should be
> > > highlighted in the KIP and in the "unstable.metadata.versions.enable"
> > > documentation.
> > >
> > > In the test plan, should we also have one system test checking that
> > > "features with a stable MV will never have that MV changed"?
> > >
> > > On Wed, Jan 10, 2024 at 8:16 AM Colin McCabe <cmcc...@apache.org>
> > > wrote:
> > > >
> > > > On Tue, Jan 9, 2024, at 18:56, Proven Provenzano wrote:
> > > > > Hi folks,
> > > > >
> > > > > Thank you for the questions.
> > > > >
> > > > > Let me clarify about reorder first. The reorder of unstable
> > > > > metadata versions should be infrequent.
> > > >
> > > > Why does it need to be infrequent?
> > > > We should be able to reorder unstable metadata versions as often as
> > > > we like. There are no guarantees about unstable MVs.
> > > >
> > > > > The time you reorder is when a feature that requires a higher
> > > > > metadata version to enable becomes "production ready" and the
> > > > > features with unstable metadata versions less than the new stable
> > > > > one are moved to metadata versions greater than the new stable
> > > > > feature. When we reorder, we are always allocating a new MV and we
> > > > > are never reusing an existing MV even if it was also unstable. This
> > > > > way a developer upgrading their environment with a specific
> > > > > unstable MV might see existing functionality stop working but they
> > > > > won't see new MV dependent functionality magically appear. The
> > > > > feature set for a given unstable MV version can only decrease with
> > > > > reordering.
> > > >
> > > > If you use an unstable MV, you probably won't be able to upgrade
> > > > your software. Because whenever something changes, you'll probably
> > > > get serialization exceptions being thrown inside the controller.
> > > > Fatal ones.
> > > >
> > > > Given that this is true, there's no reason to have special rules
> > > > about what we can and can't do with unstable MVs. We can do anything.
> > > >
> > > > > How do we define "production ready" and when should we bump
> > > > > LATEST_PRODUCTION? I would like to define it to be the point where
> > > > > the feature is code complete with tests and the KIP for it is
> > > > > approved. However even with this definition if the feature later
> > > > > develops a major issue it could still block future features until
> > > > > the issue is fixed which is what we are trying to avoid here.
> > > > > We could be much more formal about this and let the release
> > > > > manager for a release define what is stable for a given release
> > > > > and then do the bump just after the branch is created on the
> > > > > branch. When an RC candidate is accepted, the bump would be
> > > > > backported. I would like to hear other ideas here.
> > > >
> > > > Yeah, it's an interesting question. Overall, I think developers
> > > > should define when a feature is production ready.
> > > >
> > > > The question to ask is, "are you ready to take this feature to
> > > > production in your workplace?" I think most developers do have a
> > > > sense of this. Obviously bugs and mistakes can happen, but I think
> > > > this standard would avoid most of the issues that we're trying to
> > > > avoid by having unstable MVs in the first place.
> > > >
> > > > ELR is a good example. Nobody would have said that it was production
> > > > ready in 3.7 ... hence it belonged (and still belongs) in an unstable
> > > > MV, until that changes (hopefully soon :) )
> > > >
> > > > best,
> > > > Colin
> > > >
> > > > > --Proven
> > > > >
> > > > > On Tue, Jan 9, 2024 at 3:26 PM Colin McCabe <cmcc...@apache.org>
> > > > > wrote:
> > > > >
> > > > >> Hi Justine,
> > > > >>
> > > > >> Yes, this is an important point to clarify. Proven can comment
> > > > >> more, but my understanding is that we can do anything to unstable
> > > > >> metadata versions. Reorder them, delete them, change them in any
> > > > >> other way. There are no stability guarantees.
> > > > >> If the current text is unclear let's add more examples of what
> > > > >> we can do (which is anything) :)
> > > > >>
> > > > >> best,
> > > > >> Colin
> > > > >>
> > > > >>
> > > > >> On Mon, Jan 8, 2024, at 14:18, Justine Olshan wrote:
> > > > >> > Hey Colin,
> > > > >> >
> > > > >> > I had some offline discussions with Proven previously and it
> > > > >> > seems like he said something different so I'm glad I brought it
> > > > >> > up here.
> > > > >> >
> > > > >> > Let's clarify if we are ok with reordering unstable metadata
> > > > >> > versions :)
> > > > >> >
> > > > >> > Justine
> > > > >> >
> > > > >> > On Mon, Jan 8, 2024 at 1:56 PM Colin McCabe
> > > > >> > <cmcc...@apache.org> wrote:
> > > > >> >
> > > > >> >> On Mon, Jan 8, 2024, at 13:19, Justine Olshan wrote:
> > > > >> >> > Hey all,
> > > > >> >> >
> > > > >> >> > I was wondering how often we plan to update
> > > > >> >> > LATEST_PRODUCTION metadata version. Is this something we
> > > > >> >> > should do as soon as the feature is complete or something we
> > > > >> >> > do when we are releasing kafka. When is the time we abandon
> > > > >> >> > a MV so that other features can be unblocked?
> > > > >> >>
> > > > >> >> Hi Justine,
> > > > >> >>
> > > > >> >> Thanks for reviewing.
> > > > >> >>
> > > > >> >> The idea is that you should bump LATEST_PRODUCTION when you
> > > > >> >> want to take a feature to production. That could mean deploying
> > > > >> >> it internally somewhere to production, or doing an Apache
> > > > >> >> release that lets everyone deploy the thing to production.
> > > > >> >>
> > > > >> >> Not in production? No need to care about this. Make any changes
> > > > >> >> you like.
> > > > >> >>
> > > > >> >> As a corollary, we should keep the LATEST_PRODUCTION version as
> > > > >> >> low as it can be. If you haven't tested the feature, don't
> > > > >> >> freeze it in stone yet.
> > > > >> >>
> > > > >> >> > I am just considering a feature that may end up missing a
> > > > >> >> > release. It seems like maybe that MV would block future
> > > > >> >> > metadata versions until we decide the feature won't make the
> > > > >> >> > cut. From that point, all "ready" features should be able to
> > > > >> >> > be released.
> > > > >> >>
> > > > >> >> The intention is the opposite. A feature in an unstable
> > > > >> >> metadata version doesn't block anything. You can always move a
> > > > >> >> feature from one unstable metadata version to another if the
> > > > >> >> feature starts taking too long to finish.
> > > > >> >>
> > > > >> >> > I'm also wondering if the KIP should include some information
> > > > >> >> > about how a metadata should be abandoned. Maybe there is a
> > > > >> >> > specific message to write in the file? So folks who were
> > > > >> >> > maybe waiting on that version know they can release their
> > > > >> >> > feature?
> > > > >> >> >
> > > > >> >> > I am also assuming that we don't shift all the waiting
> > > > >> >> > metadata versions when we abandon a version, but it would be
> > > > >> >> > good to clarify and include in the KIP.
> > > > >> >>
> > > > >> >> I'm not sure what you mean by abandoning a version. We never
> > > > >> >> abandon a version once it's stable.
> > > > >> >>
> > > > >> >> Unstable versions can change. I wouldn't describe this as
> > > > >> >> "abandonment", just the MV changing prior to release.
> > > > >> >>
> > > > >> >> In a similar way, the contents of the 3.7 branch will change up
> > > > >> >> until 3.7.0 is released. Once it gets released, it's never
> > > > >> >> unreleased. We just move on to 3.7.1. Same thing here.
> > > > >> >>
> > > > >> >> best,
> > > > >> >> Colin
> > > > >> >>
> > > > >> >> > Thanks,
> > > > >> >> >
> > > > >> >> > Justine
> > > > >> >> >
> > > > >> >> > On Mon, Jan 8, 2024 at 12:44 PM Colin McCabe
> > > > >> >> > <cmcc...@apache.org> wrote:
> > > > >> >> >
> > > > >> >> >> Hi Proven,
> > > > >> >> >>
> > > > >> >> >> Thanks for the KIP. I think there is a need for this
> > > > >> >> >> capability, for those of us who deploy from trunk (or
> > > > >> >> >> branches derived from trunk).
> > > > >> >> >>
> > > > >> >> >> With regard to "unstable.metadata.versions.enable": is this
> > > > >> >> >> going to be a documented configuration, or an internal one?
> > > > >> >> >> I am guessing we want it to be documented, so that users can
> > > > >> >> >> use it. If we do, we should probably also very prominently
> > > > >> >> >> warn that THIS WILL BREAK UPGRADES FOR YOUR CLUSTER. That
> > > > >> >> >> includes logging an ERROR message on startup, etc.
> > > > >> >> >>
> > > > >> >> >> It would be good to document if a release can go out that
> > > > >> >> >> contains "future MVs" that are unstable. Like can we make a
> > > > >> >> >> 3.8 release that contains IBP_4_0_IV0 in
> > > > >> >> >> MetadataVersion.java, as an unstable future MV? Personally I
> > > > >> >> >> think the answer should be "yes," but with the usual
> > > > >> >> >> caveats. When the actual 4.0 comes out, the unstable 4.0 MV
> > > > >> >> >> that shipped in 3.8 probably won't work, and you won't be
> > > > >> >> >> able to upgrade. (It was unstable, we told you not to use
> > > > >> >> >> it.)
> > > > >> >> >>
> > > > >> >> >> best,
> > > > >> >> >> Colin
> > > > >> >> >>
> > > > >> >> >>
> > > > >> >> >> On Fri, Jan 5, 2024, at 07:32, Proven Provenzano wrote:
> > > > >> >> >> > Hey folks,
> > > > >> >> >> >
> > > > >> >> >> > I am starting a discussion thread for managing unstable
> > > > >> >> >> > metadata versions in Apache Kafka.
> > > > >> >> >> >
> > > > >> >> >> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-1014%3A+Managing+Unstable+Metadata+Versions+in+Apache+Kafka
> > > > >> >> >> >
> > > > >> >> >> > This KIP is actually already implemented in 3.7 with PR
> > > > >> >> >> > https://github.com/apache/kafka/pull/14860.
> > > > >> >> >> > I have created this KIP to explain the motivation and how
> > > > >> >> >> > managing Metadata Versions is expected to work.
> > > > >> >> >> > Comments are greatly appreciated as this process can
> > > > >> >> >> > always be improved.
> > > > >> >> >> >
> > > > >> >> >> > --
> > > > >> >> >> > --Proven