Josh - from a community / user perspective, I think what you just described is a win. I have no clue what it means in the context of the actual work that needs to be done, so I'll leave that aspect to others to comment on.
I could see a world where a 5.1 makes sense - but only if it was offering incremental improvements over 5.0. Something like JDK 21 support, rather than a bunch of new features, and certainly not TCM / Accord.

Jon

On Wed, Jan 29, 2025 at 1:32 PM Josh McKenzie <jmcken...@apache.org> wrote:

> My opinion is that it would be valuable to take this discussion as a forcing function to determine how we plan to handle releases broadly to answer the "5.1 should be 6.0" question. Assuming we move away from ad hoc per-release debate. If there's broad strong dissent (i.e. let's have 6.0 be the next major and talk about this topic separately) I'm happy to open another thread, but I didn't see clear consensus on this thread yet and was trying to help drive to that.
>
> Depending on what “T-2” means for the online upgrade.
>
> For 6.0, that would mean last 2 majors (5.0, 4.1). I think we'd need to make an exception for 4.0 during the change much like we made exceptions for 3.0 and 3.x, meaning T-3 to respect the current paradigm of "any adjacent major.minor to next major" during this transition.
>
> For 7.0, online upgrade would be supported for 6.0 and 5.0.
>
> If you mean only 4.1 and 5.0 would be online upgrade targets, I would suggest we change that to T-3 so you encompass all “currently supported” releases at the time the new branch is GAed.
>
> I think that's better actually, yeah. I was originally thinking T-2 from the "what calendar time frame is reasonable" perspective, but saying "if you're on a currently supported branch you can upgrade to a release that comes out" makes clean intuitive sense. That'd mean:
>
> 6.0: 5.0, 4.1, 4.0 online upgrades supported. Drop support for 4.0. API compatible guaranteed w/5.0.
> 7.0: 6.0, 5.0, 4.1 online upgrades supported. Drop support for 4.1. API compatible guaranteed w/6.0.
> 8.0: 7.0, 6.0, 5.0 online upgrades supported. Drop support for 5.0. API compatible guaranteed w/7.0.
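
The three-line table at the end of Josh's message follows a simple sliding-window rule, which can be sketched in a few lines of Python. This is only an illustration of the proposal as stated in the thread, not project tooling: the `RELEASES` list, the `WINDOW` constant and the `upgrade_policy` function are hypothetical names, and the release sequence after 5.0 assumes the "majors only" numbering under discussion.

```python
# Hypothetical release line: 4.0, 4.1 and 5.0 are named in the thread,
# and 6.0 onward assumes the proposed "majors only" numbering.
RELEASES = ["4.0", "4.1", "5.0", "6.0", "7.0", "8.0"]

# "T-3": the three releases preceding the target can upgrade online.
WINDOW = 3


def upgrade_policy(target, releases=RELEASES, window=WINDOW):
    """Return the upgrade rules for `target` under the sliding-window proposal."""
    idx = releases.index(target)
    if idx < window:
        raise ValueError(f"{target} has fewer than {window} predecessors in the list")
    # Previous `window` releases, newest first.
    sources = releases[idx - window:idx][::-1]
    return {
        "online_upgrade_from": sources,
        "drops_support_for": sources[-1],   # oldest release in the window
        "api_compatible_with": sources[0],  # immediately preceding release (T-1)
    }


for version in ("6.0", "7.0", "8.0"):
    print(version, upgrade_policy(version))
```

Changing `WINDOW` to 2 gives the original T-2 reading, which is the difference Jeremiah and Josh are discussing.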
>
> On Wed, Jan 29, 2025, at 12:15 PM, Jeremiah Jordan wrote:
>
> This got way off topic from 5.1 should be 6.0, so maybe there should be a new DISCUSS thread with the correct title to have a discussion around codifying our upgrade paths?
>
> FWIW this mostly agrees with my thoughts around upgrade support.
>
> T-2 online upgrade supported, T-1 API compatible, deprecate-then-remove is a combination of 3 simple things that I think will improve this situation greatly and hopefully put a nail in the coffin of the topic, improve things, and let us move on to more interesting topics that we can then re-litigate endlessly. ;)
>
> Depending on what “T-2” means for the online upgrade. If you mean 4.0, 4.1, and 5.0 are all online upgrade supported versions for trunk, then I agree. If you mean only 4.1 and 5.0 would be online upgrade targets, I would suggest we change that to T-3 so you encompass all “currently supported” releases at the time the new branch is GAed.
>
> -Jeremiah
>
> On Jan 29, 2025 at 10:49:17 AM, Josh McKenzie <jmcken...@apache.org> wrote:
>
> To clarify, when I say unspoken it includes "not consciously considered but shapes engagement patterns". I don't think there's people sitting around deeply against either the status quo or my proposal who are holding back for nefarious purposes or anything.
>
> And yeah - my goal is to try and put a little more energy into this to see if we can surface pushback as I don't think it'd be appropriate to move to a VOTE thread on a proposal with essentially nil engagement. My intuition is that the properties of the status quo aren't actually what the polity wants, whether or not what I'm proposing is an improvement on that status quo.
>
> On Wed, Jan 29, 2025, at 11:15 AM, Benedict wrote:
>
> I think you’re making the mistake of assuming a representative sample of the community participates in these debates.
> Sensibly, a majority of the community sits these out, and I think on this topic that’s actually the rational response.
>
> That doesn’t stop folk voting for something else when the decision actually matters, as it shouldn’t - the polity can’t bind itself after all.
>
> Which is only to say, I applaud your optimism but it’s probably wrong to assume there’ll be pushback that reifies the community’s revealed preferences. There’s no reason to assume there will be, and history shows there usually isn’t.
>
> To be clear, I don’t think these are our “unspoken incentives” but our collective preferences that simply can’t functionally be codified due to the fact nobody is willing to actually argue this is a good thing. Sometimes no individual likes what happens, but it’s what the polity actually wants, collectively. That’s fine, let’s be at peace with it.
>
> On 29 Jan 2025, at 16:00, Josh McKenzie <jmcken...@apache.org> wrote:
>
> I've let this topic sit in my head overnight and kind of chewed on it. While I agree w/the "we're doing what matches our unspoken incentives" angle Benedict, I think we can do better than that both for ourselves and our users if we apply energy here and codify something. If people come out with energy to push *against* that codification, that'll at least bring the unspoken incentives to light to work through.
>
> I think it's important we release on a predictable cadence for our users. We've fallen short (in some cases exceptionally) on this in the past, and it also adds value for operators to plan out verification and adoption cycles. It also helps users considering different databases to see a predictable cadence and a healthy project. My current position is that 12 months is a happy medium min-value, especially with a T-2 supported cycle since that gives users between 12 months for high appetite fast adoption up to 36 months for slow verification.
> I don't want to further pry open Pandora's box, but I'd love to see us cut alphas from trunk quarterly as well.
>
> I also think it's important that our release versioning is clear and simple. Right now, *to my mind*, it is not. The current matrix of:
>
> - Any .MINOR to next MAJOR is supported
> - Any .MAJOR to next MAJOR is supported
> - A release will be supported for some variable amount of time based on when we get around to new releases
> - API breaks in MAJOR changes, except when we get excited about a feature and want to .MAJOR to signal that in which case it may be completely low-risk and easy adoption, or we change JDKs and need to signal that, or any of another slew of caveats that require digging into NEWS.txt to see what the hell we're up to
> - And all of our CI pain that ensues from the above
>
> In my opinion the above is a mess. This isn't a particularly interesting topic to me, and us re-litigating this on every release (even if you discount me agitating about it; this isn't just me making noise I think), is a giant waste of time and energy for a low value outcome.
>
> T-2 online upgrade supported, T-1 API compatible, deprecate-then-remove is a combination of 3 simple things that I think will improve this situation greatly and hopefully put a nail in the coffin of the topic, improve things, and let us move on to more interesting topics that we can then re-litigate endlessly. ;)
>
> So - is anyone actively *against* the above proposal?
>
> On Tue, Jan 28, 2025, at 11:34 AM, David Capwell wrote:
>
> I have not checked Jenkins, but we see this in another environment…
>
> For python upgrades have we actually audited the runtime to see that the time spent is doing real work?
> Josh and I have spent a ton of time trying (and failing) to fix an issue where the python driver blocks the test and we wait 2 hours for that to time out… this pattern is always after all tests are run… what I see is python upgrades take around 30m of real work, then 2h of idle blocking taking all resources…
>
> Sent from my iPhone
>
> On Jan 28, 2025, at 8:16 AM, Benedict <bened...@apache.org> wrote:
>
> My opinion? Our revealed preferences don’t match whatever ideal is being chased whenever we discuss a policy.
>
> Ignoring the tick-tock debacle the community has done basically the same thing every release, only with a drift towards stricter QA and compatibility expectations with maturity.
>
> That is, we have always numbered using some combination of semver and how exciting the release is, and backed all other decisions out of whatever was reasonable once that decision was made.
>
> Which basically means a new major every 1 or 2 releases depending on how big the new features are. Which is actually pretty intuitive really, but isn’t a policy anyone dogmatic wants to argue for.
>
> On 28 Jan 2025, at 16:07, Josh McKenzie <jmcken...@apache.org> wrote:
>
> We revisit this basically every year and so I’m sort of inclined to keep the status quo which really amounts to basically doing whatever we end up deciding arbitrarily before we actually cut a release.
>
> Before discussing at length a new policy we’ll only immediately break
>
> It's painful how accurate this feels. =/
>
> Is it the complexity of these topics that's keeping us stuck or a lack of consensus... or both?
>
> if the motivation is
>
> My personal motivation is that our ad hoc re-litigating of this reactively at the last possible moment over and over is uninteresting and feels like a giant waste of time and energy for all of us.
> But to your point, if trying to formalize it doesn't yield results, that's just objectively worse since it's adding more churn on top of a churn-heavy process. /sigh
>
> On Tue, Jan 28, 2025, at 11:01 AM, Benedict wrote:
>
> We revisit this basically every year and so I’m sort of inclined to keep the status quo which really amounts to basically doing whatever we end up deciding arbitrarily before we actually cut a release.
>
> Before discussing at length a new policy we’ll only immediately break, if the motivation is avoiding extra release steps, I would prefer we just avoid extra release steps by e.g. running nightly upgrade tests rather than pre commit, or making the tests faster, or waiting until the test matrix actually causes anything to break rather than assuming it will.
>
> On 28 Jan 2025, at 15:45, Josh McKenzie <jmcken...@apache.org> wrote:
>
> Python Upgrade DTests today require 192x large (7 cpu, 14GB ram) servers
>
> We have far fewer (and more effective?) JVM Upgrade DTests.
> There we only need 8x medium (3 cpu, 5GB ram) servers
>
> Does anyone have a strong understanding of the coverage and value offered by the python upgrade dtests vs. the in-jvm dtests? I don't, but I intuitively have a hard time believing the value difference matches the hardware requirement difference there.
>
> Lots and lots of words about releases from mick (<3)
>
> Those of you who know me know my "spidey-senses" get triggered by enough complexity regardless of how well justified. I feel like our release process has passed this threshold for me. Been talking a lot with Mick about this topic for a couple weeks and I'm curious if the community sees a major flaw with a proposal like the following:
>
> - We formally support 3 releases at a time
> - We only release MAJOR (i.e. semver major).
>   No more "5.0, 5.1, 5.2", would now be "5.0, 6.0, 7.0"
> - We test and support online upgrades between supported releases
> - Any removal or API breakage follows a "deprecate-then-remove" cycle
> - We cut a release every 12 months
>
> *Implications for operators:*
>
> - Upgrade paths for online upgrades are simple and clear. T-2.
> - "Forced" update cadence to stay on supported versions is 3 years
> - If you adopt v1.0 it will be supported until v4.0 comes out 36 months later
> - This gives users the flexibility to prioritize functionality vs. stability and to balance release validation costs
> - Deprecation cycles are clear as are compatibility paths.
> - Release timelines and feature availability are predictable and clear
>
> *Implications for developers on the project:*
>
> - Support requirements for online upgrades are clear
> - Opportunity cost of feature slippage relative to release date is balanced (worst-case == 11.99 month delay on availability in GA supported release)
> - Path to keep code-base maintainable is clear (deprecate-then-remove)
> - CI requirements are constrained and predictable
>
> Moving to an "online upgrades supported for everything" is something I support in principle, but would advocate we consider after getting a handle on our release process.
>
> So - what do we lose if we consider the above approach?
>
> On Tue, Jan 28, 2025, at 8:23 AM, Mick Semb Wever wrote:
>
> Jordan, replies inline.
>
> To take a snippet from your email "A little empathy for our users goes a long way." While I agree clarity is important, forcing our users to upgrade multiple times is not in their best interest.
>
> Yes – we would be moving in that direction by now saying we aim for online compatibility across all versions. But how feasible that turns out to be depends on our future actions and new versions.
>
> The separation between "the code maintains compatibility across all versions" versus "we only actively test these upgrade paths so that's our limited recommendation" is here what lets us reduce the "forcing our users to upgrade multiple times". That's the "other paths may work but you're on your own – do your homework" aspect. This is a position that allows us to progress into something better.
>
> For now, and using the current status quo of major/minor usage as the implemented example: this would progress us to no longer needing major versions (we would just test all upgrade paths for all current maintained versions, CI resources permitting). The community can change over time as well, it's worth thinking about an approach that is adjustable to changing resources. (This includes efforts required in documenting past, present, future, especially as changes are made.)
>
> I emphasise, first I think we need to be focusing on maintaining compatibility in the code (and how and when we are willing/needing to break it).
>
> At the same time, don't fewer testing resources primarily translate to longer test runs?
>
> Too much also saturates the testing cluster to a point where tests become flaky and fail. ci-cassandra.a.o is already better at exposing flaky tests than other systems. This is a practicality, and it's constantly being improved, but only under volunteer time. Donating test hardware is the simpler ask.
>
> Upgrade tests don't need to be run on every commit. When I worked on Riak we had very comprehensive upgrade testing (pretty much the full matrix of versions) and we had a schedule we ran these tests on ahead of release.
>
> We are already struggling to stay on top of failures and flakies with ~per-commit builds and butler.c.a.o. I'm not against the idea of scheduled test runs, but it needs more input and effort from people in that space to accommodate it.
>
> I am not fond of the idea of "tests ahead of release" – release managers already do enough and are a scarce resource. Asking them to also be the build butler and chase down bugs and people to fix them is not appropriate IMO. I also think it's unwise without a guarantee that the contributor/committer that created the bug is available at release time. Having just one post-commit pipeline has nice benefits in simplicity, as long as it's feasible then slow is ok (as you say above).
>
> Could you share some more details on the resource issues and their impacts?
>
> Python Upgrade DTests and JVM Upgrade DTests.
>
> Python Upgrade DTests today require 192x large (7 cpu, 14GB ram) servers, each taking up to one hour. Currently we have too many upgrade paths (4.0, 4.1, 5.0, to trunk), and are seeing builds abort because of timeouts (>1hr). Collected timing numbers suggest we should double this number to 384, or simply remove upgrade paths we test.
>
> https://github.com/apache/cassandra/blob/trunk/.jenkins/Jenkinsfile#L185-L188
> https://github.com/apache/cassandra/blob/trunk/.jenkins/Jenkinsfile#L37
>
> We have far fewer (and more effective?) JVM Upgrade DTests. There we only need 8x medium (3 cpu, 5GB ram) servers.
> https://github.com/apache/cassandra/blob/trunk/.jenkins/Jenkinsfile#L177
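
For a rough sense of scale, the server counts Mick quotes can be turned into aggregate CPU and RAM figures. The sketch below is back-of-the-envelope arithmetic only: the `Pool` class and variable names are hypothetical, the inputs are the numbers stated in the message above, and nothing here is read from the Jenkinsfile itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pool:
    """A set of identical CI servers (illustrative model, not a Jenkins concept)."""
    servers: int
    cpus_each: int
    ram_gb_each: int

    @property
    def total_cpus(self) -> int:
        return self.servers * self.cpus_each

    @property
    def total_ram_gb(self) -> int:
        return self.servers * self.ram_gb_each


# Figures as quoted in the thread.
python_today = Pool(servers=192, cpus_each=7, ram_gb_each=14)
python_doubled = Pool(servers=384, cpus_each=7, ram_gb_each=14)
jvm = Pool(servers=8, cpus_each=3, ram_gb_each=5)

for name, pool in [
    ("python upgrade dtests (today)", python_today),
    ("python upgrade dtests (doubled)", python_doubled),
    ("jvm upgrade dtests", jvm),
]:
    print(f"{name}: {pool.total_cpus} CPUs, {pool.total_ram_gb} GB RAM")

# How many times more CPU the python suite reserves than the JVM suite.
print("CPU ratio, python today vs jvm:", python_today.total_cpus / jvm.total_cpus)
```

On these inputs the python suite reserves 1344 CPUs against 24 for the JVM suite, which is the hardware gap Josh's question about relative test value refers to.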