Mick and I have been chatting back and forth a bit about 4.1 beta, ASF CI, CircleCI, and the current challenges we face as a project with our CI environment(s). Given the consistent difficulty we continue to have with the ASF CI env, which has been blocking our beta + GA for months now, I think we should revisit some of the verbiage around our release lifecycle and commit guidelines so that our release process rests on confidence that our code is correct rather than confidence that ASF CI is passing (not necessarily the same thing!).

The proposal:
----------------
Revise language on release lifecycle to specify (reference: https://cwiki.apache.org/confluence/display/CASSANDRA/Release+Lifecycle):
- For beta: 1 green run on circle or 1 green run on ci-cassandra
    - No new flaky tests on ci-cassandra compared to older branches as a snapshot at a point in time
- For ga: 3 consecutive green runs of either circle or ci-cassandra
    - No new flaky tests on ci-cassandra compared to older branches as a snapshot at a point in time

Revise committing guidelines to specify (reference: https://cassandra.apache.org/_/development/how_to_commit.html):
- Must run either circleci or ci-cassandra before commit
- For CI on a patch, run the pre-commit suite and also run multiplexer with 250 runs on new, changed, or related tests to ensure they are not flaky
    - Include clear instructions on how to do that in the wiki / How to Commit page (generate.sh etc)
- In case of ci-cassandra, ensure the run doesn't have any new test failures compared to the previous run (i.e. no new flakes attributable to your patch) <- only needed while we get ci-cassandra cleaned up

In the next major release lifecycle, commit to working as a project on either tidying up our current heterogeneous ASF CI environment or spinning up a new homogeneous, well-provisioned environment.
---------------

I think it's worth calling out: relying on circleci test runs along with ASF CI as an option implies we have the same confidence in a green circleci run as we have in a green ASF CI run. The above revision effectively serves as a failsafe for us to be able to cut releases on a more predictable cadence while we continue to improve the ASF CI environment and eventually replace circle with just ASF CI for release validation.

So - what do we think?

On Mon, Sep 26, 2022, at 12:48 PM, Mick Semb Wever wrote:
>> We discussed in JIRA + ML a balanced approach to cutting 4.1 GA, which was
>> having 1 green run to cut beta and 3 green runs in a row to cut GA (correct
>> me if I'm wrong here Mick). To that end, checking in on 4.1 Butler we see
>> we're at 6 failures on run 162:
>> https://butler.cassandra.apache.org/#/ci/upstream/compare/Cassandra-4.1/cassandra-4.1.
>> This appears to be a pretty consistent number of failures and the history
>> in butler shows that the failures are scattershot all over the test suites;
>> we're probably going to have to continue to whack-a-mole these flaky
>> failures down as long as we keep "Green ASF CI" as our gatekeeping metric as
>> this is inherent in the current runtime environment.
>
>
> +1
>
> I would like to extend that^ to say for beta: CircleCI to be green, and no
> known flaky regressions (or blocking jira tickets).
>
> I know there is more to come on this topic (Josh), and that it is important
> we balance being pragmatic versus ensuring we continue to push towards green
> ci-cassandra and no flakies. But for now this means we are ready to cut
> 4.1-beta1 (which I shall do in the next few hours).
>