So move to beta when:
1. All non-flaky test *failures* (NOT tickets, see below) are resolved
2. We get a green ci-cassandra run

Move to rc when:
1. Three consecutive green runs in ci-cassandra

Release when:
1. All rc tickets are closed
2. Some time-based gate maybe?
3. Three more consecutive green ci-cassandra runs?

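For illustration only, the gates above could be sketched roughly like this. It's a minimal sketch, not anything butler or ci-cassandra actually exposes: the function names, the list-of-booleans run history, and the ticket/failure counts are all hypothetical inputs, and the "time-based gate maybe?" item is left out since it isn't defined yet.

```python
from typing import Sequence


def green_streak(runs: Sequence[bool]) -> int:
    """Count consecutive green runs at the end of `runs`.

    `runs` is a hypothetical history, oldest first, True meaning a green run.
    """
    streak = 0
    for green in reversed(runs):
        if not green:
            break
        streak += 1
    return streak


def ready_for_beta(runs: Sequence[bool], unresolved_non_flaky_failures: int) -> bool:
    # Beta: no unresolved non-flaky test failures and the latest run is green.
    return unresolved_non_flaky_failures == 0 and green_streak(runs) >= 1


def ready_for_rc(runs: Sequence[bool]) -> bool:
    # RC: three consecutive green runs.
    return green_streak(runs) >= 3


def ready_for_release(runs_since_rc: Sequence[bool], open_rc_tickets: int) -> bool:
    # Release: all rc tickets closed plus three more consecutive green runs
    # (assumed here to be counted from the rc cut).
    return open_rc_tickets == 0 and green_streak(runs_since_rc) >= 3
```

The point of keying everything off the trailing streak is that a single red run resets the count, which is exactly why flaky tests make the "three consecutive" gates a roll of the dice.
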
We don't have people volunteering for the build lead role, so we don't consistently have tickets created for flaky or non-flaky test failures; thus we can't use that as a gatekeeper IMO, as it's non-deterministic. Using "no non-flaky failures in butler (i.e. ci-cassandra + history analysis)" should shore that up.

We also need a more rigorous designation for flaky vs. non-flaky in our tickets, beyond the informal practice of adding that to the Summary. I think a simple metric for "is something flaky" is "does it only fail once in the butler history (of 15 or so builds)". We can then filter our kanban to reflect that as well (flaky tests go to their own swimlane as they're "iffy" as RC blockers; it'd technically be a roll of the dice as to whether any of them flake on the 3 consecutive runs we need to get green to release... which I don't love ;) ).

> We did something similar last time, this would be the same exception to the
> rules, rules we continue to get closer to.

If we did something similar last time and this is the same exception to the rules, I don't think we're getting closer to satisfying those rules, are we? i.e. I think we should consider revising the rules formally to match the above metrics, which are a little fuzzier and more tolerant of the current (and richly historical!) reality of our CI environment. Would save us a lot of back and forth on subsequent releases. :)

~Josh

On Thu, Aug 18, 2022, at 1:24 AM, Berenguer Blasi wrote:
> +1 to Mick's points.
>
> Also notice in circle 4.1 green runs are the norm lately imo. Yes it's not
> the official CI but it helps build an overall picture of improvement towards
> green CI. On jenkins, if you check the latest 4.1 runs, <5-ish failures per
> run are starting to be common and those that don't are known failures being
> worked on (CAS i.e.), infra or flakies taking you back to the <5-ish
> failures. So overall, if I am not missing anything, the signal among the
> infra and flaky noise is pretty good.
>
> Regards
>
> On 17/8/22 22:50, Ekaterina Dimitrova wrote:
>> +1, I second Mick on both points.
>>
>> On Wed, 17 Aug 2022 at 16:23, Mick Semb Wever <m...@apache.org> wrote:
>>>> We're down from 13 tickets blocking 4.1 beta down to 7:
>>>> https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=484&quickFilter=2455.
>>>> As mentioned above, we have some test failures w/out tickets so that 7 is
>>>> probably closer realistically to the previous count.
>>>
>>>
>>> I suggest we move to beta when all non-flaky-test tickets are resolved and
>>> we get our first green ci-cassandra run.
>>> And I suggest we move to rc when we get three consecutive green runs.
>>>
>>> We did something similar last time, this would be the same exception to the
>>> rules, rules we continue to get closer to.
>>>
>>> An alternative is to replace "green" with "builds with only non-regression
>>> and infra-caused failures".
>>>
>>>
>>>> - It's pretty expensive and painful to defer cleaning up CI to the end of
>>>> the release cycle
>>>
>>>
>>> This^
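
As a rough illustration of the "fails only once in the butler history (of 15 or so builds)" flakiness metric proposed at the top of this thread, a sketch could look like the following. The input shape (one set of failed test names per build, oldest first), the function name, and the default window of 15 are all assumptions made for the example; this is not how butler stores or reports its history.

```python
from collections import Counter
from typing import Dict, Sequence, Set


def classify_failures(
    history: Sequence[Set[str]],
    window: int = 15,            # assumed size of the history window
    flaky_max_failures: int = 1  # "only fails once" counts as flaky
) -> Dict[str, str]:
    """Label each test that failed in the recent window as flaky or non-flaky.

    `history` is hypothetical data: one set of failed test names per build,
    oldest build first. Only the last `window` builds are considered.
    """
    recent = list(history)[-window:]
    counts = Counter(test for failed in recent for test in failed)
    return {
        test: "flaky" if n <= flaky_max_failures else "non-flaky"
        for test, n in counts.items()
    }
```

Tests labelled "flaky" here would be the ones moved to their own kanban swimlane; anything labelled "non-flaky" would count against the beta gate.
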