> I think a simple metric for "is something flaky" is "does it only fail
> once in the butler history (of 15 or so builds)".
Does that make it considered flaky? What if the one failure is a timeout?
I think each failing case has to have its failures investigated in order
to know.

Kind Regards,
Brandon

On Thu, Aug 18, 2022 at 9:31 AM Josh McKenzie <jmcken...@apache.org> wrote:
>
> So move to beta when:
> - all non-flaky test *failures* (NOT tickets, see below) are resolved
> - we get a green ci-cassandra run
>
> Move to rc when:
> - three consecutive green runs in ci-cassandra
>
> Release when:
> - all rc tickets are closed
> - some time-based gate, maybe?
> - three more consecutive green ci-cassandra runs?
>
> We don't have people volunteering for the build lead role, so we don't
> consistently have tickets created for flaky or non-flaky test failures;
> thus we can't use that as a gatekeeper, IMO, as it's non-deterministic.
> Using "no non-flaky failures in butler (i.e. ci-cassandra + history
> analysis)" should shore that up. We also need a more rigorous
> designation for flaky vs. non-flaky in our tickets beyond the informal
> practice of adding it to the Summary.
>
> I think a simple metric for "is something flaky" is "does it only fail
> once in the butler history (of 15 or so builds)".
>
> We can then filter our kanban to reflect that as well (flaky tests go
> to their own swimlane, as they're "iffy" as RC blockers; it'd
> technically be a roll of the dice as to whether any flake on the 3
> consecutive runs we need to get green to release... which I don't
> love ;) ).
>
>> We did something similar last time, this would be the same exception
>> to the rules, rules we continue to get closer to.
>
> If we did something similar last time and this is the same exception to
> the rules, I don't think we're getting closer to satisfying those
> rules, are we? i.e. I think we should consider formally revising the
> rules to match the above metrics, which are a little fuzzier and more
> tolerant of the current (and richly historical!) reality of our CI
> environment.
> Would save us a lot of back and forth on subsequent releases. :)
>
> ~Josh
>
> On Thu, Aug 18, 2022, at 1:24 AM, Berenguer Blasi wrote:
>> +1 to Mick's points.
>>
>> Also notice that in circle, green 4.1 runs are the norm lately, imo.
>> Yes, it's not the official CI, but it helps build an overall picture
>> of improvement towards green CI. On jenkins, if you check the latest
>> 4.1 runs, <5-ish failures per run are starting to be common, and the
>> runs that exceed that are known failures being worked on (CAS, e.g.),
>> infra issues, or flakies, taking you back to the <5-ish failures. So
>> overall, if I am not missing anything, the signal among the infra and
>> flaky noise is pretty good.
>>
>> Regards
>>
>> On 17/8/22 22:50, Ekaterina Dimitrova wrote:
>>> +1, I second Mick on both points.
>>>
>>> On Wed, 17 Aug 2022 at 16:23, Mick Semb Wever <m...@apache.org> wrote:
>>>> We're down from 13 tickets blocking 4.1 beta to 7:
>>>> https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=484&quickFilter=2455.
>>>> As mentioned above, we have some test failures w/out tickets, so
>>>> that 7 is probably realistically closer to the previous count.
>>>>
>>>> I suggest we move to beta when all non-flaky-test tickets are
>>>> resolved and we get our first green ci-cassandra run.
>>>> And I suggest we move to rc when we get three consecutive green
>>>> runs.
>>>>
>>>> We did something similar last time; this would be the same exception
>>>> to the rules, rules we continue to get closer to.
>>>>
>>>> An alternative is to replace "green" with "builds with only
>>>> non-regression and infra-caused failures".
>>>>
>>>>> - It's pretty expensive and painful to defer cleaning up CI to the
>>>>> end of the release cycle
>>>>
>>>> This^
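[Editor's note: the "fails only once in the last ~15 builds" heuristic discussed in the thread could be sketched roughly as below. This is purely illustrative; butler exposes no such API, and the function name, `build_history` shape, and window size are all assumptions.]

```python
from collections import Counter

def classify_failures(build_history, window=15):
    """Split failing tests into flaky vs. non-flaky per the thread's heuristic.

    build_history: list of builds (oldest first), each a set of test names
    that failed in that build -- e.g. scraped from a CI history view.
    A test that failed exactly once in the last `window` builds is treated
    as flaky; one that failed more than once is treated as non-flaky.
    """
    recent = build_history[-window:]
    counts = Counter(test for build in recent for test in build)
    flaky = {t for t, n in counts.items() if n == 1}
    non_flaky = {t for t, n in counts.items() if n > 1}
    return flaky, non_flaky
```

Under this sketch, only the `non_flaky` set would gate the beta/rc transitions; the `flaky` set would go to its own kanban swimlane, as suggested above.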