A few thoughts:

1. If we gate the classification behind each test failure being root-caused, we need people consistently dedicating their time to doing that, or we end up with a backlog of unclassified CI failures (like we have now and have always had historically).
2. Newcomers to the project, or folks who haven't worked in the CI space, who want to pitch in to push a release across the line don't currently have any guidance as to how to classify things.
3. Unless that classification is more rigorous, rules like "no non-flaky test failures" don't actually mean the same thing to everyone on the project, because we don't have a shared definition of what we're considering "flaky".

Also, one other data point in favor of a simple frequency heuristic: if these tests are flaking on ci-cassandra but not flaking on circleci, that's more evidence that they're *likely* test environment + authoring failures rather than product failures.
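To make the frequency heuristic concrete, here is a rough sketch of how it could be applied to a build history. Everything in it is an assumption for illustration: the `classify_failures` name, the input shape (test name mapped to a list of pass/fail results, oldest first), and the defaults of a 15-build window and a one-failure threshold. It does not use any real butler API, and it is only a triage label; as Brandon notes, a single failure (e.g. a timeout) may still need investigation.

```python
from typing import Dict, List


def classify_failures(
    history: Dict[str, List[bool]],
    window: int = 15,             # assumed: "15 or so builds" of history
    max_flaky_failures: int = 1,  # assumed: "only fails once" means flaky
) -> Dict[str, str]:
    """Label each test from its recent results (True = passed, oldest first)."""
    labels = {}
    for test, results in history.items():
        recent = results[-window:]        # only look at the most recent builds
        failures = recent.count(False)    # number of failing builds in the window
        if failures == 0:
            labels[test] = "passing"
        elif failures <= max_flaky_failures:
            labels[test] = "flaky"
        else:
            labels[test] = "non-flaky"
    return labels


# Hypothetical test names and results, purely for illustration.
example_history = {
    "test_a": [True] * 14 + [False],       # one failure in the window
    "test_b": [True] * 10 + [False] * 5,   # repeated failures
    "test_c": [True] * 15,                 # no failures
}
example_labels = classify_failures(example_history)
```

With a shared definition like this, "no non-flaky failures" would at least mean the same thing to everyone, whatever thresholds we end up agreeing on.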

On Thu, Aug 18, 2022, at 10:36 AM, Brandon Williams wrote:
> > I think a simple metric for "is something flaky" is "does it only fail once
> > in the butler history (of 15 or so builds)".
>
> Does that make it considered flaky? What if the one failure is a
> timeout? I think each failing case has to have the failures
> investigated in order to know.
>
> Kind Regards,
> Brandon
>
> On Thu, Aug 18, 2022 at 9:31 AM Josh McKenzie <jmcken...@apache.org> wrote:
> >
> > So move to beta when:
> >
> > all non-flaky test *failures* (NOT tickets, see below) are resolved
> > We get a green ci-cassandra run
> >
> > Move to rc when:
> >
> > Three consecutive green runs in ci-cassandra
> >
> > Release when:
> >
> > All rc tickets are closed
> > Some time-based gate maybe?
> > Three more consecutive green ci-cassandra runs?
> >
> > We don't have people volunteering for the build lead role so we don't
> > consistently have tickets created for flaky or non-flaky test failures,
> > thus we can't use that as a gatekeeper IMO as it's non-deterministic. Using
> > "no non-flaky failures in butler (i.e. ci-cassandra + history analysis)"
> > should shore that up. We also need a more rigorous designation for flaky
> > vs. non-flaky in our tickets outside an informal practice of adding that to
> > the Summary.
> >
> > I think a simple metric for "is something flaky" is "does it only fail once
> > in the butler history (of 15 or so builds)".
> >
> > We can then filter out our kanban to reflect that as well (flaky tests to
> > their own swimlane as they're "iffy" as RC blockers; it'd technically be a
> > roll of the dice as to whether any flake on the 3 consecutive runs we need
> > to get green to release... which I don't love ;) ).
> >
> > We did something similar last time, this would be the same exception to the
> > rules, rules we continue to get closer to.
> >
> > If we did something similar last time and this is the same exception to the
> > rules, I don't think we're getting closer to satisfying those rules are we?
> > i.e. I think we should consider revising the rules formally to match the
> > above metrics that are a little fuzzier and more tolerant to the current
> > (and richly historical!) reality of our CI environment.
> >
> > Would save us a lot of back and forth on subsequent releases. :)
> >
> > ~Josh
> >
> > On Thu, Aug 18, 2022, at 1:24 AM, Berenguer Blasi wrote:
> >
> > +1 to Mick's points.
> >
> > Also notice in circle 4.1 green runs are the norm lately imo. Yes it's not
> > the official CI but it helps build an overall picture of improvement
> > towards green CI. On jenkins, if you check the latest 4.1 runs, <5-ish
> > failures per run are starting to be common and those that don't are known
> > failures being worked on (CAS i.e.), infra or flakies taking you back to
> > the <5-ish failures. So overall, if I am not missing anything, the signal
> > among the infra and flaky noise is pretty good.
> >
> > Regards
> >
> > On 17/8/22 22:50, Ekaterina Dimitrova wrote:
> >
> > +1, I second Mick on both points.
> >
> > On Wed, 17 Aug 2022 at 16:23, Mick Semb Wever <m...@apache.org> wrote:
> >
> > We're down from 13 tickets blocking 4.1 beta down to 7:
> > https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=484&quickFilter=2455.
> > As mentioned above, we have some test failures w/out tickets so that 7 is
> > probably closer realistically to the previous count.
> >
> > I suggest we move to beta when all non-flaky-test tickets are resolved and
> > we get our first green ci-cassandra run.
> > And I suggest we move to rc when we get three consecutive green runs.
> >
> > We did something similar last time, this would be the same exception to the
> > rules, rules we continue to get closer to.
> >
> > An alternative is to replace "green" with "builds with only non-regression
> > and infra-caused failures".
> >
> > - It's pretty expensive and painful to defer cleaning up CI to the end of
> > the release cycle
> >
> > This^