Re: Cassandra project status update 2022-08-17

Josh McKenzie Thu, 18 Aug 2022 07:31:09 -0700

So move to beta when:
 1. All non-flaky test *failures* (NOT tickets, see below) are resolved
 2. We get a green ci-cassandra run
Move to rc when:
 1. Three consecutive green runs in ci-cassandra
Release when:
 1. All rc tickets are closed
 2. Some time-based gate maybe?
 3. Three more consecutive green ci-cassandra runs?
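
As a rough sketch of the beta and rc gates above (not an existing tool: the
function names are hypothetical, a run is reduced to a single green/not-green
boolean, and "we get a green run" is assumed to mean the most recent run is
green). The release gate is left out since its criteria are still open
questions.

```python
def consecutive_green(runs):
    """Count green runs at the tail of the history.

    `runs` is a list of booleans, oldest first, True meaning the
    ci-cassandra run was green (hypothetical representation).
    """
    count = 0
    for green in reversed(runs):
        if not green:
            break
        count += 1
    return count


def can_move_to_beta(runs, unresolved_non_flaky_failures):
    """Beta gate: no unresolved non-flaky failures and a green latest run."""
    return unresolved_non_flaky_failures == 0 and consecutive_green(runs) >= 1


def can_move_to_rc(runs, required=3):
    """RC gate: the last `required` runs are all green."""
    return consecutive_green(runs) >= required
```

A single red run resets the streak, which is why flaky tests matter so much
for the "three consecutive" gates.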


We don't have people volunteering for the build lead role so we don't 
consistently have tickets created for flaky or non-flaky test failures, thus we 
can't use that as a gatekeeper IMO as it's non-deterministic. Using "no 
non-flaky failures in butler (i.e. ci-cassandra + history analysis)" should 
shore that up. We also need a more rigorous designation for flaky vs. non-flaky 
in our tickets outside an informal practice of adding that to the Summary.

I think a simple metric for "is something flaky" is "does it only fail once in 
the butler history (of 15 or so builds)".
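
To make that concrete, a minimal sketch of the metric, assuming the butler
history can be reduced to one set of failed test names per build (the
`classify_failures` name, the data shape, and the default window of 15 are
illustrative assumptions, not butler's actual API):

```python
from collections import Counter


def classify_failures(history, window=15, flaky_max=1):
    """Split failing tests into (flaky, non_flaky).

    `history` is a list of sets of failed test names, oldest build first
    (hypothetical shape). A test is treated as flaky when it failed at most
    `flaky_max` times within the last `window` builds.
    """
    recent = history[-window:]
    counts = Counter(name for failed in recent for name in failed)
    flaky = {name for name, n in counts.items() if n <= flaky_max}
    non_flaky = set(counts) - flaky
    return flaky, non_flaky
```

Anything landing in `non_flaky` would count against the beta gate; the
`flaky` set is what would go to its own swimlane.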

We can then filter our kanban to reflect that as well (flaky tests go to their 
own swimlane as they're "iffy" as RC blockers; it'd technically be a roll of 
the dice as to whether any of them flake on the 3 consecutive runs we need to get 
green to release... which I don't love ;) ).

> We did something similar last time, this would be the same exception to the 
> rules, rules we continue to get closer to.
If we did something similar last time and this is the same exception to the 
rules, I don't think we're getting closer to satisfying those rules, are we? 
i.e. I think we should consider formally revising the rules to match the above 
metrics, which are a little fuzzier and more tolerant of the current (and richly 
historical!) reality of our CI environment.

Would save us a lot of back and forth on subsequent releases. :)

~Josh

On Thu, Aug 18, 2022, at 1:24 AM, Berenguer Blasi wrote:
> +1 to Mick's points.
> 
> Also notice that in CircleCI, green 4.1 runs are the norm lately imo. Yes it's 
> not the official CI but it helps build an overall picture of improvement 
> towards green CI. On jenkins, if you check the latest 4.1 runs, <5-ish 
> failures per run are starting to be common, and the runs above that are down 
> to known failures being worked on (e.g. CAS), infra, or flakies; set those 
> aside and you are back to the <5-ish failures. So overall, if I am not missing 
> anything, the signal among the infra and flaky noise is pretty good.
> 
> Regards
> 
> On 17/8/22 22:50, Ekaterina Dimitrova wrote:
>> +1, I second Mick on both points. 
>> 
>> On Wed, 17 Aug 2022 at 16:23, Mick Semb Wever wrote:
>>>> We're down from 13 tickets blocking 4.1 beta to 7: 
>>>> https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=484&quickFilter=2455.
>>>>  As mentioned above, we have some test failures w/out tickets so that 7 is 
>>>> probably closer realistically to the previous count.
>>> 
>>> 
>>> I suggest we move to beta when all non-flaky-test tickets are resolved and 
>>> we get our first green ci-cassandra run. 
>>> And I suggest we move to rc when we get three consecutive green runs.
>>> 
>>> We did something similar last time, this would be the same exception to the 
>>> rules, rules we continue to get closer to.
>>> 
>>> An alternative is to replace "green" with "builds with only non-regression 
>>> and infra-caused failures".
>>> 
>>>  
>>>> - It's pretty expensive and painful to defer cleaning up CI to the end of 
>>>> the release cycle
>>> 
>>> 
>>> This^
