On Wed, 10 Aug 2022 at 17:54, Josh McKenzie <jmcken...@apache.org> wrote:
> “We can start by putting the bar at a lower level and raise the level
> over time when most of the flakies that we hit are above that level.”
> My only concern is only who and how will track that.
>
> What's Butler's logic for flagging things flaky? Maybe a "flaky low" vs.
> "flaky high" distinction based on failure frequency (or some much better
> name I'm sure someone else will come up with) could make sense?
>

I'd be keen to see orders of magnitude, rather than arbitrary labels.
Also per CI system (having the data for basic correlation between systems
will be useful in other discussions and decisions).

> Then we could focus our efforts on the ones that are flagged as failing at
> whatever high water mark threshold we set.
>

Maybe obvious, but so long as there's a way to bypass this when a flaky is
identified as being a legit bug and/or in a critical component (even a 1:1M
flakiness in certain components can be disastrous).

Some other questions…
 - how to measure the flakiness
 - how to measure post-commit rates across both CI systems
 - where the flakiness labels(/orders-of-magnitude) should be recorded
 - how we label flakies as being legit/critical/blocking (currently you
   often have to read through the comments)

Applying this manually to the remaining 4.1 blockers we have:
 - CASSANDRA-17461 CASTest. 1:40 failures on circle. Looks to be about 1:2
   on ci-cassandra.
 - CASSANDRA-17618 InternodeEncryptionEnforcementTest. 1:167 circle. No
   flakies in ci-cassandra.
 - CASSANDRA-17804 AutoSnapshotTtlTest. Unknown flakiness in both CI systems.
 - CASSANDRA-17573 PaxosRepairTest. 1:20 circle. No flakies in ci-cassandra.
 - CASSANDRA-17658 KeyspaceMetricsTest. 1:20 circle. No flakies in
   ci-cassandra.

In addition to these, Butler lists a number of flakies against 4.1, but
these are not regressions in 4.1 and hence are not blockers. The jira board
is currently not blocking a 4.1-beta release on non-regression flakies.
This means our releases are not blocked on overall flakies, regardless of
whether there are more or fewer of them. How are we to place this with our
recent stance of no releases unless green…? (This loops back to my "fewer
overall flakies than previous release /campground-cleaner" suggestion.)

Side note: Butler is reporting CASSANDRA-17348 as open (it's resolved as a
duplicate).
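
To make the orders-of-magnitude idea concrete, here is a rough sketch in Python. It is purely illustrative: the function names, the threshold parameter, and the `critical` flag are hypothetical, and none of this reflects how Butler or either CI system actually classifies flakies. It turns a failures-per-runs count into a power-of-ten bucket and shows one possible shape for the "critical component" bypass.

```python
import math
from typing import Optional


def flakiness_magnitude(failures: int, runs: int) -> Optional[int]:
    """Return n where the test fails roughly once per 10**n runs.

    Returns None when no failure was observed (rate unknown, not zero).
    """
    if runs <= 0 or not 0 <= failures <= runs:
        raise ValueError("need runs > 0 and 0 <= failures <= runs")
    if failures == 0:
        return None
    return math.floor(math.log10(runs / failures))


def blocks_release(magnitude: Optional[int], max_blocking_magnitude: int,
                   critical: bool = False) -> bool:
    """Hypothetical gate: critical flakies always block, whatever their rate."""
    if critical:
        return True
    return magnitude is not None and magnitude <= max_blocking_magnitude


# Rates quoted above for circle, as (failures, runs).
circle = {
    "CASTest": (1, 40),
    "InternodeEncryptionEnforcementTest": (1, 167),
    "PaxosRepairTest": (1, 20),
}
for test, (failures, runs) in circle.items():
    print(test, flakiness_magnitude(failures, runs))
```

Running the same function over ci-cassandra counts would give the per-system comparison; a flaky with no observed failures stays "unknown" rather than being treated as green.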