As the list of flaky tests was filtered out, I wanted to add some information about the tests that revealed real issues. First, there was a mistake: only 3 of the issues were revealed by flaky tests. The other one was a user report. Of the 3 remaining tickets, only 2 were 4.0 bugs: CASSANDRA-16238 <https://issues.apache.org/jira/browse/CASSANDRA-16238> and CASSANDRA-16668 <https://issues.apache.org/jira/browse/CASSANDRA-16668> (which was a pretty hard-to-hit bug). I totally agree that we found some real issues, but the cost is pretty high: 2 months of work for two 4.0 issues.
I had a look this morning at how many users reported bugs on the RC-2 release. Outside of the people deeply involved in this project, only 4 people reported true issues, and all of those issues were relatively minor. I totally understand that we want to deliver a high-quality product. I just believe that we have to draw the line at some point. The popularity of Cassandra has been going down for years (https://db-engines.com/en/ranking_trend/system/Cassandra). The project might need that release more than any bug fix we can do.

On Tue, 15 Jun 2021 at 07:00, Dinesh Joshi <djos...@icloud.com.invalid> wrote:

> Based on the release lifecycle[1], we should cut another RC until we don’t
> find any blocking issues.
>
> Dinesh
>
> [1]
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=132320437
>
> > On Jun 14, 2021, at 9:05 PM, Scott Andreas <sc...@paradoxica.net> wrote:
> >
> > A second RC is appropriate given the revert of CASSANDRA-15899
> > necessitated by the discovery of CASSANDRA-16735: Adding columns via ALTER
> > TABLE can generate corrupt sstables.
> >
> > Ekaterina and Benedict's statement regarding the true positive rate of
> > flaky tests also shows the value of resolving these, and that it would be
> > good to pay this down as far as we can reasonably do so without
> > unnecessarily withholding the release.
> >
> > I do think it's possible that an RC2 build is a candidate for nomination
> > as our GA release. I don't think the RC2 phase needs to be drawn-out, but
> > believe it would build confidence for the project to have positive feedback
> > from a release containing the fix for C-16735. If work paying down the
> > remaining flaky tests surfaces a similar true positive rate, a third build
> > might be warranted, and it would be to the benefit of our users - but I
> > don't think we're far off.
> >
> > I hope others are working to deploy the beta/RC builds and integrate +
> > deploy changes from trunk into the releases they're deploying, as heavy
> > contributors doing so provides us the best opportunity to catch these
> > issues before our users do.
> >
> > We're getting close.
> >
> > ________________________________________
> > From: bened...@apache.org <bened...@apache.org>
> > Sent: Monday, June 14, 2021 3:03 PM
> > To: dev@cassandra.apache.org
> > Subject: Re: Are we ready for 4.0.0 (GA) ?
> >
> > A rate of 4/30 is a rate of 13% true bugs, which worries me with respect
> > to our promise of shipping a bug-free GA. In past releases we have ensured
> > no flaky tests, I think.
> >
> > That said, I’ve not had the time to contribute to the fixing of flaky
> > tests, so I’ll leave the decision to those who have, or otherwise have a
> > strong opinion.
> >
> >
> > From: Ekaterina Dimitrova <e.dimitr...@gmail.com>
> > Date: Monday, 14 June 2021 at 20:51
> > To: dev@cassandra.apache.org <dev@cassandra.apache.org>
> > Subject: Re: Are we ready for 4.0.0 (GA) ?
> > To give some context around the flaky tests, I pulled a quick report for
> > the fixed ones during the past two months. It is attached for your
> > reference.
> >
> > To summarize, in two months 30 tickets for flaky tests were closed and
> > only 4 of them were Cassandra bugs(marked in red in the report), the rest
> > of them were test fixes.
> >
> > I think Butler and running in a loop any new tests before adding them to
> > our test suite will help a lot. Also, Mick did a lot of work to stabilize
> > Jenkins. Timeouts and resource issues are less common than before, that is
> > a win! Thank you Mick!
> >
> > Best regards,
> > Ekaterina
> >
> >
> > On Mon, 14 Jun 2021 at 13:08, Adam Holmberg <adam.holmb...@datastax.com
> > <mailto:adam.holmb...@datastax.com>> wrote:
> > To the point of "long-term observability over flakies":
> >
> > I will mention here that we intend to deploy a tool called Butler that we
> > have developed and used internally for a while. It compliments Jenkins to
> > present different views of test results, allowing developers to better
> > ascertain those tests that are flaky vs failing vs new regressions. We
> > already have a server provisioned for public hosting. The application
> > requires a bit of work to generalize for this project. We've been putting
> > it on while focused on getting 4.0 over the line, but should be getting to
> > it soon after.
> >
> >> On Mon, Jun 14, 2021 at 11:33 AM Mick Semb Wever <m...@apache.org
> >> <mailto:m...@apache.org>> wrote:
> >>
> >> Are we ready to cut 4.0.0 (GA) once the following tickets land?
> >>
> >> CASSANDRA-16733 – Allow operators to disable 'ALTER ... DROP COMPACT
> >> STORAGE' statements
> >> CASSANDRA-16669 – Password obfuscation for DCL audit log statements
> >> CASSANDRA-16735 – Adding columns via ALTER TABLE can generate corrupt
> >> sstables
> >>
> >>
> >> A bit more background.
> >>
> >> 1. On our 4.0 GA board there's a few other tickets, which have priority but
> >> are not blockers for a GA release.
> >>
> >> https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=355&quickFilter=1661
> >>
> >> CASSANDRA-16715 – WEBSITE - June 2021 updates
> >> CASSANDRA-12519 – dtest failure in
> >> offline_tools_test.TestOfflineTools.sstableofflinerelevel_test
> >> CASSANDRA-16681 – org.apache.cassandra.utils.memory.LongBufferPoolTest -
> >> tests are flaky
> >> CASSANDRA-16689 – Flaky LeaveAndBootstrapTest
> >>
> >>
> >> 2. We also said we would get 5 green CI runs in a row. Progress on that
> >> front
> >> has been slow and risks delaying GA and our user base.
> >> It has had priority
> >> and there's been lots of momentum which is persisting: lots of flaky fixes
> >> committed; and the following are being discussed to keep pushing it in the
> >> right direction…
> >> - Long-term observability over flakies
> >> - Jenkins agent observability (infra stability)
> >>
> >> The past weeks has seen good progress on stability of ci-cassandra.a.o with
> >> the introduction of cpu docker limits imposed, and better monitoring of the
> >> agents so we can ensure we get the saturation and load we want. Dockerising
> >> the cqlshlib tests is also in progress.
> >>
> >> The alternative to a 4.0.0 GA release is a 4.0-rc2 release.
> >> Should the next release be: 4.0.0 (GA) or 4.0-rc2 ?
> >>
> >
> >
> > --
> > Adam Holmberg
> > e. adam.holmb...@datastax.com<mailto:adam.holmb...@datastax.com>
> > w. www.datastax.com<http://www.datastax.com>
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> > For additional commands, e-mail: dev-h...@cassandra.apache.org