Since the list of flaky tests was filtered out, I wanted to add some
information about the tests that revealed real issues. First, there was a
mistake: only 3 of the 4 issues were revealed by flaky tests. The other one
was a user report.
Of the 3 remaining tickets, only 2 were 4.0 bugs: CASSANDRA-16238
<https://issues.apache.org/jira/browse/CASSANDRA-16238> and CASSANDRA-16668
<https://issues.apache.org/jira/browse/CASSANDRA-16668> (which was a pretty
hard-to-hit bug).
I totally agree that we found some real issues, but the cost is pretty high:
2 months of work for two 4.0 issues.

I had a look this morning at how many users reported bugs on the RC-2
release. Outside of the people deeply involved in this project, only
4 people reported true issues, and all of those issues were relatively
minor.

I totally understand that we want to deliver a high quality product. I just
believe that we have to draw the line at some point.
The popularity of Cassandra has been going down for years (
https://db-engines.com/en/ranking_trend/system/Cassandra). The project
might need that release more than any bug fix we can do.

On Tue, 15 Jun 2021 at 07:00, Dinesh Joshi <djos...@icloud.com.invalid>
wrote:

> Based on the release lifecycle[1], we should keep cutting RCs until we no
> longer find any blocking issues.
>
> Dinesh
>
> [1]
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=132320437
>
> >
> > On Jun 14, 2021, at 9:05 PM, Scott Andreas <sc...@paradoxica.net> wrote:
> >
> > A second RC is appropriate given the revert of CASSANDRA-15899
> necessitated by the discovery of CASSANDRA-16735: Adding columns via ALTER
> TABLE can generate corrupt sstables.
> >
> > Ekaterina and Benedict's statement regarding the true positive rate of
> flaky tests also shows the value of resolving these, and that it would be
> good to pay this down as far as we can reasonably do so without
> unnecessarily withholding the release.
> >
> > I do think it's possible that an RC2 build is a candidate for nomination
> as our GA release. I don't think the RC2 phase needs to be drawn-out, but
> believe it would build confidence for the project to have positive feedback
> from a release containing the fix for C-16735. If work paying down the
> remaining flaky tests surfaces a similar true positive rate, a third build
> might be warranted, and it would be to the benefit of our users - but I
> don't think we're far off.
> >
> > I hope others are working to deploy the beta/RC builds and integrate +
> deploy changes from trunk into the releases they're deploying, as heavy
> contributors doing so provides us the best opportunity to catch these
> issues before our users do.
> >
> > We're getting close.
> >
> > ________________________________________
> > From: bened...@apache.org <bened...@apache.org>
> > Sent: Monday, June 14, 2021 3:03 PM
> > To: dev@cassandra.apache.org
> > Subject: Re: Are we ready for 4.0.0 (GA) ?
> >
> > A rate of 4/30 is a rate of 13% true bugs, which worries me with respect
> to our promise of shipping a bug-free GA.  In past releases we have ensured
> no flaky tests, I think.
> >
> > That said, I’ve not had the time to contribute to the fixing of flaky
> tests, so I’ll leave the decision to those who have, or otherwise have a
> strong opinion.
> >
> >
> > From: Ekaterina Dimitrova <e.dimitr...@gmail.com>
> > Date: Monday, 14 June 2021 at 20:51
> > To: dev@cassandra.apache.org <dev@cassandra.apache.org>
> > Subject: Re: Are we ready for 4.0.0 (GA) ?
> > To give some context around the flaky tests, I pulled a quick report for
> the fixed ones during the past two months. It is attached for your
> reference.
> >
> > To summarize, in two months 30 tickets for flaky tests were closed and
> only 4 of them were Cassandra bugs (marked in red in the report); the rest
> of them were test fixes.
> >
> > I think Butler, and running any new tests in a loop before adding them to
> our test suite, will help a lot. Also, Mick did a lot of work to stabilize
> Jenkins. Timeouts and resource issues are less common than before, which is
> a win! Thank you Mick!
> >
> > Best regards,
> > Ekaterina
> >
> >
> > On Mon, 14 Jun 2021 at 13:08, Adam Holmberg <adam.holmb...@datastax.com
> <mailto:adam.holmb...@datastax.com>> wrote:
> > To the point of "long-term observability over flakies":
> >
> > I will mention here that we intend to deploy a tool called Butler that we
> > have developed and used internally for a while. It complements Jenkins to
> > present different views of test results, allowing developers to better
> > ascertain those tests that are flaky vs failing vs new regressions. We
> > already have a server provisioned for public hosting. The application
> > requires a bit of work to generalize for this project. We've been putting
> > it off while focused on getting 4.0 over the line, but should be getting to
> > it soon after.
> >
> >> On Mon, Jun 14, 2021 at 11:33 AM Mick Semb Wever <m...@apache.org
> <mailto:m...@apache.org>> wrote:
> >>
> >> Are we ready to cut 4.0.0 (GA) once the following tickets land?
> >>
> >> CASSANDRA-16733 – Allow operators to disable 'ALTER ... DROP COMPACT
> >> STORAGE' statements
> >> CASSANDRA-16669 – Password obfuscation for DCL audit log statements
> >> CASSANDRA-16735 – Adding columns via ALTER TABLE can generate corrupt
> >> sstables
> >>
> >>
> >> A bit more background.
> >>
> >> 1. On our 4.0 GA board there are a few other tickets, which have priority but
> >> are not blockers for a GA release.
> >>
> >>
> https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=355&quickFilter=1661
> >>
> >> CASSANDRA-16715 – WEBSITE - June 2021 updates
> >> CASSANDRA-12519 – dtest failure in
> >> offline_tools_test.TestOfflineTools.sstableofflinerelevel_test
> >> CASSANDRA-16681 – org.apache.cassandra.utils.memory.LongBufferPoolTest -
> >> tests are flaky
> >> CASSANDRA-16689 – Flaky LeaveAndBootstrapTest
> >>
> >>
> >> 2. We also said we would get 5 green CI runs in a row. Progress on that front
> >> has been slow and risks delaying GA and our user base. It has had priority
> >> and there's been lots of momentum which is persisting: lots of flaky fixes
> >> committed; and the following are being discussed to keep pushing it in the
> >> right direction…
> >> - Long-term observability over flakies
> >> - Jenkins agent observability (infra stability)
> >>
> >> The past weeks have seen good progress on the stability of ci-cassandra.a.o with
> >> the introduction of Docker CPU limits, and better monitoring of the
> >> agents so we can ensure we get the saturation and load we want. Dockerising
> >> the cqlshlib tests is also in progress.
> >>
> >> The alternative to a 4.0.0 GA release is a 4.0-rc2 release.
> >> Should the next release be: 4.0.0 (GA) or 4.0-rc2 ?
> >>
> >
> >
> > --
> > Adam Holmberg
> > e. adam.holmb...@datastax.com<mailto:adam.holmb...@datastax.com>
> > w. www.datastax.com<http://www.datastax.com>
> >
> >
>
