To your point Jacek, I believe in the run-up to 4.0 Ekaterina did some analysis and something like 18% (correct me if I'm wrong here) of the test failures we were considering "flaky tests" were actual product defects in the database. With that in mind, we should be uncomfortable cutting a release if we have 6 test failures, since there's every likelihood that one of them is a surfaced bug.
"ensuring our best practices are followed for every merge"

I totally agree, but I also don't think we have this codified (unless I'm just completely missing something - very possible! ;)). It seems like we have different circle configs, different sets of jobs being run, Harry / Hunter (maybe?) / ?? run on some but not all commits and/or all branches, and manual performance testing on specific releases, but nothing surfaced formally to the project as a reproducible suite like we used to have years ago (primitive though it was at the time in what it covered).

If we *don't* have this clarified right now, I think there's significant value in enumerating and at least documenting what our agreed-upon best practices are, so we can start holding ourselves and each other accountable to that bar. Given some of the incredible but sweeping work coming down the pike, this strikes me as something we need to be proactive and vigilant about so as not to regress.

~Josh

On Tue, Nov 2, 2021 at 3:49 AM Jacek Lewandowski <lewandowski.ja...@gmail.com> wrote:

> > we already have a way to confirm flakiness on circle by running the test repeatedly N times. Like 100 or 500. That has proven to work very well so far, at least for me. #collaborating #justfyi
>
> It does not prove that it is test flakiness. It can still be a bug in the code which occurs intermittently under some rare conditions.
>
> - - -- --- ----- -------- -------------
> Jacek Lewandowski
>
> On Tue, Nov 2, 2021 at 7:46 AM Berenguer Blasi <berenguerbl...@gmail.com> wrote:
>
> > Hi,
> >
> > we already have a way to confirm flakiness on circle by running the test repeatedly N times. Like 100 or 500. That has proven to work very well so far, at least for me. #collaborating #justfyi
> >
> > On the 60+ failures it is not as bad as it looks. Let me explain. I have been tracking failures in 4.0 and trunk daily; it's grown into a habit for me after the 4.0 push. And 4.0 and trunk were hovering solidly around <10 failures (you can check the Jenkins CI graphs). The occasional bisect or fix was needed, leaving behind the 3 or 4 tests that have already defeated 2 or 3 committers - the really tough ones. I am reasonably convinced that once the fix for the 60+ failures merges we'll be back to <10 failures with relatively little effort.
> >
> > So we're just in the middle of a 'fix', but overall we shouldn't be as bad as it looks now, as we've been quite good at keeping CI green-ish imo.
> >
> > Also +1 to releasable branches, which, whatever definition we settle on, should mean it is not a wall of failures, because of the reasons already explained, like the hidden costs etc.
> >
> > My 2cts.
> >
> > On 2/11/21 6:07, Jacek Lewandowski wrote:
> > >> I don't think this means guaranteeing there are no failing tests (though ideally this would also happen), but rather ensuring our best practices are followed for every merge. 4.0 took so long to release because of the amount of hidden work that was created by merging work that didn't meet the standard for release.
> > >>
> > > Tests are sometimes considered flaky because they fail intermittently, but it may not be related to an insufficiently consistent test implementation and can reveal some real problem in the production code.
> > > I saw that in various codebases, and I think it would be great if each such test (or test group) was guaranteed to have a ticket, and some preliminary analysis was done to confirm it is just a test problem, before releasing the new version.
> > >
> > >> Historically we have also had significant pressure to backport features to earlier versions due to the cost and risk of upgrading. If we maintain broader version compatibility for upgrade, and reduce the risk of adopting newer versions, then this pressure is also reduced significantly. Though perhaps we will stick to our guns here anyway, as there seems to be renewed pressure to limit work in GA releases to bug fixes exclusively. It remains to be seen if this holds.
> > >
> > > Are there any precise requirements for supported upgrade and downgrade paths?
> > >
> > > Thanks
> > > - - -- --- ----- -------- -------------
> > > Jacek Lewandowski
> > >
> > > On Sat, Oct 30, 2021 at 4:07 PM bened...@apache.org <bened...@apache.org> wrote:
> > >
> > >>> How do we define what "releasable trunk" means?
> > >>
> > >> For me, the major criterion is ensuring that work is not merged that is known to require follow-up work, or could reasonably have been known to require follow-up work if better QA practices had been followed.
> > >>
> > >> So, a big part of this is ensuring we continue to exceed our targets for improved QA. For me this means trying to weave tools like Harry and the Simulator into our development workflow early on, but we'll see how well these tools gain broader adoption. This also means a general focus on the possible negative effects of a change.
> > >>
> > >> I think we could do with producing guidance documentation for how to approach QA, where we can record our best practices and evolve them as we discover flaws or pitfalls, either for ergonomics or for bug discovery.
> > >>
> > >>> What are the benefits of having a releasable trunk as defined here?
> > >>
> > >> If we want to have any hope of meeting reasonable release cadences _and_ the high project quality we expect today, then I think a ~shippable trunk policy is an absolute necessity.
> > >>
> > >> I don't think this means guaranteeing there are no failing tests (though ideally this would also happen), but rather ensuring our best practices are followed for every merge. 4.0 took so long to release because of the amount of hidden work that was created by merging work that didn't meet the standard for release.
> > >>
> > >> Historically we have also had significant pressure to backport features to earlier versions due to the cost and risk of upgrading. If we maintain broader version compatibility for upgrade, and reduce the risk of adopting newer versions, then this pressure is also reduced significantly. Though perhaps we will stick to our guns here anyway, as there seems to be renewed pressure to limit work in GA releases to bug fixes exclusively. It remains to be seen if this holds.
> > >>
> > >>> What are the costs?
> > >>
> > >> I think the costs are quite low, perhaps even negative. Hidden work produced by merges that break things can be much more costly than getting the work right the first time, as attribution is much more challenging.
> > >>
> > >> One cost that is created, however, is for version compatibility, as we cannot say "well, this is a minor version bump so we don't need to support downgrade". But I think we should be investing in this anyway for operator simplicity and confidence, so I actually see this as a benefit as well.
> > >>
> > >>> Full disclosure: running face-first into 60+ failing tests on trunk
> > >>
> > >> I have to apologise here. CircleCI did not uncover these problems, apparently due to some way it resolves dependencies, and so I am responsible for a significant number of these and have been quite sick since.
> > >>
> > >> I think a push to eliminate flaky tests will probably help here in future, though, and perhaps the project needs to have some (low) threshold of flaky or failing tests at which point we block merges to force a correction.
> > >>
> > >> From: Joshua McKenzie <jmcken...@apache.org>
> > >> Date: Saturday, 30 October 2021 at 14:00
> > >> To: dev@cassandra.apache.org <dev@cassandra.apache.org>
> > >> Subject: [DISCUSS] Releasable trunk and quality
> > >>
> > >> We as a project have gone back and forth on the topic of quality and the notion of a releasable trunk for quite a few years. If people are interested, I'd like to rekindle this discussion a bit and see if we're happy with where we are as a project, or if we think there are steps we should take to change the quality bar going forward. The following questions have been rattling around for me for a while:
> > >>
> > >> 1. How do we define what "releasable trunk" means? All reviewed by M committers? Passing N% of tests? Passing all tests plus some other metrics (manual testing, raising the number of reviewers, test coverage, usage in dev or QA environments, etc.)? Something else entirely?
> > >>
> > >> 2. With a definition settled upon in #1, what steps, if any, do we need to take to get from where we are to having *and keeping* that releasable trunk? Anything to codify there?
> > >>
> > >> 3. What are the benefits of having a releasable trunk as defined here? What are the costs? Is it worth pursuing? What are the alternatives (for instance: a freeze before a release + a stabilization focus by the community, i.e. the 4.0 push or the tock in tick-tock)?
> > >>
> > >> Given the large volumes of work coming down the pike with CEPs, this seems like a good time to at least check in on this topic as a community.
> > >>
> > >> Full disclosure: running face-first into 60+ failing tests on trunk when going through the commit process for denylisting this week brought this topic back up for me (reminds me of when I went to merge CDC back in 3.6 and those test failures riled me up... I sense a pattern ;))
> > >>
> > >> Looking forward to hearing what people think.
> > >>
> > >> ~Josh
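
P.S. For anyone who wants to try the "run it N times" check Berenguer describes above locally, here is a rough sketch of the idea: repeat a single suspect test a few hundred times and count failures, rather than eyeballing one run. The `ant testsome` invocation and the test class name below are placeholders, not a project-blessed tool - substitute whatever command runs the suspect test in your environment.

#!/usr/bin/env python3
"""Repeat a single test command N times and count failures.

Rough sketch of the "run it N times" flakiness check discussed in this
thread; the command below is a placeholder, not a project-blessed tool.
"""
import subprocess
import sys

def repeat_test(cmd, runs):
    """Run `cmd` `runs` times; return the number of failing runs."""
    failures = 0
    for i in range(1, runs + 1):
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            failures += 1
            print(f"run {i}: FAILED (exit code {result.returncode})")
        else:
            print(f"run {i}: ok")
    return failures

if __name__ == "__main__":
    runs = int(sys.argv[1]) if len(sys.argv) > 1 else 100
    # Placeholder invocation -- substitute whatever runs the suspect test,
    # e.g. an `ant testsome -Dtest.name=...` line in a Cassandra checkout.
    cmd = ["ant", "testsome", "-Dtest.name=org.apache.cassandra.SomeFlakyTest"]
    failures = repeat_test(cmd, runs)
    print(f"{failures}/{runs} runs failed")
    sys.exit(1 if failures else 0)

Anything that fails even a handful of times out of a few hundred runs deserves a ticket and a closer look at whether it's the test or the product, per the earlier point about "flaky" failures hiding real defects.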