I think the script discussion is on a different thread and in an attached document, which I am also about to address soon :-)
On Mon, 6 Dec 2021 at 17:59, bened...@apache.org <bened...@apache.org> wrote:

> Is there a reason we discounted modifying the merge strategy?
>
> I'm just a little wary of relying on scripts for consistency of behaviour here. Environments differ, and it would be far preferable for consistency of behaviour to rely on shared infrastructure if possible. I would probably be against mandating these scripts, at least.
>
> From: Joshua McKenzie <jmcken...@apache.org>
> Date: Monday, 6 December 2021 at 22:20
> To: dev@cassandra.apache.org <dev@cassandra.apache.org>
> Subject: Re: [DISCUSS] Releasable trunk and quality
>
> As I work through the scripting on this, I don't know if we've documented or clarified the following (I don't see it here: https://cassandra.apache.org/_/development/testing.html):
>
> Pre-commit test suites:
> * Which JDKs?
> * When to include all python tests, and when (if ever) to run JVM tests only?
> * When to run upgrade tests?
> * What to do if a test is also failing on the reference root (i.e. trunk, cassandra-4.0, etc.)?
> * What to do if a test fails intermittently?
>
> I'll also update the above linked documentation once we hammer this out, and try to bake it into the scripting flow as much as possible. The goal is to make it easy to do the right thing and hard to do the wrong thing, and to have these things written down rather than have it be tribal knowledge that varies a lot across the project.
>
> ~Josh
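To make the pre-commit question above concrete, here is a purely illustrative sketch of the kind of flow the docs could prescribe. The JDK list, the JAVA_HOME path, and the choice of ant targets are assumptions on my part, not settled policy:

    #!/bin/bash
    # Illustrative pre-commit sketch only: the JDK list and targets below
    # are assumptions, pending the documentation questions above.
    set -e
    for jdk in 8 11; do
        export JAVA_HOME="/usr/lib/jvm/java-${jdk}"   # path is hypothetical
        ant clean
        ant test    # unit tests via the standard 'test' target
    done
    # Python dtests and upgrade tests are the open question: when they run
    # pre-commit vs. post-commit is exactly what needs writing down.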
> On Sat, Dec 4, 2021 at 9:04 AM Joshua McKenzie <jmcken...@apache.org> wrote:
>
> > After some offline collab, here's where this thread has landed on a proposal to incrementally change our processes and hopefully stabilize the state of CI longer term:
> >
> > Link: https://docs.google.com/document/d/1tJ-0K7d6PIStSbNFOfynXsD9RRDaMgqCu96U4O-RT84/edit#bookmark=id.16oxqq30bby4
> >
> > Hopefully the mail server doesn't butcher formatting; if it does, hit up the gdoc and leave comments there, as it should be open to all.
> >
> > Phase 1:
> > Document merge criteria; update circle jobs to have a simple pre-merge job (one for each JDK profile)
> > * Donate, document, and formalize usage of circleci-enable.py in the ASF repo (need a new commit scripts / dev tooling section?)
> >   * rewrites circle config jobs to a simple, clear flow
> >   * ability to toggle between "run on push" or "click to run"
> >   * variety of other functionality; see below
> > Document (site, help, README.md) and automate via scripting the relationship / dev / release process around:
> > * in-jvm dtest
> > * dtest
> > * ccm
> > Integrate and document usage of a script to build CI repeat test runs
> > * circleci-enable.py --repeat-unit org.apache.cassandra.SomeTest
> > * Document "do this if you add or change tests"
> > Introduce "Build Lead" role
> > * Weekly rotation; volunteer
> > * 1: Make sure JIRAs exist for test failures
> > * 2: Attempt to triage new test failures to root cause and assign out
> > * 3: Coordinate and drive to a green board on trunk
> > Change and automate the process for *trunk only* patches:
> > * Block on green CI (from the merge criteria above; potentially a stricter definition of "clean" for trunk CI)
> > * Consider using github PRs to merge (TODO: determine how to handle circle + CHANGES; see below)
> > Automate the process for *multi-branch* merges
> > * Harden / contribute / document dcapwell's script (he has one which does the following; a rough sketch follows after this proposal):
> >   * rebases your branch to the latest (if on 3.0, then rebase against cassandra-3.0)
> >   * checks the code compiles
> >   * removes all changes to .circleci (can opt out for circleci patches)
> >   * removes all changes to CHANGES.txt and leverages JIRA for the content
> >   * checks the code still compiles
> >   * changes circle config to run CI
> >   * pushes to a temp branch in git and runs CI (circle + Jenkins)
> >   * when all branches are clean (the waiting step is manual)
> >     * TODO: define "clean"
> >       * No new test failures compared to reference?
> >       * Or no test failures at all?
> >   * merges changes into the actual branches
> >   * merges up changes, rewriting the diff
> >   * push --atomic
> >
> > Transition to phase 2 when:
> > * All items from phase 1 are complete
> > * Test boards for supported branches are green
> >
> > Phase 2:
> > * Add Harry to a recurring run against trunk
> > * Add Harry to the release pipeline
> > * Suite of perf tests against trunk, recurring
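Since the multi-branch merge steps above are only prose so far, here is a minimal sketch of what that automation might look like. To be clear, this is my own simplified reconstruction from the bullet list, not dcapwell's actual script; the branch names, the temp-branch convention, and the build targets are assumptions:

    #!/bin/bash
    # Simplified reconstruction of the multi-branch merge flow listed above.
    # NOT the real script: branch names, temp-branch naming, and build
    # targets are illustrative assumptions.
    set -e
    BASE=cassandra-3.0    # reference branch for the oldest affected line
    FEATURE=my-fix-3.0    # your patch branch

    git checkout "$FEATURE"
    git rebase "origin/$BASE"                     # rebase onto the latest base
    ant clean jar                                 # check the code compiles
    git checkout "origin/$BASE" -- .circleci      # drop CI config changes
    git checkout "origin/$BASE" -- CHANGES.txt    # CHANGES content comes from JIRA
    git commit -am "Strip CI config and CHANGES.txt churn" || true  # no-op if clean
    ant clean jar                                 # check it still compiles
    git push origin "$FEATURE:temp/$FEATURE"      # temp branch to trigger CI

    # ...wait (manually) for circle + Jenkins to come back clean on every
    # branch, merge into the real branches, then push them all in one step:
    git push --atomic origin cassandra-3.0 cassandra-3.11 cassandra-4.0 trunk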
> > On Wed, Nov 17, 2021 at 1:42 PM Joshua McKenzie <jmcken...@apache.org> wrote:
> >
> >> Sorry for not catching that, Benedict; you're absolutely right. So long as we're using merge commits between branches, I don't think auto-merging via train or blocking on green CI are options via the tooling, and multi-branch reverts will be something we should document very clearly should we even choose to go that route (there's a lot of room to make mistakes there).
> >>
> >> It may not be a huge issue, as we can expect the more disruptive (i.e. potentially destabilizing) changes to be happening on trunk only, so perhaps we can get away with slightly different workflows or policies based on whether you're doing a multi-branch bugfix or a feature on trunk. Bears thinking more deeply about.
> >>
> >> I'd also be game for revisiting our merge strategy. For an individual developer, I don't see much difference in labor between merging between branches and preparing separate patches; however, I'm sure there are maintenance and integration implications there I'm not thinking of right now.
> >>
> >> On Wed, Nov 17, 2021 at 12:03 PM bened...@apache.org <bened...@apache.org> wrote:
> >>
> >>> I raised this before, but to highlight it again: how do these approaches interface with our merge strategy?
> >>>
> >>> We might have to rebase several dependent merge commits and want to merge them atomically. So far as I know these tools don't work fantastically in this scenario, but if I'm wrong, that's fantastic. If not, given how important these things are, should we consider revisiting our merge strategy?
> >>>
> >>> From: Joshua McKenzie <jmcken...@apache.org>
> >>> Date: Wednesday, 17 November 2021 at 16:39
> >>> To: dev@cassandra.apache.org <dev@cassandra.apache.org>
> >>> Subject: Re: [DISCUSS] Releasable trunk and quality
> >>>
> >>> Thanks for the feedback and insight, Henrik; it's valuable to hear how other large, complex infra projects have tackled this problem set.
> >>>
> >>> To attempt to summarize what I got from your email:
> >>>
> >>> [Phase one]
> >>> 1) Build Barons: a rotation where there's always someone active tying failures to changes and adding those failures to our ticketing system
> >>> 2) A best-effort process of "test breakers" being assigned tickets to fix the things their work broke
> >>> 3) Moving to a culture where we regularly revert commits that break tests
> >>> 4) Running tests before we merge changes
> >>>
> >>> [Phase two]
> >>> 1) A suite of performance tests on a regular cadence against trunk (w/ hunter or otherwise)
> >>> 2) Integration w/ github merge-train pipelines
> >>>
> >>> Does that cover the highlights? I agree with these points as useful places for us to invest in as a project, and I'll work on getting this into a gdoc for us to align on and discuss further this week.
> >>>
> >>> ~Josh
> >>>
> >>> On Wed, Nov 17, 2021 at 10:23 AM Henrik Ingo <henrik.i...@datastax.com> wrote:
> >>>
> >>> > There's an old joke: how many people read Slashdot? The answer is 5; the rest of us just write comments without reading... In that spirit, I wanted to share some thoughts in response to your question, even if I know some of it will have been said in this thread already :-)
> >>> >
> >>> > Basically, I just want to share what has worked well in my past projects...
> >>> >
> >>> > Visualization: Now that we have Butler running, we can already see a decline in failing tests for 4.0 and trunk! This shows that contributors want to do the right thing; we just need the right tools and processes to achieve success.
> >>> >
> >>> > Process: I'm confident we will soon be back to seeing 0 failures for 4.0 and trunk. However, keeping that state requires constant vigilance! At MongoDB we had a role called Build Baron (aka Build Cop, etc.). This is a weekly rotating role where the person who is the Build Baron will, at least once per day, go through all of the Butler dashboards to catch new regressions early. We have used the same process at Datastax to guard our downstream fork of Cassandra 4.0. It's the responsibility of the Build Baron to:
> >>> > - file a jira ticket for new failures
> >>> > - determine which commit is responsible for introducing the regression; sometimes this is obvious, sometimes it requires "bisecting" by running more builds, e.g. between two nightly builds (see the sketch below)
> >>> > - assign the jira ticket to the author of the commit that introduced the regression
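A side note on the "bisecting" step Henrik mentions: when the failure reproduces locally, git can drive that search mechanically. A minimal sketch; the single-test invocation follows the usual ant pattern for running one test class, but the placeholder names are mine:

    # Minimal git-bisect sketch for the triage step above. Assumes the
    # regression reproduces locally with a single test; otherwise the same
    # search is done by hand against nightly CI builds.
    git bisect start
    git bisect bad HEAD                    # current tip shows the failure
    git bisect good <last-good-nightly>    # last commit known to pass
    # Run the failing test at each step; the exit code marks good/bad.
    git bisect run ant test -Dtest.name=SomeRegressedTest
    git bisect reset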
> >>> >
> >>> > Given that Cassandra is a community that includes part-time and volunteer developers, we may want to try some variation of this, such as pairing 2 build barons each week?
> >>> >
> >>> > Reverting: A policy that the commit causing the regression is automatically reverted can be scary. It takes courage to be the junior test engineer who reverts yesterday's commit from the founder and CTO, just to give an example... Yet this is the most efficient way to keep the build green. And it turns out it's not that much additional work for the original author to fix the issue and then re-merge the patch.
> >>> >
> >>> > Merge train: For any project with more than 1 commit per day, it will inevitably happen that you need to rebase a PR before merging, and even if it passed all tests before, after the rebase it won't. In the downstream Cassandra fork previously mentioned, we have tried to enable a github rule which requires a) that all tests passed before merging, b) that the PR is against the head of the branch being merged into, and c) that the tests were run after such a rebase. Unfortunately this leads to infinite loops where a large PR may never be able to commit, because it has to be rebased again and again while smaller PRs merge faster. The solution to this problem is to have an automated process for the rebase-test-merge cycle. GitLab supports such a feature and calls it a merge train: https://docs.gitlab.com/ee/ci/pipelines/merge_trains.html
> >>> >
> >>> > The merge train can be considered an advanced feature, and we can return to it later. The other points should be sufficient to keep a reasonably green trunk.
> >>> >
> >>> > I guess the major area where we can improve daily test coverage would be performance tests. To that end we recently open-sourced a nice tool that can algorithmically detect performance regressions in a timeseries history of benchmark results: https://github.com/datastax-labs/hunter Just like with correctness testing, it's my experience that catching regressions the day they happen is much better than trying to do it at beta or rc time.
> >>> >
> >>> > Piotr also blogged about Hunter when it was released:
> >>> > https://medium.com/building-the-open-data-stack/detecting-performance-regressions-with-datastax-hunter-c22dc444aea4
> >>> >
> >>> > henrik
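On Hunter, for anyone who hasn't looked at it yet: as I read its README, you accumulate benchmark results as a time series (e.g. one data point per nightly run) and Hunter flags statistically significant change points in that series. A rough usage sketch; the test name is hypothetical and the exact CLI syntax should be checked against the repo:

    # Rough sketch per my reading of Hunter's README; 'my-benchmark' is a
    # hypothetical test name defined in Hunter's config, and the exact
    # syntax may differ between versions.
    hunter analyze my-benchmark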
> >>> > On Sat, Oct 30, 2021 at 4:00 PM Joshua McKenzie <jmcken...@apache.org> wrote:
> >>> >
> >>> > > We as a project have gone back and forth on the topic of quality and the notion of a releasable trunk for quite a few years. If people are interested, I'd like to rekindle this discussion a bit and see if we're happy with where we are as a project, or if we think there are steps we should take to change the quality bar going forward. The following questions have been rattling around for me for a while:
> >>> > >
> >>> > > 1. How do we define what "releasable trunk" means? All reviewed by M committers? Passing N% of tests? Passing all tests plus some other metrics (manual testing, raising the number of reviewers, test coverage, usage in dev or QA environments, etc.)? Something else entirely?
> >>> > >
> >>> > > 2. With a definition settled upon in #1, what steps, if any, do we need to take to get from where we are to having *and keeping* that releasable trunk? Anything to codify there?
> >>> > >
> >>> > > 3. What are the benefits of having a releasable trunk as defined here? What are the costs? Is it worth pursuing? What are the alternatives (for instance: a freeze before a release plus a stabilization focus by the community, i.e. the 4.0 push or the tock in tick-tock)?
> >>> > >
> >>> > > Given the large volume of work coming down the pike with CEPs, this seems like a good time to at least check in on this topic as a community.
> >>> > >
> >>> > > Full disclosure: running face-first into 60+ failing tests on trunk when going through the commit process for denylisting this week brought this topic back up for me (reminds me of when I went to merge CDC back in 3.6 and those test failures riled me up... I sense a pattern ;))
> >>> > >
> >>> > > Looking forward to hearing what people think.
> >>> > >
> >>> > > ~Josh
> >>> >
> >>> > --
> >>> > Henrik Ingo
> >>> > +358 40 569 7354
> >>> > https://www.datastax.com/ | https://twitter.com/DataStaxEng | https://www.youtube.com/channel/UCqA6zOSMpQ55vvguq4Y0jAg | https://www.linkedin.com/in/heingo/