After some offline collaboration, here's where this thread has landed on a proposal to incrementally improve our processes and hopefully stabilize the state of CI longer term:
Link: https://docs.google.com/document/d/1tJ-0K7d6PIStSbNFOfynXsD9RRDaMgqCu96U4O-RT84/edit#bookmark=id.16oxqq30bby4

Hopefully the mail server doesn't butcher formatting; if it does, hit up the gdoc and leave comments there, as it should be open to all.

Phase 1: Document merge criteria; update circle jobs to have a simple pre-merge job (one for each JDK profile)
* Donate, document, and formalize usage of circleci-enable.py in the ASF repo (need a new commit scripts / dev tooling section?)
  * rewrites circle config jobs to a simple, clear flow
  * ability to toggle between "run on push" or "click to run"
  * variety of other functionality; see below

Document (site, help, README.md) and automate via scripting the relationship / dev / release process around:
* In-jvm dtest
* dtest
* ccm

Integrate and document usage of a script to build CI repeat test runs
* circleci-enable.py --repeat-unit org.apache.cassandra.SomeTest
* Document "Do this if you add or change tests"

Introduce "Build Lead" role
* Weekly rotation; volunteer
* 1: Make sure JIRAs exist for test failures
* 2: Attempt to triage new test failures to root cause and assign out
* 3: Coordinate and drive to a green board on trunk

Change and automate process for *trunk only* patches:
* Block on green CI (from the merge criteria above; potentially a stricter definition of "clean" for trunk CI)
* Consider using github PRs to merge (TODO: determine how to handle circle + CHANGES; see below)

Automate process for *multi-branch* merges
* Harden / contribute / document dcapwell's script (he has one which does the following; a rough illustrative sketch of this flow is appended after the quoted thread at the end of this mail):
  * rebases your branch to the latest (if on 3.0 then rebase against cassandra-3.0)
  * checks it compiles
  * removes all changes to .circleci (can opt out for circleci patches)
  * removes all changes to CHANGES.txt and leverages JIRA for the content
  * checks the code still compiles
  * changes circle config to run CI
  * pushes to a temp branch in git and runs CI (circle + Jenkins)
  * when all branches are clean (waiting step is manual)
    * TODO: Define "clean"
      * No new test failures compared to reference?
      * Or no test failures at all?
  * merges changes into the actual branches
  * merges up changes, rewriting the diff
  * push --atomic

Transition to phase 2 when:
* All items from phase 1 are complete
* Test boards for supported branches are green

Phase 2:
* Add Harry to a recurring run against trunk
* Add Harry to the release pipeline
* Suite of perf tests against trunk, recurring

On Wed, Nov 17, 2021 at 1:42 PM Joshua McKenzie <jmcken...@apache.org> wrote:

> Sorry for not catching that Benedict, you're absolutely right. So long as we're using merge commits between branches I don't think auto-merging via train or blocking on green CI are options via the tooling, and multi-branch reverts will be something we should document very clearly should we even choose to go that route (a lot of room to make mistakes there).
>
> It may not be a huge issue as we can expect the more disruptive changes (i.e. potentially destabilizing) to be happening on trunk only, so perhaps we can get away with slightly different workflows or policies based on whether you're doing a multi-branch bugfix or a feature on trunk. Bears thinking more deeply about.
>
> I'd also be game for revisiting our merge strategy. I don't see much difference in labor between merging between branches vs. preparing separate patches for an individual developer; however, I'm sure there are maintenance and integration implications there I'm not thinking of right now.
>
> On Wed, Nov 17, 2021 at 12:03 PM bened...@apache.org <bened...@apache.org> wrote:
>
>> I raised this before, but to highlight it again: how do these approaches interface with our merge strategy?
>>
>> We might have to rebase several dependent merge commits and want to merge them atomically. So far as I know these tools don't work fantastically in this scenario, but if I'm wrong that's fantastic. If not, given how important these things are, should we consider revisiting our merge strategy?
>>
>> From: Joshua McKenzie <jmcken...@apache.org>
>> Date: Wednesday, 17 November 2021 at 16:39
>> To: dev@cassandra.apache.org <dev@cassandra.apache.org>
>> Subject: Re: [DISCUSS] Releasable trunk and quality
>>
>> Thanks for the feedback and insight Henrik; it's valuable to hear how other large complex infra projects have tackled this problem set.
>>
>> To attempt to summarize, what I got from your email:
>> [Phase one]
>> 1) Build Barons: rotation where there's always someone active tying failures to changes and adding those failures to our ticketing system
>> 2) Best effort process of "test breakers" being assigned tickets to fix the things their work broke
>> 3) Moving to a culture where we regularly revert commits that break tests
>> 4) Running tests before we merge changes
>>
>> [Phase two]
>> 1) Suite of performance tests on a regular cadence against trunk (w/ hunter or otherwise)
>> 2) Integration w/ github merge-train pipelines
>>
>> That cover the highlights? I agree with these points as useful places for us to invest in as a project and I'll work on getting this into a gdoc for us to align on and discuss further this week.
>>
>> ~Josh
>>
>> On Wed, Nov 17, 2021 at 10:23 AM Henrik Ingo <henrik.i...@datastax.com> wrote:
>>
>> > There's an old joke: How many people read Slashdot? The answer is 5. The rest of us just write comments without reading... In that spirit, I wanted to share some thoughts in response to your question, even if I know some of it will have been said in this thread already :-)
>> >
>> > Basically, I just want to share what has worked well in my past projects...
>> >
>> > Visualization: Now that we have Butler running, we can already see a decline in failing tests for 4.0 and trunk! This shows that contributors want to do the right thing, we just need the right tools and processes to achieve success.
>> >
>> > Process: I'm confident we will soon be back to seeing 0 failures for 4.0 and trunk. However, keeping that state requires constant vigilance! At MongoDB we had a role called Build Baron (aka Build Cop, etc.). This is a weekly rotating role where the person who is the Build Baron will at least once per day go through all of the Butler dashboards to catch new regressions early. We have used the same process also at Datastax to guard our downstream fork of Cassandra 4.0. It's the responsibility of the Build Baron to
>> > - file a jira ticket for new failures
>> > - determine which commit is responsible for introducing the regression. Sometimes this is obvious, sometimes this requires "bisecting" by running more builds, e.g. between two nightly builds.
>> > - assign the jira ticket to the author of the commit that introduced the regression
>> >
>> > Given that Cassandra is a community that includes part-time and volunteer developers, we may want to try some variation of this, such as pairing 2 build barons each week?
>> >
>> > Reverting: A policy that the commit causing the regression is automatically reverted can be scary. It takes courage to be the junior test engineer who reverts yesterday's commit from the founder and CTO, just to give an example... Yet this is the most efficient way to keep the build green. And it turns out it's not that much additional work for the original author to fix the issue and then re-merge the patch.
>> >
>> > Merge-train: For any project with more than 1 commit per day, it will inevitably happen that you need to rebase a PR before merging, and even if it passed all tests before, after rebase it won't. In the downstream Cassandra fork previously mentioned, we have tried to enable a github rule which requires a) that all tests passed before merging, b) the PR is against the head of the branch merged into, and c) the tests were run after such rebase. Unfortunately this leads to infinite loops where a large PR may never be able to commit because it has to be rebased again and again while smaller PRs can merge faster. The solution to this problem is to have an automated process for the rebase-test-merge cycle. GitLab supports such a feature and calls it merge train: https://docs.gitlab.com/ee/ci/pipelines/merge_trains.html
>> >
>> > The merge-train can be considered an advanced feature and we can return to it later. The other points should be sufficient to keep a reasonably green trunk.
>> >
>> > I guess the major area where we can improve daily test coverage would be performance tests. To that end we recently open sourced a nice tool that can algorithmically detect performance regressions in a timeseries history of benchmark results: https://github.com/datastax-labs/hunter Just like with correctness testing, it's my experience that catching regressions the day they happened is much better than trying to do it at beta or rc time.
>> >
>> > Piotr also blogged about Hunter when it was released:
>> > https://medium.com/building-the-open-data-stack/detecting-performance-regressions-with-datastax-hunter-c22dc444aea4
>> >
>> > henrik
>> >
>> > On Sat, Oct 30, 2021 at 4:00 PM Joshua McKenzie <jmcken...@apache.org> wrote:
>> >
>> > > We as a project have gone back and forth on the topic of quality and the notion of a releasable trunk for quite a few years. If people are interested, I'd like to rekindle this discussion a bit and see if we're happy with where we are as a project or if we think there are steps we should take to change the quality bar going forward. The following questions have been rattling around for me for a while:
>> > >
>> > > 1. How do we define what "releasable trunk" means? All reviewed by M committers? Passing N% of tests? Passing all tests plus some other metrics (manual testing, raising the number of reviewers, test coverage, usage in dev or QA environments, etc.)? Something else entirely?
>> > >
>> > > 2. With a definition settled upon in #1, what steps, if any, do we need to take to get from where we are to having *and keeping* that releasable trunk? Anything to codify there?
>> > >
>> > > 3. What are the benefits of having a releasable trunk as defined here? What are the costs? Is it worth pursuing? What are the alternatives (for instance: a freeze before a release + stabilization focus by the community, i.e. the 4.0 push or the tock in tick-tock)?
>> > >
>> > > Given the large volumes of work coming down the pike with CEPs, this seems like a good time to at least check in on this topic as a community.
>> > >
>> > > Full disclosure: running face-first into 60+ failing tests on trunk when going through the commit process for denylisting this week brought this topic back up for me (reminds me of when I went to merge CDC back in 3.6 and those test failures riled me up... I sense a pattern ;))
>> > >
>> > > Looking forward to hearing what people think.
>> > >
>> > > ~Josh
>> >
>> > --
>> > Henrik Ingo
>> > +358 40 569 7354
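
P.S. To make the multi-branch flow in the proposal a bit more concrete, here is a rough Python sketch of the steps listed under "Automate process for *multi-branch* merges". To be clear, this is not dcapwell's actual script: the remote name ("origin"), the "ant jar" compile check, the temp-branch naming, and the assumption of a single-commit patch are illustrative guesses, and the "wait until CI is clean" step stays manual as described above.

    #!/usr/bin/env python3
    """Illustrative sketch only -- not dcapwell's actual script.

    Assumed for illustration: the remote name ("origin"), "ant jar" as the
    compile check, the temp-branch naming, and a single-commit patch on top
    of the release branch it targets.
    """
    import subprocess
    import sys


    def run(*cmd):
        """Echo and run a command, failing loudly if it fails."""
        print("+", " ".join(cmd))
        subprocess.run(cmd, check=True)


    def prepare_ci_branch(release_branch, feature_branch, circleci_patch=False):
        """Rebase, strip CHANGES.txt/.circleci noise, re-verify, and push for CI."""
        run("git", "fetch", "origin")
        run("git", "checkout", feature_branch)
        # rebase onto the latest upstream branch (e.g. cassandra-3.0 for a 3.0 patch)
        run("git", "rebase", "origin/" + release_branch)
        # check it compiles
        run("ant", "jar")
        # drop local .circleci edits unless the patch itself targets circleci
        if not circleci_patch:
            run("git", "checkout", "origin/" + release_branch, "--", ".circleci")
        # drop CHANGES.txt edits and rely on JIRA for that content
        run("git", "checkout", "origin/" + release_branch, "--", "CHANGES.txt")
        run("git", "commit", "--amend", "--no-edit")
        # check the code still compiles after the cleanup
        run("ant", "jar")
        # here the circle config would be switched to actually run CI on push
        # (e.g. via circleci-enable.py), then pushed to a temp branch for circle + Jenkins
        run("git", "push", "-f", "origin", "HEAD:" + feature_branch + "-ci")
        print("Waiting for CI is manual; once every branch is clean, merge into the")
        print("real branches, merge up (rewriting the diff), and finish with a single")
        print("atomic push: git push --atomic origin cassandra-3.0 cassandra-3.11 trunk")


    if __name__ == "__main__":
        # e.g.: python merge_helper.py cassandra-3.0 my-CASSANDRA-12345-3.0
        prepare_ci_branch(sys.argv[1], sys.argv[2])

The branch and ticket names in the usage example are placeholders; the point is only to show where the rebase, compile checks, cleanup, and temp-branch CI push sit relative to the manual "wait for clean" and final atomic push.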