Ok, it seems I was wrong and mixed up the mails in my mailbox. Please ignore my previous email.
On Mon, 6 Dec 2021 at 18:01, Ekaterina Dimitrova <e.dimitr...@gmail.com> wrote:
>
> I think the script discussion is on a different thread and attached document, which I am also about to address soon :-)
>
> On Mon, 6 Dec 2021 at 17:59, bened...@apache.org <bened...@apache.org> wrote:
>
>> Is there a reason we discounted modifying the merge strategy?
>>
>> I’m just a little wary of relying on scripts for consistency of behaviour here. Environments differ, and it would be far preferable for consistency of behaviour to rely on shared infrastructure if possible. I would probably be against mandating these scripts, at least.
>>
>> From: Joshua McKenzie <jmcken...@apache.org>
>> Date: Monday, 6 December 2021 at 22:20
>> To: dev@cassandra.apache.org <dev@cassandra.apache.org>
>> Subject: Re: [DISCUSS] Releasable trunk and quality
>>
>> As I work through the scripting on this, I don't know if we've documented or clarified the following (I don't see it here: https://cassandra.apache.org/_/development/testing.html):
>>
>> Pre-commit test suites:
>> * Which JDKs?
>> * When to include all python tests or do JVM only (if ever)?
>> * When to run upgrade tests?
>> * What to do if a test is also failing on the reference root (i.e. trunk, cassandra-4.0, etc.)?
>> * What to do if a test fails intermittently?
>>
>> I'll also update the above linked documentation once we hammer this out, and try to bake it into the scripting flow as much as possible as well. The goal is to make it easy to do the right thing and hard to do the wrong thing, and to have these things written down rather than have them be tribal knowledge that varies a lot across the project.
>>
>> ~Josh
>>
>> On Sat, Dec 4, 2021 at 9:04 AM Joshua McKenzie <jmcken...@apache.org> wrote:
>>
>> > After some offline collab, here's where this thread has landed: a proposal to change our processes to incrementally improve them and hopefully stabilize the state of CI longer term.
>> >
>> > Link:
>> > https://docs.google.com/document/d/1tJ-0K7d6PIStSbNFOfynXsD9RRDaMgqCu96U4O-RT84/edit#bookmark=id.16oxqq30bby4
>> >
>> > Hopefully the mail server doesn't butcher formatting; if it does, hit up the gdoc and leave comments there, as it should be open to all.
>> >
>> > Phase 1:
>> > Document merge criteria; update circle jobs to have a simple pre-merge job (one for each JDK profile)
>> >   * Donate, document, and formalize usage of circleci-enable.py in the ASF repo (need a new commit scripts / dev tooling section?)
>> >     * rewrites circle config jobs to a simple, clear flow
>> >     * ability to toggle between "run on push" or "click to run"
>> >     * variety of other functionality; see below
>> > Document (site, help, README.md) and automate via scripting the relationship / dev / release process around:
>> >   * in-jvm dtest
>> >   * dtest
>> >   * ccm
>> > Integrate and document usage of the script to build CI repeat test runs
>> >   * circleci-enable.py --repeat-unit org.apache.cassandra.SomeTest
>> >   * Document "Do this if you add or change tests"
>> > Introduce "Build Lead" role
>> >   * Weekly rotation; volunteer
>> >   * 1: Make sure JIRAs exist for test failures
>> >   * 2: Attempt to triage new test failures to root cause and assign out
>> >   * 3: Coordinate and drive to a green board on trunk
>> > Change and automate process for *trunk only* patches:
>> >   * Block on green CI (from merge criteria in CI above; potentially stricter definition of "clean" for trunk CI)
>> >   * Consider using github PRs to merge (TODO: determine how to handle circle + CHANGES; see below)
>> > Automate process for *multi-branch* merges (see the illustrative sketch further below)
>> >   * Harden / contribute / document dcapwell's script (he has one which does the following):
>> >     * rebases your branch to the latest (if on 3.0, then rebase against cassandra-3.0)
>> >     * checks it compiles
>> >     * removes all changes to .circle (can opt out for circleci patches)
>> >     * removes all changes to CHANGES.txt and leverages JIRA for the content
>> >     * checks the code still compiles
>> >     * changes circle to run ci
>> >     * pushes to a temp branch in git and runs CI (circle + Jenkins)
>> >     * when all branches are clean (the waiting step is manual)
>> >       * TODO: define "clean"
>> >         * No new test failures compared to reference?
>> >         * Or no test failures at all?
>> >     * merges changes into the actual branches
>> >     * merges up changes, rewriting the diff
>> >     * push --atomic
>> >
>> > Transition to phase 2 when:
>> >   * All items from phase 1 are complete
>> >   * Test boards for supported branches are green
>> >
>> > Phase 2:
>> >   * Add Harry to a recurring run against trunk
>> >   * Add Harry to the release pipeline
>> >   * Recurring suite of perf tests against trunk
>> >
>> > On Wed, Nov 17, 2021 at 1:42 PM Joshua McKenzie <jmcken...@apache.org> wrote:
>> >
>> >> Sorry for not catching that Benedict, you're absolutely right. So long as we're using merge commits between branches, I don't think auto-merging via train or blocking on green CI are options via the tooling, and multi-branch reverts will be something we should document very clearly should we even choose to go that route (a lot of room to make mistakes there).
>> >>
>> >> It may not be a huge issue, as we can expect the more disruptive (i.e. potentially destabilizing) changes to be happening on trunk only, so perhaps we can get away with slightly different workflows or policies based on whether you're doing a multi-branch bugfix or a feature on trunk. Bears thinking more deeply about.
>> >>
>> >> I'd also be game for revisiting our merge strategy. I don't see much difference in labor between merging between branches vs. preparing separate patches for an individual developer; however, I'm sure there are maintenance and integration implications there I'm not thinking of right now.
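For concreteness, here is a minimal, illustrative sketch of the multi-branch merge flow listed in the Phase 1 proposal above. It is not dcapwell's actual script, whose interface isn't shown in this thread; the feature-branch names (myfix-*), the `ant jar` compile check, the `.circleci` path, and the `ci/*` temp-branch naming are all assumptions made for the example.

    #!/usr/bin/env python3
    """Illustrative sketch only: mirrors the listed steps (rebase, compile check,
    strip CI/CHANGES noise, push a temp branch for CI, then a manual atomic push)
    under assumed branch names and build targets."""
    import subprocess

    # Assumption: the fix targets these release branches, oldest first.
    BRANCHES = ["cassandra-3.0", "cassandra-3.11", "cassandra-4.0", "trunk"]

    def run(*cmd):
        """Echo a command, run it, and fail loudly on a non-zero exit."""
        print("+", " ".join(cmd))
        subprocess.run(cmd, check=True)

    def prepare(feature_branch: str, target: str) -> str:
        """Rebase one feature branch onto its target, drop CI/CHANGES edits, verify the build."""
        run("git", "checkout", feature_branch)
        run("git", "fetch", "origin", target)
        run("git", "rebase", f"origin/{target}")
        run("ant", "jar")                                    # check it compiles
        # Drop local edits to CI config and CHANGES.txt; JIRA carries that content.
        run("git", "checkout", f"origin/{target}", "--", ".circleci", "CHANGES.txt")
        if subprocess.run(["git", "diff", "--cached", "--quiet"]).returncode != 0:
            run("git", "commit", "-m", "Strip CI/CHANGES noise before CI run")
        run("ant", "jar")                                    # still compiles after stripping
        temp = f"ci/{feature_branch}-{target}"
        run("git", "push", "-f", "origin", f"HEAD:{temp}")   # temp branch for circle + Jenkins
        return temp

    if __name__ == "__main__":
        temp_branches = [prepare(f"myfix-{b}", b) for b in BRANCHES]
        print("Waiting for CI on:", ", ".join(temp_branches))
        # Manual step: once every branch is "clean", push all refs atomically, e.g.:
        #   git push --atomic origin myfix-cassandra-3.0:cassandra-3.0 ... myfix-trunk:trunk

The "waiting for clean CI" step stays manual here on purpose, matching the proposal; only the mechanical rebase/strip/push parts are scripted.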
>> >>
>> >> On Wed, Nov 17, 2021 at 12:03 PM bened...@apache.org <bened...@apache.org> wrote:
>> >>
>> >>> I raised this before, but to highlight it again: how do these approaches interface with our merge strategy?
>> >>>
>> >>> We might have to rebase several dependent merge commits and want to merge them atomically. So far as I know these tools don’t work fantastically in this scenario, but if I’m wrong that’s fantastic. If not, given how important these things are, should we consider revisiting our merge strategy?
>> >>>
>> >>> From: Joshua McKenzie <jmcken...@apache.org>
>> >>> Date: Wednesday, 17 November 2021 at 16:39
>> >>> To: dev@cassandra.apache.org <dev@cassandra.apache.org>
>> >>> Subject: Re: [DISCUSS] Releasable trunk and quality
>> >>>
>> >>> Thanks for the feedback and insight, Henrik; it's valuable to hear how other large, complex infra projects have tackled this problem set.
>> >>>
>> >>> To attempt to summarize what I got from your email:
>> >>>
>> >>> [Phase one]
>> >>> 1) Build Barons: a rotation where there's always someone actively tying failures to changes and adding those failures to our ticketing system
>> >>> 2) Best-effort process of "test breakers" being assigned tickets to fix the things their work broke
>> >>> 3) Moving to a culture where we regularly revert commits that break tests
>> >>> 4) Running tests before we merge changes
>> >>>
>> >>> [Phase two]
>> >>> 1) Suite of performance tests on a regular cadence against trunk (w/ hunter or otherwise)
>> >>> 2) Integration w/ github merge-train pipelines
>> >>>
>> >>> Does that cover the highlights? I agree with these points as useful places for us to invest in as a project, and I'll work on getting this into a gdoc for us to align on and discuss further this week.
>> >>>
>> >>> ~Josh
>> >>>
>> >>> On Wed, Nov 17, 2021 at 10:23 AM Henrik Ingo <henrik.i...@datastax.com> wrote:
>> >>>
>> >>> > There's an old joke: How many people read Slashdot? The answer is 5. The rest of us just write comments without reading... In that spirit, I wanted to share some thoughts in response to your question, even if I know some of it will have been said in this thread already :-)
>> >>> >
>> >>> > Basically, I just want to share what has worked well in my past projects...
>> >>> >
>> >>> > Visualization: Now that we have Butler running, we can already see a decline in failing tests for 4.0 and trunk! This shows that contributors want to do the right thing; we just need the right tools and processes to achieve success.
>> >>> >
>> >>> > Process: I'm confident we will soon be back to seeing 0 failures for 4.0 and trunk. However, keeping that state requires constant vigilance! At MongoDB we had a role called Build Baron (aka Build Cop, etc.). This is a weekly rotating role where the person who is the Build Baron will, at least once per day, go through all of the Butler dashboards to catch new regressions early. We have used the same process at DataStax to guard our downstream fork of Cassandra 4.0. It's the responsibility of the Build Baron to:
>> >>> > - file a jira ticket for new failures
>> >>> > - determine which commit is responsible for introducing the regression. Sometimes this is obvious, sometimes this requires "bisecting" by running more builds, e.g. between two nightly builds (see the sketch just after this list)
>> >>> > - assign the jira ticket to the author of the commit that introduced the regression
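For the "bisecting between two nightly builds" step above, a hedged toy sketch of what that might look like, assuming the failure reproduces from a single test invocation and that `ant test -Dtest.name=...` is the right way to run that test (both are assumptions, as are the placeholder commit SHAs):

    #!/usr/bin/env python3
    """Toy illustration of bisecting a test regression between two known builds.
    Not project tooling; just shows the shape of the step."""
    import subprocess

    GOOD = "sha-of-last-green-nightly"   # placeholder: commit behind the last green nightly
    BAD = "sha-of-first-red-nightly"     # placeholder: commit behind the first failing nightly
    TEST_CMD = ["ant", "test", "-Dtest.name=SomeFailingTest"]  # assumed single-test invocation

    def sh(*cmd):
        subprocess.run(cmd, check=True)

    if __name__ == "__main__":
        sh("git", "bisect", "start", BAD, GOOD)
        try:
            # `git bisect run` replays the command at each step; exit code 0 marks a commit good.
            sh("git", "bisect", "run", *TEST_CMD)
            sh("git", "bisect", "log")   # the log ends with the first bad commit
        finally:
            sh("git", "bisect", "reset")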
>> >>> >
>> >>> > Given that Cassandra is a community that includes part-time and volunteer developers, we may want to try some variation of this, such as pairing 2 build barons each week?
>> >>> >
>> >>> > Reverting: A policy that the commit causing the regression is automatically reverted can be scary. It takes courage to be the junior test engineer who reverts yesterday's commit from the founder and CTO, just to give an example... Yet this is the most efficient way to keep the build green. And it turns out it's not that much additional work for the original author to fix the issue and then re-merge the patch.
>> >>> >
>> >>> > Merge-train: For any project with more than 1 commit per day, it will inevitably happen that you need to rebase a PR before merging, and even if it passed all tests before, after the rebase it won't. In the downstream Cassandra fork previously mentioned, we have tried to enable a github rule which requires a) that all tests passed before merging, b) that the PR is against the head of the branch being merged into, and c) that the tests were run after such a rebase. Unfortunately this leads to infinite loops where a large PR may never be able to commit because it has to be rebased again and again while smaller PRs merge faster. The solution to this problem is to have an automated process for the rebase-test-merge cycle. Gitlab supports such a feature and calls it merge trains: https://docs.gitlab.com/ee/ci/pipelines/merge_trains.html
>> >>> >
>> >>> > The merge-train can be considered an advanced feature and we can return to it later. The other points should be sufficient to keep a reasonably green trunk.
>> >>> >
>> >>> > I guess the major area where we can improve daily test coverage would be performance tests. To that end we recently open sourced a nice tool that can algorithmically detect performance regressions in a timeseries history of benchmark results: https://github.com/datastax-labs/hunter Just like with correctness testing, it's my experience that catching regressions the day they happen is much better than trying to do it at beta or rc time (a toy sketch of the idea follows below).
>> >>> >
>> >>> > Piotr also blogged about Hunter when it was released:
>> >>> > https://medium.com/building-the-open-data-stack/detecting-performance-regressions-with-datastax-hunter-c22dc444aea4
>> >>> >
>> >>> > henrik
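To make the "algorithmically detect performance regressions in a timeseries of benchmark results" idea concrete, here is a deliberately naive sketch: a simple threshold check of recent versus historical means over made-up numbers. This is not hunter's actual algorithm or API (hunter uses proper change-point detection); it only illustrates the shape of the problem.

    """Toy regression check on a benchmark timeseries (higher throughput is better)."""
    from statistics import mean

    def regressed(throughputs, window=5, tolerance=0.05):
        """Flag a regression if the mean of the last `window` runs drops more than
        `tolerance` below the mean of everything before it."""
        if len(throughputs) <= window:
            return False
        baseline = mean(throughputs[:-window])
        recent = mean(throughputs[-window:])
        return recent < baseline * (1 - tolerance)

    if __name__ == "__main__":
        nightly_ops_per_sec = [101_000, 99_500, 100_300, 100_800,          # historical runs
                               94_000, 93_500, 94_200, 93_800, 94_100]     # runs after a change
        print("regression detected:", regressed(nightly_ops_per_sec))      # -> True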
>> >>> >
>> >>> > On Sat, Oct 30, 2021 at 4:00 PM Joshua McKenzie <jmcken...@apache.org> wrote:
>> >>> >
>> >>> > > We as a project have gone back and forth on the topic of quality and the notion of a releasable trunk for quite a few years. If people are interested, I'd like to rekindle this discussion a bit and see if we're happy with where we are as a project, or if we think there are steps we should take to change the quality bar going forward. The following questions have been rattling around for me for a while:
>> >>> > >
>> >>> > > 1. How do we define what "releasable trunk" means? All reviewed by M committers? Passing N% of tests? Passing all tests plus some other metrics (manual testing, raising the number of reviewers, test coverage, usage in dev or QA environments, etc.)? Something else entirely?
>> >>> > >
>> >>> > > 2. With a definition settled upon in #1, what steps, if any, do we need to take to get from where we are to having *and keeping* that releasable trunk? Anything to codify there?
>> >>> > >
>> >>> > > 3. What are the benefits of having a releasable trunk as defined here? What are the costs? Is it worth pursuing? What are the alternatives (for instance: a freeze before a release + a stabilization focus by the community, i.e. the 4.0 push or the tock in tick-tock)?
>> >>> > >
>> >>> > > Given the large volume of work coming down the pike with CEPs, this seems like a good time to at least check in on this topic as a community.
>> >>> > >
>> >>> > > Full disclosure: running face-first into 60+ failing tests on trunk when going through the commit process for denylisting this week brought this topic back up for me (reminds me of when I went to merge CDC back in 3.6 and those test failures riled me up... I sense a pattern ;))
>> >>> > >
>> >>> > > Looking forward to hearing what people think.
>> >>> > >
>> >>> > > ~Josh
>> >>> >
>> >>> > --
>> >>> > Henrik Ingo
>> >>> > +358 40 569 7354