After some offline collaboration, here's where this thread has landed on a proposal to incrementally improve our processes and hopefully stabilize the state of CI longer term:
Link: https://docs.google.com/document/d/1tJ-0K7d6PIStSbNFOfynXsD9RRDaMgqCu96U4O-RT84/edit#bookmark=id.16oxqq30bby4

Hopefully the mail server doesn't butcher formatting; if it does, hit up the gdoc and leave comments there, as it should be open to all.

Phase 1: Document merge criteria; update circle jobs to have a simple pre-merge job (one for each JDK profile)
* Donate, document, and formalize usage of circleci-enable.py in the ASF repo (need a new commit scripts / dev tooling section?)
  * rewrites circle config jobs to a simple, clear flow
  * ability to toggle between "run on push" or "click to run"
  * variety of other functionality; see below

Document (site, help, README.md) and automate via scripting the relationship / dev / release process around:
* In-jvm dtest
* dtest
* ccm

Integrate and document usage of a script to build CI repeat test runs
* circleci-enable.py --repeat-unit org.apache.cassandra.SomeTest
* Document "Do this if you add or change tests"

Introduce "Build Lead" role
* Weekly rotation; volunteer
* 1: Make sure JIRAs exist for test failures
* 2: Attempt to triage new test failures to root cause and assign out
* 3: Coordinate and drive to a green board on trunk

Change and automate process for *trunk only* patches:
* Block on green CI (from the merge criteria above; potentially a stricter definition of "clean" for trunk CI)
* Consider using github PRs to merge (TODO: determine how to handle circle + CHANGES; see below)

Automate process for *multi-branch* merges
* Harden / contribute / document dcapwell's script (he has one which does the following; a rough illustrative sketch of this flow is appended after the quoted thread at the end of this mail):
  * rebases your branch to the latest (if on 3.0 then rebase against cassandra-3.0)
  * checks it compiles
  * removes all changes to .circleci (can opt out for circleci patches)
  * removes all changes to CHANGES.txt and leverages JIRA for the content
  * checks the code still compiles
  * changes circle config to run CI
  * pushes to a temp branch in git and runs CI (circle + Jenkins)
  * when all branches are clean (waiting step is manual)
    * TODO: Define "clean"
      * No new test failures compared to reference?
      * Or no test failures at all?
  * merges changes into the actual branches
  * merges up changes, rewriting the diff
  * push --atomic

Transition to phase 2 when:
* All items from phase 1 are complete
* Test boards for supported branches are green

Phase 2:
* Add Harry to a recurring run against trunk
* Add Harry to the release pipeline
* Suite of perf tests against trunk, recurring

On Wed, Nov 17, 2021 at 1:42 PM Joshua McKenzie <jmcken...@apache.org> wrote:

> Sorry for not catching that Benedict, you're absolutely right. So long as we're using merge commits between branches I don't think auto-merging via train or blocking on green CI are options via the tooling, and multi-branch reverts will be something we should document very clearly should we even choose to go that route (a lot of room to make mistakes there).
>
> It may not be a huge issue as we can expect the more disruptive changes (i.e. potentially destabilizing) to be happening on trunk only, so perhaps we can get away with slightly different workflows or policies based on whether you're doing a multi-branch bugfix or a feature on trunk. Bears thinking more deeply about.
>
> I'd also be game for revisiting our merge strategy. I don't see much difference in labor between merging between branches vs. preparing separate patches for an individual developer; however, I'm sure there are maintenance and integration implications there I'm not thinking of right now.
>
> On Wed, Nov 17, 2021 at 12:03 PM bened...@apache.org <bened...@apache.org> wrote:
>
>> I raised this before, but to highlight it again: how do these approaches interface with our merge strategy?
>>
>> We might have to rebase several dependent merge commits and want to merge them atomically. So far as I know these tools don't work fantastically in this scenario, but if I'm wrong that's fantastic. If not, given how important these things are, should we consider revisiting our merge strategy?
>>
>> From: Joshua McKenzie <jmcken...@apache.org>
>> Date: Wednesday, 17 November 2021 at 16:39
>> To: dev@cassandra.apache.org <dev@cassandra.apache.org>
>> Subject: Re: [DISCUSS] Releasable trunk and quality
>>
>> Thanks for the feedback and insight Henrik; it's valuable to hear how other large complex infra projects have tackled this problem set.
>>
>> To attempt to summarize, what I got from your email:
>> [Phase one]
>> 1) Build Barons: rotation where there's always someone active tying failures to changes and adding those failures to our ticketing system
>> 2) Best effort process of "test breakers" being assigned tickets to fix the things their work broke
>> 3) Moving to a culture where we regularly revert commits that break tests
>> 4) Running tests before we merge changes
>>
>> [Phase two]
>> 1) Suite of performance tests on a regular cadence against trunk (w/ hunter or otherwise)
>> 2) Integration w/ github merge-train pipelines
>>
>> That cover the highlights? I agree with these points as useful places for us to invest in as a project and I'll work on getting this into a gdoc for us to align on and discuss further this week.
>>
>> ~Josh
>>
>> On Wed, Nov 17, 2021 at 10:23 AM Henrik Ingo <henrik.i...@datastax.com> wrote:
>>
>> > There's an old joke: How many people read Slashdot? The answer is 5. The rest of us just write comments without reading... In that spirit, I wanted to share some thoughts in response to your question, even if I know some of it will have been said in this thread already :-)
>> >
>> > Basically, I just want to share what has worked well in my past projects...
>> >
>> > Visualization: Now that we have Butler running, we can already see a decline in failing tests for 4.0 and trunk! This shows that contributors want to do the right thing, we just need the right tools and processes to achieve success.
>> >
>> > Process: I'm confident we will soon be back to seeing 0 failures for 4.0 and trunk. However, keeping that state requires constant vigilance! At MongoDB we had a role called Build Baron (aka Build Cop, etc.). This is a weekly rotating role where the person who is the Build Baron will at least once per day go through all of the Butler dashboards to catch new regressions early. We have used the same process also at Datastax to guard our downstream fork of Cassandra 4.0. It's the responsibility of the Build Baron to
>> > - file a jira ticket for new failures
>> > - determine which commit is responsible for introducing the regression. Sometimes this is obvious, sometimes this requires "bisecting" by running more builds, e.g. between two nightly builds.
>> > - assign the jira ticket to the author of the commit that introduced the regression
>> >
>> > Given that Cassandra is a community that includes part-time and volunteer developers, we may want to try some variation of this, such as pairing 2 build barons each week?
>> >
>> > Reverting: A policy that the commit causing the regression is automatically reverted can be scary. It takes courage to be the junior test engineer who reverts yesterday's commit from the founder and CTO, just to give an example... Yet this is the most efficient way to keep the build green. And it turns out it's not that much additional work for the original author to fix the issue and then re-merge the patch.
>> >
>> > Merge-train: For any project with more than 1 commit per day, it will inevitably happen that you need to rebase a PR before merging, and even if it passed all tests before, after rebase it won't. In the downstream Cassandra fork previously mentioned, we have tried to enable a github rule which requires a) that all tests passed before merging, b) the PR is against the head of the branch merged into, and c) the tests were run after such rebase. Unfortunately this leads to infinite loops where a large PR may never be able to commit because it has to be rebased again and again while smaller PRs can merge faster. The solution to this problem is to have an automated process for the rebase-test-merge cycle. GitLab supports such a feature and calls it merge train: https://docs.gitlab.com/ee/ci/pipelines/merge_trains.html
>> >
>> > The merge-train can be considered an advanced feature and we can return to it later. The other points should be sufficient to keep a reasonably green trunk.
>> >
>> > I guess the major area where we can improve daily test coverage would be performance tests. To that end we recently open sourced a nice tool that can algorithmically detect performance regressions in a timeseries history of benchmark results: https://github.com/datastax-labs/hunter Just like with correctness testing, it's my experience that catching regressions the day they happened is much better than trying to do it at beta or rc time.
>> >
>> > Piotr also blogged about Hunter when it was released:
>> > https://medium.com/building-the-open-data-stack/detecting-performance-regressions-with-datastax-hunter-c22dc444aea4
>> >
>> > henrik
>> >
>> > On Sat, Oct 30, 2021 at 4:00 PM Joshua McKenzie <jmcken...@apache.org> wrote:
>> >
>> > > We as a project have gone back and forth on the topic of quality and the notion of a releasable trunk for quite a few years. If people are interested, I'd like to rekindle this discussion a bit and see if we're happy with where we are as a project or if we think there are steps we should take to change the quality bar going forward. The following questions have been rattling around for me for a while:
>> > >
>> > > 1. How do we define what "releasable trunk" means? All reviewed by M committers? Passing N% of tests? Passing all tests plus some other metrics (manual testing, raising the number of reviewers, test coverage, usage in dev or QA environments, etc.)? Something else entirely?
>> > >
>> > > 2. With a definition settled upon in #1, what steps, if any, do we need to take to get from where we are to having *and keeping* that releasable trunk? Anything to codify there?
>> > >
>> > > 3. What are the benefits of having a releasable trunk as defined here? What are the costs? Is it worth pursuing? What are the alternatives (for instance: a freeze before a release + stabilization focus by the community, i.e. the 4.0 push or the tock in tick-tock)?
>> > >
>> > > Given the large volumes of work coming down the pike with CEPs, this seems like a good time to at least check in on this topic as a community.
>> > >
>> > > Full disclosure: running face-first into 60+ failing tests on trunk when going through the commit process for denylisting this week brought this topic back up for me (reminds me of when I went to merge CDC back in 3.6 and those test failures riled me up... I sense a pattern ;))
>> > >
>> > > Looking forward to hearing what people think.
>> > >
>> > > ~Josh
>> >
>> > --
>> > Henrik Ingo
>> > +358 40 569 7354
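
P.S. To make the multi-branch flow in the proposal a bit more concrete, here is a rough Python sketch of the steps listed under "Automate process for *multi-branch* merges". To be clear, this is not dcapwell's actual script: the remote name ("origin"), the "ant jar" compile check, the temp-branch naming, and the assumption of a single-commit patch are illustrative guesses, and the "wait until CI is clean" step stays manual as described above.

    #!/usr/bin/env python3
    """Illustrative sketch only -- not dcapwell's actual script.

    Assumed for illustration: the remote name ("origin"), "ant jar" as the
    compile check, the temp-branch naming, and a single-commit patch on top
    of the release branch it targets.
    """
    import subprocess
    import sys


    def run(*cmd):
        """Echo and run a command, failing loudly if it fails."""
        print("+", " ".join(cmd))
        subprocess.run(cmd, check=True)


    def prepare_ci_branch(release_branch, feature_branch, circleci_patch=False):
        """Rebase, strip CHANGES.txt/.circleci noise, re-verify, and push for CI."""
        run("git", "fetch", "origin")
        run("git", "checkout", feature_branch)
        # rebase onto the latest upstream branch (e.g. cassandra-3.0 for a 3.0 patch)
        run("git", "rebase", "origin/" + release_branch)
        # check it compiles
        run("ant", "jar")
        # drop local .circleci edits unless the patch itself targets circleci
        if not circleci_patch:
            run("git", "checkout", "origin/" + release_branch, "--", ".circleci")
        # drop CHANGES.txt edits and rely on JIRA for that content
        run("git", "checkout", "origin/" + release_branch, "--", "CHANGES.txt")
        run("git", "commit", "--amend", "--no-edit")
        # check the code still compiles after the cleanup
        run("ant", "jar")
        # here the circle config would be switched to actually run CI on push
        # (e.g. via circleci-enable.py), then pushed to a temp branch for circle + Jenkins
        run("git", "push", "-f", "origin", "HEAD:" + feature_branch + "-ci")
        print("Waiting for CI is manual; once every branch is clean, merge into the")
        print("real branches, merge up (rewriting the diff), and finish with a single")
        print("atomic push: git push --atomic origin cassandra-3.0 cassandra-3.11 trunk")


    if __name__ == "__main__":
        # e.g.: python merge_helper.py cassandra-3.0 my-CASSANDRA-12345-3.0
        prepare_ci_branch(sys.argv[1], sys.argv[2])

The branch and ticket names in the usage example are placeholders; the point is only to show where the rebase, compile checks, cleanup, and temp-branch CI push sit relative to the manual "wait for clean" and final atomic push.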