Ok, it seems I was wrong and mixed up the mails in my mailbox. Please ignore my previous email.
On Mon, 6 Dec 2021 at 18:01, Ekaterina Dimitrova <e.dimitr...@gmail.com> wrote:
>
> I think the script discussion is on a different thread and attached document, which I am also about to address soon :-)
>
> On Mon, 6 Dec 2021 at 17:59, bened...@apache.org <bened...@apache.org> wrote:
>
>> Is there a reason we discounted modifying the merge strategy?
>>
>> I’m just a little wary of relying on scripts for consistency of behaviour here. Environments differ, and it would be far preferable for consistency of behaviour to rely on shared infrastructure if possible. I would probably be against mandating these scripts, at least.
>>
>> From: Joshua McKenzie <jmcken...@apache.org>
>> Date: Monday, 6 December 2021 at 22:20
>> To: dev@cassandra.apache.org <dev@cassandra.apache.org>
>> Subject: Re: [DISCUSS] Releasable trunk and quality
>>
>> As I work through the scripting on this, I don't know if we've documented or clarified the following (I don't see it here: https://cassandra.apache.org/_/development/testing.html):
>>
>> Pre-commit test suites:
>> * Which JDKs?
>> * When to include all python tests or do JVM only (if ever)?
>> * When to run upgrade tests?
>> * What to do if a test is also failing on the reference root (i.e. trunk, cassandra-4.0, etc.)?
>> * What to do if a test fails intermittently?
>>
>> I'll also update the above linked documentation once we hammer this out, and try to bake it into the scripting flow as much as possible as well. The goal is to make it easy to do the right thing and hard to do the wrong thing, and to have these things written down rather than have them be tribal knowledge that varies a lot across the project.
>>
>> ~Josh
>>
>> On Sat, Dec 4, 2021 at 9:04 AM Joshua McKenzie <jmcken...@apache.org> wrote:
>>
>> > After some offline collab, here's where this thread has landed: a proposal to change our processes to incrementally improve them and hopefully stabilize the state of CI longer term.
>> >
>> > Link:
>> > https://docs.google.com/document/d/1tJ-0K7d6PIStSbNFOfynXsD9RRDaMgqCu96U4O-RT84/edit#bookmark=id.16oxqq30bby4
>> >
>> > Hopefully the mail server doesn't butcher formatting; if it does, hit up the gdoc and leave comments there, as it should be open to all.
>> >
>> > Phase 1:
>> > Document merge criteria; update circle jobs to have a simple pre-merge job (one for each JDK profile)
>> >   * Donate, document, and formalize usage of circleci-enable.py in the ASF repo (need a new commit scripts / dev tooling section?)
>> >     * rewrites circle config jobs to a simple, clear flow
>> >     * ability to toggle between "run on push" or "click to run"
>> >     * variety of other functionality; see below
>> > Document (site, help, README.md) and automate via scripting the relationship / dev / release process around:
>> >   * in-jvm dtest
>> >   * dtest
>> >   * ccm
>> > Integrate and document usage of the script to build CI repeat test runs
>> >   * circleci-enable.py --repeat-unit org.apache.cassandra.SomeTest
>> >   * Document "Do this if you add or change tests"
>> > Introduce "Build Lead" role
>> >   * Weekly rotation; volunteer
>> >   * 1: Make sure JIRAs exist for test failures
>> >   * 2: Attempt to triage new test failures to root cause and assign out
>> >   * 3: Coordinate and drive to a green board on trunk
>> > Change and automate process for *trunk only* patches:
>> >   * Block on green CI (from merge criteria in CI above; potentially stricter definition of "clean" for trunk CI)
>> >   * Consider using github PRs to merge (TODO: determine how to handle circle + CHANGES; see below)
>> > Automate process for *multi-branch* merges (see the illustrative sketch further below)
>> >   * Harden / contribute / document dcapwell's script (he has one which does the following):
>> >     * rebases your branch to the latest (if on 3.0, then rebase against cassandra-3.0)
>> >     * checks it compiles
>> >     * removes all changes to .circle (can opt out for circleci patches)
>> >     * removes all changes to CHANGES.txt and leverages JIRA for the content
>> >     * checks the code still compiles
>> >     * changes circle to run ci
>> >     * pushes to a temp branch in git and runs CI (circle + Jenkins)
>> >     * when all branches are clean (the waiting step is manual)
>> >       * TODO: define "clean"
>> >         * No new test failures compared to reference?
>> >         * Or no test failures at all?
>> >     * merges changes into the actual branches
>> >     * merges up changes, rewriting the diff
>> >     * push --atomic
>> >
>> > Transition to phase 2 when:
>> >   * All items from phase 1 are complete
>> >   * Test boards for supported branches are green
>> >
>> > Phase 2:
>> >   * Add Harry to a recurring run against trunk
>> >   * Add Harry to the release pipeline
>> >   * Recurring suite of perf tests against trunk
>> >
>> > On Wed, Nov 17, 2021 at 1:42 PM Joshua McKenzie <jmcken...@apache.org> wrote:
>> >
>> >> Sorry for not catching that Benedict, you're absolutely right. So long as we're using merge commits between branches, I don't think auto-merging via train or blocking on green CI are options via the tooling, and multi-branch reverts will be something we should document very clearly should we even choose to go that route (a lot of room to make mistakes there).
>> >>
>> >> It may not be a huge issue, as we can expect the more disruptive (i.e. potentially destabilizing) changes to be happening on trunk only, so perhaps we can get away with slightly different workflows or policies based on whether you're doing a multi-branch bugfix or a feature on trunk. Bears thinking more deeply about.
>> >>
>> >> I'd also be game for revisiting our merge strategy. I don't see much difference in labor between merging between branches vs. preparing separate patches for an individual developer; however, I'm sure there are maintenance and integration implications there I'm not thinking of right now.
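For concreteness, here is a minimal, illustrative sketch of the multi-branch merge flow listed in the Phase 1 proposal above. It is not dcapwell's actual script, whose interface isn't shown in this thread; the feature-branch names (myfix-*), the `ant jar` compile check, the `.circleci` path, and the `ci/*` temp-branch naming are all assumptions made for the example.

    #!/usr/bin/env python3
    """Illustrative sketch only: mirrors the listed steps (rebase, compile check,
    strip CI/CHANGES noise, push a temp branch for CI, then a manual atomic push)
    under assumed branch names and build targets."""
    import subprocess

    # Assumption: the fix targets these release branches, oldest first.
    BRANCHES = ["cassandra-3.0", "cassandra-3.11", "cassandra-4.0", "trunk"]

    def run(*cmd):
        """Echo a command, run it, and fail loudly on a non-zero exit."""
        print("+", " ".join(cmd))
        subprocess.run(cmd, check=True)

    def prepare(feature_branch: str, target: str) -> str:
        """Rebase one feature branch onto its target, drop CI/CHANGES edits, verify the build."""
        run("git", "checkout", feature_branch)
        run("git", "fetch", "origin", target)
        run("git", "rebase", f"origin/{target}")
        run("ant", "jar")                                    # check it compiles
        # Drop local edits to CI config and CHANGES.txt; JIRA carries that content.
        run("git", "checkout", f"origin/{target}", "--", ".circleci", "CHANGES.txt")
        if subprocess.run(["git", "diff", "--cached", "--quiet"]).returncode != 0:
            run("git", "commit", "-m", "Strip CI/CHANGES noise before CI run")
        run("ant", "jar")                                    # still compiles after stripping
        temp = f"ci/{feature_branch}-{target}"
        run("git", "push", "-f", "origin", f"HEAD:{temp}")   # temp branch for circle + Jenkins
        return temp

    if __name__ == "__main__":
        temp_branches = [prepare(f"myfix-{b}", b) for b in BRANCHES]
        print("Waiting for CI on:", ", ".join(temp_branches))
        # Manual step: once every branch is "clean", push all refs atomically, e.g.:
        #   git push --atomic origin myfix-cassandra-3.0:cassandra-3.0 ... myfix-trunk:trunk

The "waiting for clean CI" step stays manual here on purpose, matching the proposal; only the mechanical rebase/strip/push parts are scripted.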
>> >>
>> >> On Wed, Nov 17, 2021 at 12:03 PM bened...@apache.org <bened...@apache.org> wrote:
>> >>
>> >>> I raised this before, but to highlight it again: how do these approaches interface with our merge strategy?
>> >>>
>> >>> We might have to rebase several dependent merge commits and want to merge them atomically. So far as I know these tools don’t work fantastically in this scenario, but if I’m wrong that’s fantastic. If not, given how important these things are, should we consider revisiting our merge strategy?
>> >>>
>> >>> From: Joshua McKenzie <jmcken...@apache.org>
>> >>> Date: Wednesday, 17 November 2021 at 16:39
>> >>> To: dev@cassandra.apache.org <dev@cassandra.apache.org>
>> >>> Subject: Re: [DISCUSS] Releasable trunk and quality
>> >>>
>> >>> Thanks for the feedback and insight, Henrik; it's valuable to hear how other large, complex infra projects have tackled this problem set.
>> >>>
>> >>> To attempt to summarize what I got from your email:
>> >>>
>> >>> [Phase one]
>> >>> 1) Build Barons: a rotation where there's always someone actively tying failures to changes and adding those failures to our ticketing system
>> >>> 2) Best-effort process of "test breakers" being assigned tickets to fix the things their work broke
>> >>> 3) Moving to a culture where we regularly revert commits that break tests
>> >>> 4) Running tests before we merge changes
>> >>>
>> >>> [Phase two]
>> >>> 1) Suite of performance tests on a regular cadence against trunk (w/ hunter or otherwise)
>> >>> 2) Integration w/ github merge-train pipelines
>> >>>
>> >>> Does that cover the highlights? I agree with these points as useful places for us to invest in as a project, and I'll work on getting this into a gdoc for us to align on and discuss further this week.
>> >>>
>> >>> ~Josh
>> >>>
>> >>> On Wed, Nov 17, 2021 at 10:23 AM Henrik Ingo <henrik.i...@datastax.com> wrote:
>> >>>
>> >>> > There's an old joke: How many people read Slashdot? The answer is 5. The rest of us just write comments without reading... In that spirit, I wanted to share some thoughts in response to your question, even if I know some of it will have been said in this thread already :-)
>> >>> >
>> >>> > Basically, I just want to share what has worked well in my past projects...
>> >>> >
>> >>> > Visualization: Now that we have Butler running, we can already see a decline in failing tests for 4.0 and trunk! This shows that contributors want to do the right thing; we just need the right tools and processes to achieve success.
>> >>> >
>> >>> > Process: I'm confident we will soon be back to seeing 0 failures for 4.0 and trunk. However, keeping that state requires constant vigilance! At MongoDB we had a role called Build Baron (aka Build Cop, etc.). This is a weekly rotating role where the person who is the Build Baron will, at least once per day, go through all of the Butler dashboards to catch new regressions early. We have used the same process at DataStax to guard our downstream fork of Cassandra 4.0. It's the responsibility of the Build Baron to:
>> >>> > - file a jira ticket for new failures
>> >>> > - determine which commit is responsible for introducing the regression. Sometimes this is obvious, sometimes this requires "bisecting" by running more builds, e.g. between two nightly builds (see the sketch just after this list)
>> >>> > - assign the jira ticket to the author of the commit that introduced the regression
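For the "bisecting between two nightly builds" step above, a hedged toy sketch of what that might look like, assuming the failure reproduces from a single test invocation and that `ant test -Dtest.name=...` is the right way to run that test (both are assumptions, as are the placeholder commit SHAs):

    #!/usr/bin/env python3
    """Toy illustration of bisecting a test regression between two known builds.
    Not project tooling; just shows the shape of the step."""
    import subprocess

    GOOD = "sha-of-last-green-nightly"   # placeholder: commit behind the last green nightly
    BAD = "sha-of-first-red-nightly"     # placeholder: commit behind the first failing nightly
    TEST_CMD = ["ant", "test", "-Dtest.name=SomeFailingTest"]  # assumed single-test invocation

    def sh(*cmd):
        subprocess.run(cmd, check=True)

    if __name__ == "__main__":
        sh("git", "bisect", "start", BAD, GOOD)
        try:
            # `git bisect run` replays the command at each step; exit code 0 marks a commit good.
            sh("git", "bisect", "run", *TEST_CMD)
            sh("git", "bisect", "log")   # the log ends with the first bad commit
        finally:
            sh("git", "bisect", "reset")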
>> >>> >
>> >>> > Given that Cassandra is a community that includes part-time and volunteer developers, we may want to try some variation of this, such as pairing 2 build barons each week?
>> >>> >
>> >>> > Reverting: A policy that the commit causing the regression is automatically reverted can be scary. It takes courage to be the junior test engineer who reverts yesterday's commit from the founder and CTO, just to give an example... Yet this is the most efficient way to keep the build green. And it turns out it's not that much additional work for the original author to fix the issue and then re-merge the patch.
>> >>> >
>> >>> > Merge-train: For any project with more than 1 commit per day, it will inevitably happen that you need to rebase a PR before merging, and even if it passed all tests before, after the rebase it won't. In the downstream Cassandra fork previously mentioned, we have tried to enable a github rule which requires a) that all tests passed before merging, b) that the PR is against the head of the branch being merged into, and c) that the tests were run after such a rebase. Unfortunately this leads to infinite loops where a large PR may never be able to commit because it has to be rebased again and again while smaller PRs merge faster. The solution to this problem is to have an automated process for the rebase-test-merge cycle. Gitlab supports such a feature and calls it merge trains: https://docs.gitlab.com/ee/ci/pipelines/merge_trains.html
>> >>> >
>> >>> > The merge-train can be considered an advanced feature and we can return to it later. The other points should be sufficient to keep a reasonably green trunk.
>> >>> >
>> >>> > I guess the major area where we can improve daily test coverage would be performance tests. To that end we recently open sourced a nice tool that can algorithmically detect performance regressions in a timeseries history of benchmark results: https://github.com/datastax-labs/hunter Just like with correctness testing, it's my experience that catching regressions the day they happen is much better than trying to do it at beta or rc time (a toy sketch of the idea follows below).
>> >>> >
>> >>> > Piotr also blogged about Hunter when it was released:
>> >>> > https://medium.com/building-the-open-data-stack/detecting-performance-regressions-with-datastax-hunter-c22dc444aea4
>> >>> >
>> >>> > henrik
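To make the "algorithmically detect performance regressions in a timeseries of benchmark results" idea concrete, here is a deliberately naive sketch: a simple threshold check of recent versus historical means over made-up numbers. This is not hunter's actual algorithm or API (hunter uses proper change-point detection); it only illustrates the shape of the problem.

    """Toy regression check on a benchmark timeseries (higher throughput is better)."""
    from statistics import mean

    def regressed(throughputs, window=5, tolerance=0.05):
        """Flag a regression if the mean of the last `window` runs drops more than
        `tolerance` below the mean of everything before it."""
        if len(throughputs) <= window:
            return False
        baseline = mean(throughputs[:-window])
        recent = mean(throughputs[-window:])
        return recent < baseline * (1 - tolerance)

    if __name__ == "__main__":
        nightly_ops_per_sec = [101_000, 99_500, 100_300, 100_800,          # historical runs
                               94_000, 93_500, 94_200, 93_800, 94_100]     # runs after a change
        print("regression detected:", regressed(nightly_ops_per_sec))      # -> True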
>> >>> >
>> >>> > On Sat, Oct 30, 2021 at 4:00 PM Joshua McKenzie <jmcken...@apache.org> wrote:
>> >>> >
>> >>> > > We as a project have gone back and forth on the topic of quality and the notion of a releasable trunk for quite a few years. If people are interested, I'd like to rekindle this discussion a bit and see if we're happy with where we are as a project, or if we think there are steps we should take to change the quality bar going forward. The following questions have been rattling around for me for a while:
>> >>> > >
>> >>> > > 1. How do we define what "releasable trunk" means? All reviewed by M committers? Passing N% of tests? Passing all tests plus some other metrics (manual testing, raising the number of reviewers, test coverage, usage in dev or QA environments, etc.)? Something else entirely?
>> >>> > >
>> >>> > > 2. With a definition settled upon in #1, what steps, if any, do we need to take to get from where we are to having *and keeping* that releasable trunk? Anything to codify there?
>> >>> > >
>> >>> > > 3. What are the benefits of having a releasable trunk as defined here? What are the costs? Is it worth pursuing? What are the alternatives (for instance: a freeze before a release + a stabilization focus by the community, i.e. the 4.0 push or the tock in tick-tock)?
>> >>> > >
>> >>> > > Given the large volume of work coming down the pike with CEPs, this seems like a good time to at least check in on this topic as a community.
>> >>> > >
>> >>> > > Full disclosure: running face-first into 60+ failing tests on trunk when going through the commit process for denylisting this week brought this topic back up for me (reminds me of when I went to merge CDC back in 3.6 and those test failures riled me up... I sense a pattern ;))
>> >>> > >
>> >>> > > Looking forward to hearing what people think.
>> >>> > >
>> >>> > > ~Josh
>> >>> >
>> >>> > --
>> >>> > Henrik Ingo
>> >>> > +358 40 569 7354