+1. I would add a 'post-commit' step: check the Jenkins CI run for your merge and see if something broke regardless.
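For what it's worth, a quick way to do that post-commit check from a terminal is to hit the Jenkins JSON API for the branch job. A minimal sketch only: ci-cassandra.apache.org is our Jenkins, but the "Cassandra-trunk" job name below is an assumption, so substitute the pipeline for whatever branch you merged to.

    # Result of the last completed run, then the failure counts from its test
    # report. The "Cassandra-trunk" job name is assumed; adjust as needed.
    JOB=https://ci-cassandra.apache.org/job/Cassandra-trunk
    curl -s "$JOB/lastCompletedBuild/api/json?tree=result,number"
    curl -s "$JOB/lastCompletedBuild/testReport/api/json?tree=failCount,skipCount,totalCount"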
On 6/12/21 23:51, Ekaterina Dimitrova wrote:
> Hi Josh,
> All good questions, thank you for raising this topic. To the best of my knowledge we don't have those documented, but I will put down notes on the tribal knowledge I know about and personally follow :-)
>
> Pre-commit test suites:
> * Which JDKs? - Both are officially supported, so both.
>
> * When to include all python tests or do JVM only (if ever)? - JVM only, probably, when I am testing just a test fix.
>
> * When to run upgrade tests? - I haven't heard any definitive guideline. Preferably every time, but if there is a tiny change I guess it can be decided to skip them. I would advocate to do more rather than less.
>
> * What to do if a test is also failing on the reference root (i.e. trunk, cassandra-4.0, etc)? - Check whether a ticket exists already; if not, open one at least, even if I don't plan to work on it, to acknowledge the issue and add any info I know about. If we know who broke it, ping the author to check it.
>
> * What to do if a test fails intermittently? - Open a ticket. During investigation, use the CircleCI jobs for running tests in a loop to find when it fails or to verify the test was fixed. (This is already in my draft CircleCI document, not yet released as it was pending on the documents migration.)
>
> Hope that helps.
>
> ~Ekaterina
>
> On Mon, 6 Dec 2021 at 17:20, Joshua McKenzie <jmcken...@apache.org> wrote:
>
>> As I work through the scripting on this, I don't know if we've documented or clarified the following (I don't see it here: https://cassandra.apache.org/_/development/testing.html):
>>
>> Pre-commit test suites:
>> * Which JDKs?
>> * When to include all python tests or do JVM only (if ever)?
>> * When to run upgrade tests?
>> * What to do if a test is also failing on the reference root (i.e. trunk, cassandra-4.0, etc)?
>> * What to do if a test fails intermittently?
>>
>> I'll also update the above linked documentation once we hammer this out, and try to bake it into the scripting flow as much as possible as well. The goal is to make it easy to do the right thing and hard to do the wrong thing, and to have these things written down rather than have them be tribal knowledge that varies a lot across the project.
>>
>> ~Josh
>>
>> On Sat, Dec 4, 2021 at 9:04 AM Joshua McKenzie <jmcken...@apache.org> wrote:
>>
>>> After some offline collab, here's where this thread has landed: a proposal to change our processes incrementally and hopefully stabilize the state of CI longer term.
>>>
>>> Link: https://docs.google.com/document/d/1tJ-0K7d6PIStSbNFOfynXsD9RRDaMgqCu96U4O-RT84/edit#bookmark=id.16oxqq30bby4
>>> Hopefully the mail server doesn't butcher formatting; if it does, hit up the gdoc and leave comments there, as it should be open to all.
>>>
>>> Phase 1:
>>> Document merge criteria; update circle jobs to have a simple pre-merge job (one for each JDK profile)
>>> * Donate, document, and formalize usage of circleci-enable.py in the ASF repo (need a new commit scripts / dev tooling section?)
>>> * rewrites circle config jobs to a simple, clear flow
>>> * ability to toggle between "run on push" or "click to run"
>>> * variety of other functionality; see below
>>> Document (site, help, README.md) and automate via scripting the relationship / dev / release process around:
>>> * In-jvm dtest
>>> * dtest
>>> * ccm
>>> Integrate and document usage of the script to build CI repeat test runs
>>> * circleci-enable.py --repeat-unit org.apache.cassandra.SomeTest
>>> * Document "Do this if you add or change tests"
>>> Introduce "Build Lead" role
>>> * Weekly rotation; volunteer
>>> * 1: Make sure JIRAs exist for test failures
>>> * 2: Attempt to triage new test failures to root cause and assign out
>>> * 3: Coordinate and drive to a green board on trunk
>>> Change and automate the process for *trunk only* patches:
>>> * Block on green CI (from the merge criteria above; potentially a stricter definition of "clean" for trunk CI)
>>> * Consider using github PRs to merge (TODO: determine how to handle circle + CHANGES; see below)
>>> Automate the process for *multi-branch* merges
>>> * Harden / contribute / document dcapwell's script (he has one which does the following):
>>>     * rebases your branch to the latest (if on 3.0 then rebase against cassandra-3.0)
>>>     * checks it compiles
>>>     * removes all changes to .circleci (can opt out for circleci patches)
>>>     * removes all changes to CHANGES.txt and leverages JIRA for the content
>>>     * checks the code still compiles
>>>     * changes circle config to run CI
>>>     * pushes to a temp branch in git and runs CI (circle + Jenkins)
>>>     * when all branches are clean (waiting step is manual)
>>>         * TODO: Define "clean"
>>>             * No new test failures compared to reference?
>>>             * Or no test failures at all?
>>>     * merges changes into the actual branches
>>>     * merges up changes, rewriting the diff
>>>     * push --atomic
>>>
>>> Transition to phase 2 when:
>>> * All items from phase 1 are complete
>>> * Test boards for supported branches are green
>>>
>>> Phase 2:
>>> * Add Harry to a recurring run against trunk
>>> * Add Harry to the release pipeline
>>> * Suite of perf tests against trunk, recurring
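(Inline note on the multi-branch merge item quoted above: the rough shape of one leg of that flow, as described, is something like the sketch below. This is illustrative only, not dcapwell's actual script; the branch names, the ci/ temp-branch naming, and the "ant jar" compile check are placeholders of mine.)

    # Illustrative sketch of one branch's leg of the flow described above; not
    # the real script. Branch names, the ci/ temp branch and "ant jar" are
    # placeholders.
    git checkout my-fix-3.0
    git rebase origin/cassandra-3.0       # rebase onto the latest of the target branch
    ant jar                               # check it still compiles
    git push origin HEAD:ci/my-fix-3.0    # temp branch; circle + Jenkins run against it
    # ...repeat per release branch; once every branch is clean, merge up and
    # publish all the branches in one shot so nothing lands partially:
    git push --atomic origin cassandra-3.0 cassandra-3.11 cassandra-4.0 trunk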
>>> On Wed, Nov 17, 2021 at 1:42 PM Joshua McKenzie <jmcken...@apache.org> wrote:
>>>
>>>> Sorry for not catching that Benedict, you're absolutely right. So long as we're using merge commits between branches, I don't think auto-merging via train or blocking on green CI are options via the tooling, and multi-branch reverts will be something we should document very clearly should we even choose to go that route (a lot of room to make mistakes there).
>>>>
>>>> It may not be a huge issue, as we can expect the more disruptive (i.e. potentially destabilizing) changes to be happening on trunk only, so perhaps we can get away with slightly different workflows or policies based on whether you're doing a multi-branch bugfix or a feature on trunk. Bears thinking more deeply about.
>>>>
>>>> I'd also be game for revisiting our merge strategy. I don't see much difference in labor between merging between branches vs. preparing separate patches for an individual developer; however, I'm sure there are maintenance and integration implications there I'm not thinking of right now.
>>>>
>>>> On Wed, Nov 17, 2021 at 12:03 PM bened...@apache.org <bened...@apache.org> wrote:
>>>>
>>>>> I raised this before, but to highlight it again: how do these approaches interface with our merge strategy?
>>>>>
>>>>> We might have to rebase several dependent merge commits and want to merge them atomically. So far as I know these tools don't work fantastically in this scenario, but if I'm wrong that's fantastic. If not, given how important these things are, should we consider revisiting our merge strategy?
>>>>>
>>>>> From: Joshua McKenzie <jmcken...@apache.org>
>>>>> Date: Wednesday, 17 November 2021 at 16:39
>>>>> To: dev@cassandra.apache.org <dev@cassandra.apache.org>
>>>>> Subject: Re: [DISCUSS] Releasable trunk and quality
>>>>> Thanks for the feedback and insight Henrik; it's valuable to hear how other large, complex infra projects have tackled this problem set.
>>>>>
>>>>> To attempt to summarize what I got from your email:
>>>>> [Phase one]
>>>>> 1) Build Barons: a rotation where there's always someone actively tying failures to changes and adding those failures to our ticketing system
>>>>> 2) A best-effort process of "test breakers" being assigned tickets to fix the things their work broke
>>>>> 3) Moving to a culture where we regularly revert commits that break tests
>>>>> 4) Running tests before we merge changes
>>>>>
>>>>> [Phase two]
>>>>> 1) Suite of performance tests on a regular cadence against trunk (w/ hunter or otherwise)
>>>>> 2) Integration w/ github merge-train pipelines
>>>>>
>>>>> Does that cover the highlights? I agree with these points as useful places for us to invest in as a project, and I'll work on getting this into a gdoc for us to align on and discuss further this week.
>>>>>
>>>>> ~Josh
>>>>>
>>>>> On Wed, Nov 17, 2021 at 10:23 AM Henrik Ingo <henrik.i...@datastax.com> wrote:
>>>>>
>>>>>> There's an old joke: How many people read Slashdot? The answer is 5. The rest of us just write comments without reading... In that spirit, I wanted to share some thoughts in response to your question, even if I know some of it will have been said in this thread already :-)
>>>>>>
>>>>>> Basically, I just want to share what has worked well in my past projects...
>>>>>>
>>>>>> Visualization: Now that we have Butler running, we can already see a decline in failing tests for 4.0 and trunk! This shows that contributors want to do the right thing; we just need the right tools and processes to achieve success.
>>>>>>
>>>>>> Process: I'm confident we will soon be back to seeing 0 failures for 4.0 and trunk. However, keeping that state requires constant vigilance! At MongoDB we had a role called Build Baron (aka Build Cop, etc.). This is a weekly rotating role where the person who is the Build Baron will, at least once per day, go through all of the Butler dashboards to catch new regressions early. We have used the same process at Datastax to guard our downstream fork of Cassandra 4.0. It's the responsibility of the Build Baron to:
>>>>>> - file a jira ticket for new failures
>>>>>> - determine which commit is responsible for introducing the regression. Sometimes this is obvious, sometimes this requires "bisecting" by running more builds, e.g. between two nightly builds.
>>>>>> - assign the jira ticket to the author of the commit that introduced the regression
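(Inline note on the bisecting step above: when the offending commit isn't obvious, git bisect can drive the search automatically. Purely illustrative; the good/bad SHAs and the ant invocation are placeholders for whatever command reproduces the failing test.)

    # Find the commit that broke a test between a known-good and known-bad build.
    # <bad-sha>/<good-sha> and the ant target are placeholders.
    git bisect start <bad-sha> <good-sha>
    git bisect run ant testsome -Dtest.name=org.apache.cassandra.SomeTest
    git bisect reset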
>>>>>> Given that Cassandra is a community that includes part-time and volunteer developers, we may want to try some variation of this, such as pairing 2 build barons each week?
>>>>>>
>>>>>> Reverting: A policy that the commit causing the regression is automatically reverted can be scary. It takes courage to be the junior test engineer who reverts yesterday's commit from the founder and CTO, just to give an example... Yet this is the most efficient way to keep the build green. And it turns out it's not that much additional work for the original author to fix the issue and then re-merge the patch.
>>>>>>
>>>>>> Merge-train: For any project with more than 1 commit per day, it will inevitably happen that you need to rebase a PR before merging, and even if it passed all tests before, after the rebase it won't. In the downstream Cassandra fork previously mentioned, we have tried to enable a github rule which requires a) that all tests passed before merging, b) that the PR is against the head of the branch being merged into, and c) that the tests were run after such a rebase. Unfortunately this leads to infinite loops where a large PR may never be able to commit because it has to be rebased again and again while smaller PRs merge faster. The solution to this problem is to have an automated process for the rebase-test-merge cycle. Gitlab supports such a feature and calls it merge-train: https://docs.gitlab.com/ee/ci/pipelines/merge_trains.html
>>>>>>
>>>>>> The merge-train can be considered an advanced feature and we can return to it later. The other points should be sufficient to keep a reasonably green trunk.
>>>>>>
>>>>>> I guess the major area where we can improve daily test coverage would be performance tests. To that end we recently open sourced a nice tool that can algorithmically detect performance regressions in a timeseries history of benchmark results: https://github.com/datastax-labs/hunter Just like with correctness testing, it's my experience that catching regressions the day they happened is much better than trying to do it at beta or rc time. Piotr also blogged about Hunter when it was released:
>>>>>> https://medium.com/building-the-open-data-stack/detecting-performance-regressions-with-datastax-hunter-c22dc444aea4
>>>>>>
>>>>>> henrik
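(One practical wrinkle with the reverting policy above, given our merge-commit strategy: reverting a merge commit needs the mainline parent named explicitly, which is part of why multi-branch reverts deserve careful documentation. A minimal illustration; the SHAs are placeholders.)

    # Reverting a plain commit vs. a merge commit; -m 1 picks the mainline parent.
    git revert <sha>
    git revert -m 1 <merge-sha>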
>>>>>> On Sat, Oct 30, 2021 at 4:00 PM Joshua McKenzie <jmcken...@apache.org> wrote:
>>>>>>
>>>>>>> We as a project have gone back and forth on the topic of quality and the notion of a releasable trunk for quite a few years. If people are interested, I'd like to rekindle this discussion a bit and see if we're happy with where we are as a project, or if we think there are steps we should take to change the quality bar going forward. The following questions have been rattling around for me for a while:
>>>>>>>
>>>>>>> 1. How do we define what "releasable trunk" means? All reviewed by M committers? Passing N% of tests? Passing all tests plus some other metrics (manual testing, raising the number of reviewers, test coverage, usage in dev or QA environments, etc.)? Something else entirely?
>>>>>>>
>>>>>>> 2. With a definition settled upon in #1, what steps, if any, do we need to take to get from where we are to having *and keeping* that releasable trunk? Anything to codify there?
>>>>>>>
>>>>>>> 3. What are the benefits of having a releasable trunk as defined here? What are the costs? Is it worth pursuing? What are the alternatives (for instance: a freeze before a release plus a stabilization focus by the community, i.e. the 4.0 push or the tock in tick-tock)?
>>>>>>>
>>>>>>> Given the large volume of work coming down the pike with CEPs, this seems like a good time to at least check in on this topic as a community.
>>>>>>>
>>>>>>> Full disclosure: running face-first into 60+ failing tests on trunk when going through the commit process for denylisting this week brought this topic back up for me (reminds me of when I went to merge CDC back in 3.6 and those test failures riled me up... I sense a pattern ;))
>>>>>>>
>>>>>>> Looking forward to hearing what people think.
>>>>>>>
>>>>>>> ~Josh
>>>>>>
>>>>>> --
>>>>>> Henrik Ingo
>>>>>> +358 40 569 7354