I confirm we're almost done with CASSANDRA-15580 (Repair QA testing). Nightly runs are already scheduled, and I'm tuning some knobs to get them stable:
https://app.circleci.com/pipelines/github/riptano/cassandra-rtest?branch=trunk

Andres also created an in-jvm dtest for the mixed cluster repair test that is under review.

Scope-wise, I'd suggest we keep the repair tests repo separate for now and work on integrating it into the Cassandra codebase post-4.0. It could just as well be a separate repo altogether, depending on the consensus here.

I also had to reduce our ambitions on node density (from 100GB down to 20GB) because of how long the tests already take (almost 3 hours with Full/Incremental/Subrange running in parallel, so that's roughly 6 hours of AWS instance time per run when things go nicely). It's possible that the test backup has too much entropy, but that may be a good thing: I'd rather test smaller datasets with a lot of entropy than big ones with not much. It has already helped uncover CASSANDRA-16406 <https://issues.apache.org/jira/browse/CASSANDRA-16406>.

I'll update the tickets to reflect the current state.

On Thu, Jan 21, 2021 at 23:23, Scott Andreas <sc...@paradoxica.net> wrote:

> Thanks Benjamin!
>
> I propose we de-scope 15538 as the ticket does not currently have a clear
> definition of done. Unless others disagree, we can remove the fix version
> via lazy consensus in a couple days. That leaves us with a well-defined set
> of tickets that are making progress.
>
> Re: the next question:
> "Do you have a timeframe in mind for releasing 4.0 GA? Assuming that there
> is no sudden burst in the number of issues."
>
> This is a great question for all on the list. Please consider what follows
> as my interpretation of our current status relative to the project's
> Release Lifecycle doc (and all "we/you" pronouns collective):
> https://cwiki.apache.org/confluence/display/CASSANDRA/Release+Lifecycle
>
> We're currently meeting all criteria for the Beta phase except "No flaky
> tests" and a small number of known bugs (e.g., 16307, 16078).
> The good news is we have the tickets in both categories identified
> (discussed earlier in this thread), and they don't appear to be a large
> amount of work - potentially with the exception of CASSANDRA-16078:
> Performance regression for queries accessing multiple rows. The ticket
> reports a 39% perf regression for queries fetching multiple rows in a
> partition via IN clauses – a major regression that should block release
> until understood/fixed. Caleb's working on this now.
>
> Once those issues and the validation epics that are now in review are
> wrapped (which look like a few weeks' work if contributors can jump on the
> flaky test tickets), we'll have met our criteria for graduating beta.
>
> The definition of an RC release is that any SHA we cut an RC build from
> may legitimately be the SHA declared "Apache Cassandra 4.0.0." This is
> where it gets real. When the project declares a build "RC," we're staking
> our collective credibility on it and recommend that users upgrade to a
> build that received this designation.
>
> I feel very good about where 4.0 is at. We've all surfaced and resolved a
> large number of important issues. We've enhanced the project's testing
> infrastructure to broaden the surface covered, which reduces the
> probability of unknown unknowns. And we've collectively developed
> toolchains for large-scale verification, including of existing live
> clusters via diff.
>
> After beta’s complete, the next chasm to cross seems like our own
> collective willingness to deploy and operate Cassandra 4.0 clusters in
> production. Once we're at RC, willing to do so, and to recommend users do
> the same, I think we'll have hit our definition of done.
>
> As we wrap up the remaining beta issues and flaky tests, now's a good time
> for that RC gut check. If there's a remaining issue that would prevent you
> from running trunk in a prod environment, please file it and raise
> attention - it'll help us finish polishing the release.
> And if there isn't - deploy it!
>
> We still need to finish the remaining bugs in scope and get tests reliably
> green. But it feels good to be this close.
>
> – Scott
>
> ________________________________________
> From: Benjamin Lerer <benjamin.le...@datastax.com>
> Sent: Tuesday, January 19, 2021 1:54 AM
> To: dev@cassandra.apache.org
> Subject: Re: [DISCUSS] Revisiting the quality testing epic scope
>
> Thank you for your reply, Scott.
>
> My understanding is that Alexander is moving forward on CASSANDRA-15580
> (Repair) and that Andres is focussing with Caleb on the tickets of
> CASSANDRA-15579 (Distributed Read/Write Path). The biggest unknown here
> seems to be CASSANDRA-16262 as you mentioned.
>
> Regarding CASSANDRA-15582 (Metrics), I shifted my focus toward helping with
> reviews for the release candidate. As a consequence, outside of 2 patches
> created by Sumanth during the holidays, the epic has not been moving
> forward.
>
> > the silver lining is that it shouldn’t be long before the others wrap up.
>
> Do you have a timeframe in mind for releasing 4.0 GA? Assuming that there
> is no sudden burst in the number of issues.
>
> > We do have several flaky test tickets that could use attention, though
>
> I believe that Adam, Berenguer and Brandon have started focusing on them.
>
> On Sat, Jan 16, 2021 at 10:49 PM Scott Andreas <sc...@paradoxica.net>
> wrote:
>
> > Thanks for raising the question, Benjamin! Notes on a few tickets inline
> > below.
> >
> > Non-Blocking:
> > – CASSANDRA-15537 Local Read/Write Path: Upgrade and Diff Test
> > I think it’s reasonable to consider this ticket complete. Yifan and others
> > have worked to execute several dozen diff tests and while I’m sure others
> > will continue, it’s reasonable to say cassandra-diff has been used to
> > compare 3.0 vs. 4.0 clusters with a wide variety of data models. I’ll check
> > with Yifan on Tuesday re: updating the status of the ticket.
> > It would be wonderful to hear of diff runs and experience from additional
> > contributors if others can share.
> >
> > – CASSANDRA-15584 Tooling - External Ecosystem
> > Great collaboration on this one (including issues filed arising from this
> > coverage, such as a recent ticket related to Medusa).
> >
> > Blocking GA:
> > – CASSANDRA-15579 Distributed Read/Write Path
> > The coordination and replication subtasks (16180, 16181) are making good
> > progress. I’ll check with Caleb and David on 16262 (the fuzz testing
> > subtask) on Tuesday.
> >
> > – CASSANDRA-15581 Compaction
> > Most of these are perf tests rather than development tasks, though the
> > ones complete are listed as Patch Available. I’ll check with Yifan if it’d
> > make sense to move those for which no planned work remains to Resolved. I
> > don’t think there’s a lot left here.
> >
> > – CASSANDRA-15538 Local Read/Write Path - Other Areas
> > Will see if anything specific is planned, as scope is relatively undefined.
> >
> > With the exception of 15538, most of these look to be moving along or
> > nearly complete. I don’t think I’d shift others aside from it into the
> > non-blocking category - but the silver lining is that it shouldn’t be long
> > before the others wrap up.
> >
> > We do have several flaky test tickets that could use attention, though -
> > these may be quick to push through if anyone is able to pick them up:
> >
> > – CASSANDRA-16236: Fix flaky testTrackMaxDeletionTime
> > – CASSANDRA-16238: Fix flaky test test_insert_data_during_replace_same_address - replace_address_test.TestReplaceAddress
> > – CASSANDRA-16239: Fix flaky test org.apache.cassandra.distributed.test.NetstatsRepairStreamingTest testWithCompressionDisabled
> > – CASSANDRA-16317: Fix flaky test incompleteCommit - org.apache.cassandra.distributed.test.CASTest
> > – CASSANDRA-16355: Fix flaky test incompletePropose - org.apache.cassandra.distributed.test.CASTest
> > – CASSANDRA-16382: Fix flaky LongSharedExecutorPoolTest.testPromptnessOfExecution
> > – CASSANDRA-16358: Minor Flakiness in ProxyHandlerConnectionsTest#testExpireSomeFromBatch
> > – CASSANDRA-16229: Flaky jvm-dtest: org.apache.cassandra.distributed.test.ring.NodeNotInRingTest.nodeNotInRingTest
> > – CASSANDRA-16061: transient_replication_ring_test.py::TestTransientReplicationRing::test_move_forwards_and_cleanup
> >
> > Cheers,
> >
> > – Scott
> >
> > > On Jan 14, 2021, at 9:05 AM, Benjamin Lerer <benjamin.le...@datastax.com> wrote:
> > >
> > > Hi everybody,
> > >
> > > As discussed before the holidays, it might make sense to revisit the
> > > scope of the quality testing tickets for 4.0 GA to ensure that the 4.0
> > > release is not held for longer than necessary.
> > >
> > > The current status of the quality testing tasks is the following:
> > >
> > > *DONE:*
> > >
> > > CASSANDRA-15583 <https://issues.apache.org/jira/browse/CASSANDRA-15583>
> > > Tooling, Bundled and First Party
> > > CASSANDRA-15586 <https://issues.apache.org/jira/browse/CASSANDRA-15586>
> > > Cluster Setup and Maintenance
> > > CASSANDRA-15587 <https://issues.apache.org/jira/browse/CASSANDRA-15587>
> > > Platforms and Runtimes
> > >
> > > *NON BLOCKING:*
> > >
> > > The goals of the following tickets have been reached. Once GA is closed
> > > they will be marked as done.
> > >
> > > CASSANDRA-15537 <https://issues.apache.org/jira/browse/CASSANDRA-15537>
> > > Local Read/Write Path: Upgrade and Diff Test
> > > CASSANDRA-15584 <https://issues.apache.org/jira/browse/CASSANDRA-15584>
> > > Tooling - External Ecosystem
> > >
> > > If I understood Jordan's comment correctly on the following ticket, it
> > > should also not be a blocker for 4.0:
> > > CASSANDRA-15585 <https://issues.apache.org/jira/browse/CASSANDRA-15585>
> > > Test Frameworks, Tooling, Infra / Automation
> > >
> > > *BLOCKING GA:*
> > >
> > > CASSANDRA-15579 <https://issues.apache.org/jira/browse/CASSANDRA-15579>
> > > Distributed Read/Write Path
> > > 4 sub-tasks: 1 resolved, 2 in progress, 1 open
> > >
> > > CASSANDRA-15580 <https://issues.apache.org/jira/browse/CASSANDRA-15580>
> > > Repair
> > > Test scenarios are ready, working on integrating them into CircleCI
> > >
> > > CASSANDRA-15581 <https://issues.apache.org/jira/browse/CASSANDRA-15581>
> > > Compaction
> > > 9 sub-tasks: 5 patch available, 1 review in progress, 3 triage needed
> > >
> > > CASSANDRA-15582 <https://issues.apache.org/jira/browse/CASSANDRA-15582>
> > > Metrics
> > > 16 sub-tasks: 9 resolved, 5 patch available, 5 open
> > >
> > > CASSANDRA-15588 <https://issues.apache.org/jira/browse/CASSANDRA-15588>
> > > Cluster Upgrade
> > > 6 sub-tasks: 4 resolved, 1 in progress, 1 open
> > >
> > > CASSANDRA-15538 <https://issues.apache.org/jira/browse/CASSANDRA-15538>
> > > Local Read/Write Path
> > > No progress has been made on that ticket. The conclusion so far is that
> > > Harry is our best choice to uncover issues in that area but there is no
> > > clear plan on how to move forward.
> > >
> > > We have made some progress across the quality testing tickets.
> > > Nevertheless, there is still a significant number of tickets to fix. As
> > > our time and resources are limited, it might make sense to focus on what
> > > we believe are the most critical for 4.0 and relax our constraints on
> > > others. For example, it seems to me that the metrics tickets will mainly
> > > help to discover non-critical old issues that are not blockers for 4.0.
> > > It is clear to me that they should be fixed but that could probably be
> > > done for the 4.0.x/4.1 release (I fully volunteer for that :-)). The
> > > same could be true for some other areas of the code.
> > >
> > > In my opinion the important questions we would need to answer are:
> > >
> > > 1. Are there some tickets that we should make non-blocking for 4.0?
> > > 2. What do we do about CASSANDRA-15538
> > > <https://issues.apache.org/jira/browse/CASSANDRA-15538> Local
> > > Read/Write Path?
> > >
> > > Thanks in advance for your feedback :-)