One correction: testAutoSnapshotTTlOnDropAfterRestart already has a ticket in review; it just wasn’t linked in Butler. I will link it now. Thanks Paulo and Caleb for looking into it.

On Wed, 17 Aug 2022 at 14:05, Josh McKenzie <jmcken...@apache.org> wrote:
> This update comes to you from day 5 of quarantining in the basement.
> Thanks Pandemic. (╯°□°)╯︵ ┻━┻
>
> (Today we're going to test if the ASF mailing list allows a variety of
> ascii characters! I almost hope for everyone's sakes it doesn't; I abuse
> these things. :))
>
> Let's start with 4.1:
> Latest run has 7 failures. If we dig a bit deeper into the detail panel (
> https://butler.cassandra.apache.org/#/ci/upstream/compare/Cassandra-4.1/cassandra-4.1),
> you can see that the CASTest failures in
> https://issues.apache.org/jira/browse/CASSANDRA-17461 account for the
> long pole blocking the release. Looks like there's multiple folks working
> on that (thanks Brandon, Benedict, Andres, and Berenguer!), but it also
> looks like there's still no assignee so we're maybe holding it at arm's
> length. Either that or we're just going to keep dogpiling on it which is
> great too; I don't see it falling off the radar any time soon.
>
> testAutoSnapshotTTlOnDropAfterRestart has failed a few times so there's
> some legit flake there:
> https://ci-cassandra.apache.org/job/Cassandra-4.1/138/testReport/org.apache.cassandra.distributed.test/AutoSnapshotTtlTest/testAutoSnapshotTTlOnDropAfterRestart_2/.
> No build lead lately so we don't have a JIRA for it or associated with it (
> https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=496&quickFilter=2252);
> I may put that mantle back on in the near future.
>
> There are 2 other failures that push us up to 7:
> 1)
> org.apache.cassandra.distributed.test.RepairTest.testForcedNormalRepairWithOneNodeDown
> (
> https://ci-cassandra.apache.org/job/Cassandra-4.1/138/testReport/org.apache.cassandra.distributed.test/RepairTest/testForcedNormalRepairWithOneNodeDown/).
> Looks like not all endpoints replied to the repair request so probably
> worth trying to repro locally and troubleshoot.
>
> 2) org.apache.cassandra.net.ProxyHandlerConnectionsTest.testExpireSome (
> https://ci-cassandra.apache.org/job/Cassandra-4.1/138/testReport/org.apache.cassandra.net/ProxyHandlerConnectionsTest/testExpireSome_2/).
> This is a timeout, so it's anyone's guess. :)
>
> Holistically, if we take a step back and look at 4.1 from a distance as to
> its general CI health, there's quite a bit of flake there:
> https://butler.cassandra.apache.org/#/ci/upstream/compare/Cassandra-4.1/cassandra-4.1.
> If we toss out build 122 as an anomaly, there's many that fail once in the
> past 16 runs. This continues to highlight for me the tradeoff of re-running
> flakes vs. blocking on them. We've chatted a bit on other ML threads on
> this; while prepping this email I just noticed that the high level
> dashboard on our branches shows varied and pervasive flakiness which is
> particularly challenging.
>
> Getting to 0 flakes with a "run once 0 tolerance" policy with the current
> ASF CI infra (which is definitely being improved upon!) looks to be
> something of a Sisyphean task.
>
> We're down from 13 tickets blocking 4.1 beta to 7:
> https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=484&quickFilter=2455.
> As mentioned above, we have some test failures w/out tickets so that 7 is
> probably closer realistically to the previous count.
>
> We have one unassigned ticket blocking 4.1 if anyone wants to pick it up:
> https://issues.apache.org/jira/browse/CASSANDRA-17773 (Incorrect
> cassandra.logdir on Debian systems).
>
>
> [New Contributors Getting Started]
> Follow your curiosity! We have a small number of things that still need to
> be fixed blocking 4.1, but if you have something specific you're interested
> in and there's an open ticket in jira on Cassandra, feel free to ping in
> slack (see below) to see if there's any context you need to dive in and get
> yourself assigned to that ticket.
> Hit up the @cassandra_mentors alias to reach volunteers who are available
> to help you get situated and link up with you as a mentor.
>
> To search JIRA for a topic of interest, replace "ReplaceTextHere" with the
> topic on the following JIRA search:
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20cassandra%20AND%20resolution%20!%3D%20unresolved%20AND%20assignee%20is%20EMPTY%20AND%20summary%20~%20%27ReplaceTextHere%27%20ORDER%20BY%20priority%20ASC
>
> To get situated, here's an explanation of various types of contribution:
> https://cassandra.apache.org/_/community.html#how-to-contribute
> An overview of the C* architecture:
> https://cassandra.apache.org/doc/latest/cassandra/architecture/overview.html
> And here's our getting started contributing guide:
> https://cassandra.apache.org/_/development/index.html
> We hang out in #cassandra-dev on https://the-asf.slack.com so come join
> us.
>
>
> [Dev list Digest]
> https://lists.apache.org/list?dev@cassandra.apache.org:lte=2w:
>
> We've had a fairly active couple of weeks. Caleb is shopping for feedback
> on what we do with hints during decommission:
> https://lists.apache.org/thread/0o2kd2hntbdjhpf8t1j9l9ys7k7y1wo5. See
> CASSANDRA-17808 for more details:
> https://issues.apache.org/jira/browse/CASSANDRA-17808
>
> Claude brought up the state of our open pull requests (so many that are
> open and stale) and the optics and inclusivity of our current MO:
> https://lists.apache.org/thread/7r6wd2p8kyz0g7rw2mnlw411gdmymlld. I'll
> refrain from further editorializing here as I've shared my perspective on
> the proposal thread; thanks Claude for bringing that up!
>
> Claude later brought up a formal proposal to add a pull request template:
> https://lists.apache.org/thread/bwogjbpmwxd7qongq86lcv03ljqq83ps to the
> project.
>
> Mick put forward the proposal to move our official debian and redhat
> repositories from downloads.apache.org to a redirect to apache.jfrog.io:
> https://lists.apache.org/thread/09kj80xld5dkt7cv73m6xs56lqh4jd18
>
> In pursuit of CEP-15: Accord and multi-key transactions, Caleb is working
> on the syntax discussed in a previous thread.
> https://lists.apache.org/thread/6p2flc3ql14nkn76m3dp1cldmqx0kz96, see
> https://issues.apache.org/jira/browse/CASSANDRA-17719 as well for
> details. Patrick shared quite a few thoughts a few days ago; curious to see
> what others think.
>
>
> [CI Trends]
> https://butler.cassandra.apache.org/#/
>
> Here's our trends on our branches for the last two weeks:
>
> 3.0: 14 -> 11
> 3.11: 17 -> 20
> 4.0: 6 -> 5
> 4.1: 4 -> 7
> trunk: 7 -> 8
>
> Most of the consistent failures on 3.0 don't have assignees yet but do
> have JIRA tickets:
> https://butler.cassandra.apache.org/#/ci/upstream/compare/Cassandra-3.0/cassandra-3.0
> (Thanks Brandon for creating those JIRAs)
>
>
> [Release progress]
>
> https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=484&quickFilter=2175
>
> Going to try a new "editorialized changes.txt" style format here:
>
> 4.1 beta: 11 issues
> - Fixed documentation surrounding semantics of token ranges in nodetool
> compact (CASSANDRA-17575)
> - A variety of flaky test failures
> - build and packaging fixes (CASSANDRA-17766 and 17765)
> - Fixing clientInitialization setting the failure detector
> (CASSANDRA-17782)
> - Fixing BulkLoader initializing schema via streaming (CASSANDRA-17740)
>
> 4.X / Next: 17 issues
> - New Guardrail for column sizes added (CASSANDRA-17151)
> - A fix to add an additional check that a node being replaced is reported
> as live so we don't fail incorrectly (CASSANDRA-17805)
> - Added the ability to do a one-time heap dump to a file on an Unhandled
> Exception, configurable by file or JMX (CASSANDRA-17795)
> - Added a separate thread pool that handles high cost auth responses so new
> client connections don't overwhelm bcrypt (CASSANDRA-17812)
> - A pretty significant improvement in DataOutputBuffer's memory usage and
> GC pressure (CASSANDRA-16471)
> - The ability to read TTL and WRITE TIME of an element in a collection
> added (CASSANDRA-8877)
> - Some cleaning up of python linting and legacy code fragments
> (CASSANDRA-17694, CASSANDRA-17779)
> - A bug causing an NPE during streaming fixed (CASSANDRA-17801)
> - Upstream CEP-15 work, using a seeded crc for PaxosBallotTracker checksum
> (CASSANDRA-17793)
> - UUID for tracking nodetool import logging (CASSANDRA-17800)
> - lack of JNA not exploding things when running as a client
> (CASSANDRA-17794)
> - Skipping node kill on startup check for unknown things in system since
> that can happen if you upgrade from older versions of C*
> (CASSANDRA-17777)
> - Logging duplicate keys if they show up during verify (CASSANDRA-17789)
> - Breaking out secondary index building to its own thread pool so it
> doesn't block compaction (CASSANDRA-17781)
>
> So to sum it up:
> - CASTest continues to be the biggest block on 4.1:
> https://issues.apache.org/jira/browse/CASSANDRA-17461 but folks are
> working on it
> - The lack of a consistent build lead means our Kanban tracking test fixes
> is drifting further from the state of CI
> - It's pretty expensive and painful to defer cleaning up CI to the end of
> the release cycle
>
> Keep fighting the good fight!
>
>
>
> ~Josh