This update comes to you from day 5 of quarantining in the basement. Thanks Pandemic. (╯°□°)╯︵ ┻━┻
(Today we're going to test if the ASF mailing list allows a variety of ASCII characters! I almost hope for everyone's sake it doesn't; I abuse these things. :))

Let's start with 4.1: the latest run has 7 failures. If we dig a bit deeper into the detail panel (https://butler.cassandra.apache.org/#/ci/upstream/compare/Cassandra-4.1/cassandra-4.1), you can see that the CASTest failures in https://issues.apache.org/jira/browse/CASSANDRA-17461 account for the long pole blocking the release. Looks like there are multiple folks working on that (thanks Brandon, Benedict, Andres, and Berenguer!), but it also looks like there's still no assignee, so we're maybe holding it at arm's length. Either that or we're just going to keep dogpiling on it, which is great too; I don't see it falling off the radar any time soon.

testAutoSnapshotTTlOnDropAfterRestart has failed a few times, so there's some legit flake there: https://ci-cassandra.apache.org/job/Cassandra-4.1/138/testReport/org.apache.cassandra.distributed.test/AutoSnapshotTtlTest/testAutoSnapshotTTlOnDropAfterRestart_2/. No build lead lately, so we don't have a JIRA for it or associated with it (https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=496&quickFilter=2252); I may put that mantle back on in the near future.

There are 2 other failures that push us up to 7:

1) org.apache.cassandra.distributed.test.RepairTest.testForcedNormalRepairWithOneNodeDown (https://ci-cassandra.apache.org/job/Cassandra-4.1/138/testReport/org.apache.cassandra.distributed.test/RepairTest/testForcedNormalRepairWithOneNodeDown/). Looks like not all endpoints replied to the repair request, so it's probably worth trying to repro locally and troubleshoot.
2) org.apache.cassandra.net.ProxyHandlerConnectionsTest.testExpireSome (https://ci-cassandra.apache.org/job/Cassandra-4.1/138/testReport/org.apache.cassandra.net/ProxyHandlerConnectionsTest/testExpireSome_2/). This is a timeout, so it's anyone's guess. :)

Holistically, if we take a step back and look at the general CI health of 4.1, there's quite a bit of flake there: https://butler.cassandra.apache.org/#/ci/upstream/compare/Cassandra-4.1/cassandra-4.1. If we toss out build 122 as an anomaly, there are many tests that failed just once in the past 16 runs. This continues to highlight for me the tradeoff of re-running flakes vs. blocking on them. We've chatted a bit on other ML threads on this; while prepping this email I just noticed that the high level dashboard on our branches shows varied and pervasive flakiness, which is particularly challenging. Getting to 0 flakes with a "run once 0 tolerance" policy with the current ASF CI infra (which is definitely being improved upon!) looks to be something of a Sisyphean task.

We're down from 13 tickets blocking 4.1 beta to 7: https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=484&quickFilter=2455. As mentioned above, we have some test failures w/out tickets, so realistically that 7 is probably closer to the previous count. We have one unassigned ticket blocking 4.1 if anyone wants to pick it up: https://issues.apache.org/jira/browse/CASSANDRA-17773 (Incorrect cassandra.logdir on Debian systems).

[New Contributors Getting Started]
Follow your curiosity! We have a small number of things blocking 4.1 that still need to be fixed, but if you have something specific you're interested in and there's an open Cassandra ticket in JIRA, feel free to ping in Slack (see below) to get any context you need to dive in and get yourself assigned to that ticket. Hit up the @cassandra_mentors alias to reach volunteers who are available to help you get situated and link up with you as a mentor.
To search JIRA for a topic of interest, replace "ReplaceTextHere" with the topic in the following JIRA search:
https://issues.apache.org/jira/issues/?jql=project%20%3D%20cassandra%20AND%20resolution%20!%3D%20unresolved%20AND%20assignee%20is%20EMPTY%20AND%20summary%20~%20%27ReplaceTextHere%27%20ORDER%20BY%20priority%20ASC

To get situated, here's an explanation of the various types of contribution: https://cassandra.apache.org/_/community.html#how-to-contribute
An overview of the C* architecture: https://cassandra.apache.org/doc/latest/cassandra/architecture/overview.html
And here's our getting started contributing guide: https://cassandra.apache.org/_/development/index.html

We hang out in #cassandra-dev on https://the-asf.slack.com so come join us.

[Dev list Digest]
https://lists.apache.org/list?dev@cassandra.apache.org:lte=2w

We've had a fairly active couple of weeks.

Caleb is shopping for feedback on what we do with hints during decommission: https://lists.apache.org/thread/0o2kd2hntbdjhpf8t1j9l9ys7k7y1wo5. See CASSANDRA-17808 for more details: https://issues.apache.org/jira/browse/CASSANDRA-17808

Claude brought up the state of our open pull requests (so many that are open and stale) and the optics and inclusivity of our current MO: https://lists.apache.org/thread/7r6wd2p8kyz0g7rw2mnlw411gdmymlld. I'll refrain from further editorializing here as I've shared my perspective on the proposal thread; thanks Claude for bringing that up!

Claude later brought up a formal proposal to add a pull request template to the project: https://lists.apache.org/thread/bwogjbpmwxd7qongq86lcv03ljqq83ps

Mick put forward the proposal to move our official Debian and Red Hat repositories from downloads.apache.org to a redirect to apache.jfrog.io: https://lists.apache.org/thread/09kj80xld5dkt7cv73m6xs56lqh4jd18

In pursuit of CEP-15: Accord and multi-key transactions, Caleb is working on the syntax discussed in a previous thread:
https://lists.apache.org/thread/6p2flc3ql14nkn76m3dp1cldmqx0kz96, see https://issues.apache.org/jira/browse/CASSANDRA-17719 as well for details. Patrick shared quite a few thoughts a few days ago; curious to see what others think.

[CI Trends]
https://butler.cassandra.apache.org/#/

Here are the trends on our branches for the last two weeks:

3.0: 14 -> 11
3.11: 17 -> 20
4.0: 6 -> 5
4.1: 4 -> 7
trunk: 7 -> 8

Most of the consistent failures on 3.0 don't have assignees yet but do have JIRA tickets: https://butler.cassandra.apache.org/#/ci/upstream/compare/Cassandra-3.0/cassandra-3.0 (Thanks Brandon for creating those JIRAs)

[Release progress]
https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=484&quickFilter=2175

Going to try a new "editorialized changes.txt" style format here:

4.1 beta: 11 issues
- Fixed documentation surrounding semantics of token ranges in nodetool compact (CASSANDRA-17575)
- A variety of flaky test failures
- Build and packaging fixes (CASSANDRA-17766 and CASSANDRA-17765)
- Fixing clientInitialization setting the failure detector (CASSANDRA-17782)
- Fixing BulkLoader initializing schema via streaming (CASSANDRA-17740)

4.X / Next: 17 issues
- New Guardrail for column sizes added (CASSANDRA-17151)
- A fix to add an additional check that a node being replaced is reported as live so we don't fail incorrectly (CASSANDRA-17805)
- Added the ability to do a one-time heap dump to a file on an Unhandled Exception, configurable by file or JMX (CASSANDRA-17795)
- Added a separate thread pool that handles high cost auth responses so new client connections don't overwhelm bcrypt (CASSANDRA-17812)
- A pretty significant improvement in DataOutputBuffer's memory usage and GC pressure (CASSANDRA-16471)
- The ability to read TTL and WRITETIME of an element in a collection added (CASSANDRA-8877)
- Some cleaning up of python linting and legacy code fragments (CASSANDRA-17694, CASSANDRA-17779)
- A bug causing an NPE during streaming fixed (CASSANDRA-17801)
- Upstream CEP-15 work, using a seeded CRC for PaxosBallotTracker checksum (CASSANDRA-17793)
- UUID for tracking nodetool import logging (CASSANDRA-17800)
- Lack of JNA not exploding things when running as a client (CASSANDRA-17794)
- Skipping node kill on startup check for unknown things in system since that can happen if you upgrade from older versions of C* (CASSANDRA-17777)
- Logging duplicate keys if they show up during verify (CASSANDRA-17789)
- Breaking out secondary index building to its own thread pool so it doesn't block compaction (CASSANDRA-17781)

So to sum it up:
- CASTest continues to be the biggest blocker on 4.1 (https://issues.apache.org/jira/browse/CASSANDRA-17461), but folks are working on it
- The lack of a consistent build lead means our Kanban board tracking test fixes is drifting further from the state of CI
- It's pretty expensive and painful to defer cleaning up CI to the end of the release cycle

Keep fighting the good fight!

~Josh