This update comes to you from day 5 of quarantining in the basement. Thanks 
Pandemic. (╯°□°)╯︵ ┻━┻

(Today we're going to test if the ASF mailing lists allow a variety of ASCII 
characters! I almost hope for everyone's sakes they don't; I abuse these 
things. :))

Let's start with 4.1:
Latest run has 7 failures. If we dig a bit deeper into the detail panel 
(https://butler.cassandra.apache.org/#/ci/upstream/compare/Cassandra-4.1/cassandra-4.1),
 you can see that the CASTest failures in 
https://issues.apache.org/jira/browse/CASSANDRA-17461 account for the long pole 
blocking the release. Looks like there are multiple folks working on that (thanks 
Brandon, Benedict, Andres, and Berenguer!), but it also looks like there's 
still no assignee so we're maybe holding it at arm's length. Either that or 
we're just going to keep dogpiling on it which is great too; I don't see it 
falling off the radar any time soon.

testAutoSnapshotTTlOnDropAfterRestart has failed a few times so there's some 
legit flake there: 
https://ci-cassandra.apache.org/job/Cassandra-4.1/138/testReport/org.apache.cassandra.distributed.test/AutoSnapshotTtlTest/testAutoSnapshotTTlOnDropAfterRestart_2/.
 No build lead lately so we don't have a JIRA for it or associated with it 
(https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=496&quickFilter=2252);
 I may put that mantle back on in the near future.

There are 2 other failures that push us up to 7:
1) 
org.apache.cassandra.distributed.test.RepairTest.testForcedNormalRepairWithOneNodeDown
 
(https://ci-cassandra.apache.org/job/Cassandra-4.1/138/testReport/org.apache.cassandra.distributed.test/RepairTest/testForcedNormalRepairWithOneNodeDown/).
 Looks like not all endpoints replied to the repair request so probably worth 
trying to repro locally and troubleshoot.

2) org.apache.cassandra.net.ProxyHandlerConnectionsTest.testExpireSome 
(https://ci-cassandra.apache.org/job/Cassandra-4.1/138/testReport/org.apache.cassandra.net/ProxyHandlerConnectionsTest/testExpireSome_2/).
 This is a timeout, so it's anyone's guess. :)

Taking a step back to look at the general CI health of 4.1, there's quite a bit 
of flake there: 
https://butler.cassandra.apache.org/#/ci/upstream/compare/Cassandra-4.1/cassandra-4.1.
 If we toss out build 122 as an anomaly, there are many tests that failed once in the 
past 16 runs. This continues to highlight for me the tradeoff of re-running 
flakes vs. blocking on them. We've chatted a bit on other ML threads on this; 
while prepping this email I just noticed that the high level dashboard on our 
branches shows varied and pervasive flakiness which is particularly challenging.

Getting to 0 flakes with a "run once 0 tolerance" policy with the current ASF 
CI infra (which is definitely being improved upon!) looks to be something of a 
Sisyphean task. 

We're down from 13 tickets blocking 4.1 beta to 7: 
https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=484&quickFilter=2455.
 As mentioned above, we have some test failures w/out tickets so the real number 
is probably closer to the previous count.

We have one unassigned ticket blocking 4.1 if anyone wants to pick it up: 
https://issues.apache.org/jira/browse/CASSANDRA-17773 (Incorrect 
cassandra.logdir on Debian systems).


[New Contributors Getting Started]
Follow your curiosity! We have a small number of things that still need to be 
fixed blocking 4.1, but if you have something specific you're interested in and 
there's an open ticket in JIRA for Cassandra, feel free to ping in Slack (see 
below) to see if there's any context you need to dive in and get yourself 
assigned to that ticket. Hit up the @cassandra_mentors alias to reach 
volunteers who are available to help you get situated and link up with you as a 
mentor.

To search JIRA for a topic of interest, replace "ReplaceTextHere" with the 
topic on the following JIRA search: 
https://issues.apache.org/jira/issues/?jql=project%20%3D%20cassandra%20AND%20resolution%20!%3D%20unresolved%20AND%20assignee%20is%20EMPTY%20AND%20summary%20~%20%27ReplaceTextHere%27%20ORDER%20BY%20priority%20ASC
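
If you'd rather not hand-edit that URL, here's a minimal sketch of the same substitution in Python. The names TEMPLATE and jira_search_url are made up for illustration; the query string is copied verbatim from the link above and only the placeholder is swapped out, with the topic percent-encoded so spaces and punctuation survive.

```python
from urllib.parse import quote

# Search URL copied verbatim from the link above; only the placeholder changes.
TEMPLATE = (
    "https://issues.apache.org/jira/issues/?jql=project%20%3D%20cassandra"
    "%20AND%20resolution%20!%3D%20unresolved%20AND%20assignee%20is%20EMPTY"
    "%20AND%20summary%20~%20%27ReplaceTextHere%27%20ORDER%20BY%20priority%20ASC"
)


def jira_search_url(topic: str) -> str:
    """Return the search URL with the placeholder replaced by the encoded topic."""
    # safe="" percent-encodes every reserved character, including "/" and spaces.
    return TEMPLATE.replace("ReplaceTextHere", quote(topic, safe=""))


if __name__ == "__main__":
    print(jira_search_url("secondary index"))
```

A multi-word topic such as "secondary index" ends up in the URL as secondary%20index, so the link can be pasted straight into a browser.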

To get situated, here's an explanation of various types of contribution: 
https://cassandra.apache.org/_/community.html#how-to-contribute
An overview of the C* architecture: 
https://cassandra.apache.org/doc/latest/cassandra/architecture/overview.html
And here's our getting started contributing guide: 
https://cassandra.apache.org/_/development/index.html
We hang out in #cassandra-dev on https://the-asf.slack.com so come join us.


[Dev list Digest]
https://lists.apache.org/list?dev@cassandra.apache.org:lte=2w:

We've had a fairly active couple of weeks. Caleb is shopping for feedback on 
what we do with hints during decommission: 
https://lists.apache.org/thread/0o2kd2hntbdjhpf8t1j9l9ys7k7y1wo5. See 
CASSANDRA-17808 for more details: 
https://issues.apache.org/jira/browse/CASSANDRA-17808

Claude brought up the state of our open pull requests (so many that are open 
and stale) and the optics and inclusivity of our current MO: 
https://lists.apache.org/thread/7r6wd2p8kyz0g7rw2mnlw411gdmymlld. I'll refrain 
from further editorializing here as I've shared my perspective on the proposal 
thread; thanks Claude for bringing that up!

Claude later brought up a formal proposal to add a pull request template to the 
project: https://lists.apache.org/thread/bwogjbpmwxd7qongq86lcv03ljqq83ps

Mick put forward the proposal to move our official debian and redhat 
repositories from downloads.apache.org to a redirect to apache.jfrog.io: 
https://lists.apache.org/thread/09kj80xld5dkt7cv73m6xs56lqh4jd18

In pursuit of CEP-15: Accord and multi-key transactions, Caleb is working on 
the syntax discussed in a previous thread: 
https://lists.apache.org/thread/6p2flc3ql14nkn76m3dp1cldmqx0kz96. See 
https://issues.apache.org/jira/browse/CASSANDRA-17719 as well for details. 
Patrick shared quite a few thoughts a few days ago; curious to see what others 
think.


[CI Trends]
https://butler.cassandra.apache.org/#/

Here are our trends on our branches for the last two weeks:

3.0: 14 -> 11
3.11: 17 -> 20
4.0: 6 -> 5
4.1: 4 -> 7
trunk: 7 -> 8

Most of the consistent failures on 3.0 don't have assignees yet but do have 
JIRA tickets: 
https://butler.cassandra.apache.org/#/ci/upstream/compare/Cassandra-3.0/cassandra-3.0
 (Thanks Brandon for creating those JIRAs)


[Release progress]
https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=484&quickFilter=2175

Going to try a new "editorialized changes.txt" style format here:

4.1 beta: 11 issues
- Fixed documentation surrounding semantics of token ranges in nodetool compact 
(CASSANDRA-17575)
- A variety of flaky test failures
- Build and packaging fixes (CASSANDRA-17765, CASSANDRA-17766)
- Fixing clientInitialization setting the failure detector (CASSANDRA-17782)
- Fixing BulkLoader initializing schema via streaming (CASSANDRA-17740)

4.X / Next: 17 issues
- New Guardrail for column sizes added (CASSANDRA-17151)
- A fix to add an additional check that a node being replaced is reported as 
live so we don't fail incorrectly (CASSANDRA-17805)
- Added the ability to do a one-time heap dump to a file on an Unhandled 
Exception, configurable by file or JMX (CASSANDRA-17795)
- Added a separate thread pool that handles high cost auth responses so new 
client connections don't overwhelm bcrypt (CASSANDRA-17812)
- A pretty significant improvement in DataOutputBuffer's memory usage and GC 
pressure (CASSANDRA-16471)
- The ability to read TTL and WRITE TIME of an element in a collection added 
(CASSANDRA-8877)
- Some cleaning up of python linting and legacy code fragments 
(CASSANDRA-17694, CASSANDRA-17779)
- A bug causing an NPE during streaming fixed (CASSANDRA-17801)
- Upstream CEP-15 work, using a seeded crc for PaxosBallotTracker checksum 
(CASSANDRA-17793)
- UUID for tracking nodetool import logging (CASSANDRA-17800)
- Lack of JNA not exploding things when running as a client (CASSANDRA-17794)
- Skipping node kill on startup check for unknown things in system since that 
can happen if you upgrade from older versions of C* (CASSANDRA-17777)
- Logging duplicate keys if they show up during verify (CASSANDRA-17789)
- Breaking out secondary index building to its own thread pool so it doesn't 
block compaction (CASSANDRA-17781)

So to sum it up:
- CASTest continues to be the biggest block on 4.1: 
https://issues.apache.org/jira/browse/CASSANDRA-17461 but folks are working on 
it
- The lack of a consistent build lead means our Kanban tracking test fixes is 
drifting further from the state of CI
- It's pretty expensive and painful to defer cleaning up CI to the end of the 
release cycle

Keep fighting the good fight!


~Josh
