One correction: testAutoSnapshotTTIOnDropAfterRestart already has a ticket in
review; it just wasn’t linked in Butler. I will link it now. Thanks
Paulo and Caleb for looking into it.

On Wed, 17 Aug 2022 at 14:05, Josh McKenzie <jmcken...@apache.org> wrote:

> This update comes to you from day 5 of quarantining in the basement.
> Thanks Pandemic. (╯°□°)╯︵ ┻━┻
>
> (Today we're going to test if the ASF mailing lists allow a variety of
> ascii characters! I almost hope for everyone's sakes it doesn't; I abuse
> these things. :))
>
> Let's start with 4.1:
> Latest run has 7 failures. If we dig a bit deeper into the detail panel (
> https://butler.cassandra.apache.org/#/ci/upstream/compare/Cassandra-4.1/cassandra-4.1),
> you can see that the CASTest failures in
> https://issues.apache.org/jira/browse/CASSANDRA-17461 account for the
> long pole blocking the release. Looks like there's multiple folks working
> on that (thanks Brandon, Benedict, Andres, and Berenguer!), but it also
> looks like there's still no assignee so we're maybe holding it at arm's
> length. Either that or we're just going to keep dogpiling on it which is
> great too; I don't see it falling off the radar any time soon.
>
> AutoSnapshotTtlTest.testAutoSnapshotTTlOnDropAfterRestart has failed a few
> times, so there's some legit flake there:
> https://ci-cassandra.apache.org/job/Cassandra-4.1/138/testReport/org.apache.cassandra.distributed.test/AutoSnapshotTtlTest/testAutoSnapshotTTlOnDropAfterRestart_2/.
> No build lead lately so we don't have a JIRA for it or associated with it (
> https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=496&quickFilter=2252);
> I may put that mantle back on in the near future.
>
> There are 2 other failures that push us up to 7:
> 1)
> org.apache.cassandra.distributed.test.RepairTest.testForcedNormalRepairWithOneNodeDown
> (
> https://ci-cassandra.apache.org/job/Cassandra-4.1/138/testReport/org.apache.cassandra.distributed.test/RepairTest/testForcedNormalRepairWithOneNodeDown/).
> Looks like not all endpoints replied to the repair request so probably
> worth trying to repro locally and troubleshoot.
>
> 2) org.apache.cassandra.net.ProxyHandlerConnectionsTest.testExpireSome (
> https://ci-cassandra.apache.org/job/Cassandra-4.1/138/testReport/org.apache.cassandra.net/ProxyHandlerConnectionsTest/testExpireSome_2/).
> This is a timeout, so it's anyone's guess. :)
>
> Holistically, if we take a step back and look at 4.1 from a distance as to
> its general CI health, there's quite a bit of flake there:
> https://butler.cassandra.apache.org/#/ci/upstream/compare/Cassandra-4.1/cassandra-4.1.
> If we toss out build 122 as an anomaly, there's many that fail once in the
> past 16 runs. This continues to highlight for me the tradeoff of re-running
> flakes vs. blocking on them. We've chatted a bit on other ML threads on
> this; while prepping this email I just noticed that the high level
> dashboard on our branches shows varied and pervasive flakiness which is
> particularly challenging.
>
> Getting to 0 flakes with a "run once 0 tolerance" policy with the current
> ASF CI infra (which is definitely being improved upon!) looks to be
> something of a Sisyphean task.
>
> We're down from 13 tickets blocking 4.1 beta to 7:
> https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=484&quickFilter=2455.
> As mentioned above, we have some test failures w/out tickets so that 7 is
> probably closer realistically to the previous count.
>
> We have one unassigned ticket blocking 4.1 if anyone wants to pick it up:
> https://issues.apache.org/jira/browse/CASSANDRA-17773 (Incorrect
> cassandra.logdir on Debian systems).
>
>
> [New Contributors Getting Started]
> Follow your curiosity! We have a small number of things that still need to
> be fixed blocking 4.1, but if you have something specific you're interested
> in and there's an open ticket in jira on Cassandra, feel free to ping in
> slack (see below) to see if there's any context you need to dive in and get
> yourself assigned to that ticket. Hit up the @cassandra_mentors alias to
> reach volunteers who are available to help you get situated and link up
> with you as a mentor.
>
> To search JIRA for a topic of interest, replace "ReplaceTextHere" with the
> topic on the following JIRA search:
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20cassandra%20AND%20resolution%20!%3D%20unresolved%20AND%20assignee%20is%20EMPTY%20AND%20summary%20~%20%27ReplaceTextHere%27%20ORDER%20BY%20priority%20ASC
>
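
For anyone who would rather script the search above than edit the URL by hand,
here is a minimal sketch in Python. It assumes nothing beyond the search URL
quoted above; the names SEARCH_TEMPLATE and jira_search_url are made up for
illustration and are not part of any Cassandra or JIRA tooling.

```python
from urllib.parse import quote

# The search URL from the email, verbatim, with its "ReplaceTextHere" placeholder.
SEARCH_TEMPLATE = (
    "https://issues.apache.org/jira/issues/?jql=project%20%3D%20cassandra"
    "%20AND%20resolution%20!%3D%20unresolved%20AND%20assignee%20is%20EMPTY"
    "%20AND%20summary%20~%20%27ReplaceTextHere%27%20ORDER%20BY%20priority%20ASC"
)


def jira_search_url(topic: str) -> str:
    """Return the search URL with the placeholder swapped for a URL-encoded topic."""
    # safe="" percent-encodes every reserved character, so a space becomes %20,
    # matching the encoding already used in the template.
    return SEARCH_TEMPLATE.replace("ReplaceTextHere", quote(topic, safe=""))


if __name__ == "__main__":
    print(jira_search_url("hinted handoff"))
```

A topic containing a single quote would end the quoted JQL string early, so
this sketch is only suitable for plain words and phrases.
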
> To get situated, here's an explanation of various types of contribution:
> https://cassandra.apache.org/_/community.html#how-to-contribute
> An overview of the C* architecture:
> https://cassandra.apache.org/doc/latest/cassandra/architecture/overview.html
> And here's our getting started contributing guide:
> https://cassandra.apache.org/_/development/index.html
> We hang out in #cassandra-dev on https://the-asf.slack.com so come join
> us.
>
>
> [Dev list Digest]
> https://lists.apache.org/list?dev@cassandra.apache.org:lte=2w:
>
> We've had a fairly active couple of weeks. Caleb is shopping for feedback
> on what we do with hints during decommission:
> https://lists.apache.org/thread/0o2kd2hntbdjhpf8t1j9l9ys7k7y1wo5. See
> CASSANDRA-17808 for more details:
> https://issues.apache.org/jira/browse/CASSANDRA-17808
>
> Claude brought up the state of our open pull requests (so many that are
> open and stale) and the optics and inclusivity of our current MO:
> https://lists.apache.org/thread/7r6wd2p8kyz0g7rw2mnlw411gdmymlld. I'll
> refrain from further editorializing here as I've shared my perspective on
> the proposal thread; thanks Claude for bringing that up!
>
> Claude later brought up a formal proposal to add a pull request template
> to the project:
> https://lists.apache.org/thread/bwogjbpmwxd7qongq86lcv03ljqq83ps
>
> Mick put forward the proposal to move our official debian and redhat
> repositories from downloads.apache.org to a redirect to apache.jfrog.io:
> https://lists.apache.org/thread/09kj80xld5dkt7cv73m6xs56lqh4jd18
>
> In pursuit of CEP-15: Accord and multi-key transactions, Caleb is working
> on the syntax discussed in a previous thread:
> https://lists.apache.org/thread/6p2flc3ql14nkn76m3dp1cldmqx0kz96. See
> https://issues.apache.org/jira/browse/CASSANDRA-17719 as well for
> details. Patrick shared quite a few thoughts a few days ago; curious to see
> what others think.
>
>
> [CI Trends]
> https://butler.cassandra.apache.org/#/
>
> Here are the trends on our branches for the last two weeks:
>
> 3.0: 14 -> 11
> 3.11: 17 -> 20
> 4.0: 6 -> 5
> 4.1: 4 -> 7
> trunk: 7 -> 8
>
> Most of the consistent failures on 3.0 don't have assignees yet but do
> have JIRA tickets:
> https://butler.cassandra.apache.org/#/ci/upstream/compare/Cassandra-3.0/cassandra-3.0
> (Thanks Brandon for creating those JIRAs)
>
>
> [Release progress]
>
> https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=484&quickFilter=2175
>
> Going to try a new "editorialized changes.txt" style format here:
>
> 4.1 beta: 11 issues
> - Fixed documentation surrounding semantics of token ranges in nodetool
> compact (CASSANDRA-17575)
> - A variety of flaky test failures
> - build and packaging fixes (CASSANDRA-17766 and 17765)
> - Fixing clientInitialization setting the failure detector
> (CASSANDRA-17782)
> - Fixing BulkLoader initializing schema via streaming (CASSANDRA-17740)
>
> 4.X / Next: 17 issues
> - New Guardrail for column sizes added (CASSANDRA-17151)
> - A fix to add an additional check that a node being replaced is reported
> as live so we don't fail incorrectly (CASSANDRA-17805)
> - Added the ability to do a one-time heap dump to a file on an Unhandled
> Exception, configurable by file or JMX (CASSANDRA-17795)
> - Added a separate thread pool that handles high cost auth responses so
> new client connections don't overwhelm bcrypt (CASSANDRA-17812)
> - A pretty significant improvement in DataOutputBuffer's memory usage and
> GC pressure (CASSANDRA-16471)
> - The ability to read TTL and WRITE TIME of an element in a collection
> added (CASSANDRA-8877)
> - Some cleaning up of python linting and legacy code fragments
> (CASSANDRA-17694, CASSANDRA-17779)
> - A bug causing an NPE during streaming fixed (CASSANDRA-17801)
> - Upstream CEP-15 work, using a seeded crc for PaxosBallotTracker checksum
> (CASSANDRA-17793)
> - UUID for tracking nodetool import logging (CASSANDRA-17800)
> - lack of JNA not exploding things when running as a client
> (CASSANDRA-17794)
> - Skipping node kill on startup check for unknown things in system since
> that can happen if you upgrade from older versions of C*
> (CASSANDRA-17777)
> - Logging duplicate keys if they show up during verify (CASSANDRA-17789)
> - Breaking out secondary index building to its own thread pool so it
> doesn't block compaction (CASSANDRA-17781)
>
> So to sum it up:
> - CASTest continues to be the biggest block on 4.1:
> https://issues.apache.org/jira/browse/CASSANDRA-17461 but folks are
> working on it
> - The lack of a consistent build lead means our Kanban tracking test fixes
> is drifting further from the state of CI
> - It's pretty expensive and painful to defer cleaning up CI to the end of
> the release cycle
>
> Keep fighting the good fight!
>
>
>
> ~Josh
>
