I have merged 2 commits to 5.0 to backport CASSANDRA-20092 and
CASSANDRA-20396 as discussed in my previous email.

Links are in the JIRAs as well but to make it easier they are here as well:
https://github.com/apache/cassandra/commit/f327b63db09a907206749a3c88aba38a4554e548
https://github.com/apache/cassandra/commit/17cb89208c804680ffd4445d6a826171a67edb79

Jordan

On Mon, Apr 14, 2025 at 5:00 AM Chris Lohfink <clohfin...@gmail.com> wrote:

> +1
>
> On Sun, Apr 13, 2025 at 12:32 PM Jordan West <jw...@apache.org> wrote:
>
>> Hi Folks,
>>
>> A bit delayed but I have the backport for 20092 ready. The branch can be
>> found here:
>> https://github.com/apache/cassandra/compare/cassandra-5.0...jrwest:cassandra:jwest/20092-5.0-backport.
>> I've run tests and all looked good. I plan to do one more run post a recent
>> rebase. Links are in CASSANDRA-20092. Before merging I wanted to sunshine
>> and get community input on the following:
>>
>> * I've also included CASSANDRA-20396 in the backport. This looks to be a
>> fix on top of CASSANDRA-20092 that we would want in the backport. I plan to
>> leave these as two separate commits on cassandra-5.0, not one merged
>> commit. If you feel differently please let me know.
>> * CHANGES.txt: i've included both patches in 5.0.5 section of
>> CHANGES.txt. They also appear under 5.1 in trunk. Since trunk isn't
>> released I don't see any issue here but if folks have strong opinions on
>> this lmk. I don't and can make any requested changes.
>>
>> * Along the way I discovered CASSANDRA-20538, a minor 1-line change
>> needed to actually take advantage of 20092+15452 when BTI is enabled. I
>> have included the patch for this in the backport as its largely a ninja
>> fix. If folks feel strongly about addressing both 5.0 and trunk with 20538
>> instead of one coming via backport and one coming via the JIRA I can remove
>> that commit from my backport (I am also not sure the best way to commit
>> this in the backport, so I sort of lean to immediately addressing it in
>> 20538. its largely moot since both would be included in the release either
>> way).
>>
>> * For the backport I had to adjust one test (
>> https://github.com/apache/cassandra/commit/0039ebf915b66a88234325a74ef6edd18bde6da0).
>> From what I can tell this is just because the compression code is less
>> likely to find an aligned key on 5.0 (I haven't dug too much into why now
>> that the test reliably passes with this minor tweak). Plan to squash this
>> into the backport commit for CASSANDRA-20092 unless folks object.
>>
>> Jordan
>>
>> On Wed, Feb 19, 2025 at 1:35 PM Ariel Weisberg <ar...@weisberg.ws> wrote:
>>
>>> Whoops, 5 months ago, not six months ago. Much more reasonable to be
>>> making this kind of fix.
>>>
>>> On Wed, Feb 19, 2025, at 3:56 PM, Ariel Weisberg wrote:
>>>
>>> Hi,
>>>
>>> This does not constitute a review, but I looked at both of them to
>>> convince myself how they go about solving their respective problems is a
>>> good idea. I am weakly +1. The risk reward is there, but 13 months since
>>> 5.0 was released feels a little late to trying to improve node density
>>> instead of just saying it is what it is.
>>>
>>> The +1 stems from the fact that if you really operate this at scale and
>>> you are committed to node density then it's less of a nice to have and more
>>> of a "some cluster falls over periodically now".
>>>
>>> Ariel
>>>
>>> On Fri, Feb 14, 2025, at 5:23 PM, Jordan West wrote:
>>>
>>> Thanks for the write up Mick. I think its is a great evaluation of
>>> 15452. A few notes below:
>>>
>>> * CI links for 15452 might be burried and I may need to link the most
>>> recent run (I’ve been using CircleCI since it’s what I’m familiar with —
>>> happy to have runs on ASf hardware as well).
>>>
>>> * 15452 is configurable by YAML property as you mentioned. I’m traveling
>>> but I’m 99% sure I made it a hot property as well (personally I believe you
>>> shouldn’t have to restart a node to disable something like this, so if I
>>> forgot to make it hot I will before merge). I’m traveling without my laptop
>>> so I will double check next week (if someone has time to before then it’s
>>> much appreciated)
>>>
>>>  * I think “safe-ish” is a fair description of 15452. The new code is
>>> fairly isolated but it’s in a very critical path with several call paths
>>> leading to it. I think the testing we’ve done (both manual and automated)
>>> as well as finding and fixing several bugs with those tests is what gives
>>> me the confidence. And as you mentioned we have the fail safe of turning it
>>> off otherwise.
>>>
>>> Jordan
>>>
>>> On Fri, Feb 14, 2025 at 12:42 Mick Semb Wever <m...@apache.org> wrote:
>>>
>>> Solid write up Jon!
>>>
>>> Hoping the committers and PMC members are keeping in mind this (very)
>>> recent thread:
>>> https://lists.apache.org/thread/h38g6q9d8h1q92h6qzs5tqdxpn2vmnyy
>>>
>>> This thread needs to also be about evaluating the risk these commits
>>> are to a patch version.
>>> I'm +1 and here's my thinking over it…
>>>
>>> Are the performance benefits clear ?
>>> Yes (thank you Jon for doing a solid job at demonstrating just how
>>> awesome this will be).
>>>
>>> Do we want this in 5.0 ?
>>> That's the dumb question :) of course.
>>>
>>> Does it fix a bug or a regression, or is an operational improvement ?
>>> Kinda, Branimir did mention a performance regression when the chunk
>>> cache was disabled.
>>>
>>> How are the patches with unforeseen risks ?
>>> These patches are medium sized and low-level, though isolated to (and
>>> creating a better isolation of) the compaction (scan) reader.  15452
>>> looks safe-ish.  I can't speak to 20092, it would be nice to hear
>>> Branimir and Sylvain chime in.  Are we confident that if any bugs do
>>> arise in user's 5.0 production deployments we will be quick to provide
>>> fixes and releases?  I was thinking it's worth putting in a system
>>> property that can disable the scan reader, so if anything happens
>>> folks can just restart the node with the system property to return to
>>> pre-patch behaviour, but that's already there! :)
>>>
>>> How well tested have they been ?
>>> We have one (or a few) reports of it running in production, that's
>>> super positive.  Reports of lots of manual testing, super positive
>>> too.  But there's no CI results available for either patch.  (Even the
>>> one committed to trunk doesn't have pre-commit CI results available.)
>>> I think CI results attached to the ticket are a must – I'm working on
>>> adding them.
>>>
>>>
>>>
>>>

Reply via email to