Re: Merging compaction improvements to 5.0

2025-04-19 Thread Jordan West
I have merged 2 commits to 5.0 to backport CASSANDRA-20092 and CASSANDRA-20396 as discussed in my previous email. Links are in the JIRAs, but to make it easier they are here as well: https://github.com/apache/cassandra/commit/f327b63db09a907206749a3c88aba38a4554e548 https://github.com/apach

Re: Merging compaction improvements to 5.0

2025-04-15 Thread Chris Lohfink
+1 On Sun, Apr 13, 2025 at 12:32 PM Jordan West wrote: > Hi Folks, > > A bit delayed but I have the backport for 20092 ready. The branch can be > found here: > https://github.com/apache/cassandra/compare/cassandra-5.0...jrwest:cassandra:jwest/20092-5.0-backport. > I've run tests and all looked g

Re: Merging compaction improvements to 5.0

2025-04-13 Thread Jordan West
Hi Folks, A bit delayed but I have the backport for 20092 ready. The branch can be found here: https://github.com/apache/cassandra/compare/cassandra-5.0...jrwest:cassandra:jwest/20092-5.0-backport. I've run tests and all looked good. I plan to do one more run post a recent rebase. Links are in CAS

Re: Merging compaction improvements to 5.0

2025-02-19 Thread Ariel Weisberg
Whoops, 5 months ago, not six months ago. Much more reasonable to be making this kind of fix. On Wed, Feb 19, 2025, at 3:56 PM, Ariel Weisberg wrote: > Hi, > > This does not constitute a review, but I looked at both of them to convince > myself how they go about solving their respective problem

Re: Merging compaction improvements to 5.0

2025-02-19 Thread Ariel Weisberg
Hi, This does not constitute a review, but I looked at both of them to convince myself that how they go about solving their respective problems is a good idea. I am weakly +1. The risk reward is there, but 13 months since 5.0 was released feels a little late to be trying to improve node density instead

Re: Merging compaction improvements to 5.0

2025-02-14 Thread Jordan West
Thanks for the write up Mick. I think it is a great evaluation of 15452. A few notes below: * CI links for 15452 might be buried and I may need to link the most recent run (I’ve been using CircleCI since it’s what I’m familiar with; happy to have runs on ASF hardware as well). * 15452 is confi

Re: Merging compaction improvements to 5.0

2025-02-14 Thread Mick Semb Wever
Solid write up Jon! Hoping the committers and PMC members are keeping in mind this (very) recent thread: https://lists.apache.org/thread/h38g6q9d8h1q92h6qzs5tqdxpn2vmnyy This thread needs to also be about evaluating the risk these commits are to a patch version. I'm +1 and here's my thinking over

Re: Merging compaction improvements to 5.0

2025-02-14 Thread Josh McKenzie
> If folks want to point to the docs for each cloud provider for the maximum > block size per IO request, we can certainly document that somewhere. Meh, that will probably change on their side over time right? At most I'd say we link to their docs, but even then those external links will go stale

Re: Merging compaction improvements to 5.0

2025-02-13 Thread Paulo Motta
> My personal take is that we’ve tested this on a variety of hardware (from laptops to large instance sizes) already, as well as a few different disk configs (it’s also been run internally, in test, at a few places) and that it has been reviewed by four committers and another contributor. Thanks f

Re: Merging compaction improvements to 5.0

2025-02-13 Thread Jon Haddad
Yeah, this is how I feel too. This is different from CASSANDRA-19488 in that there aren't any cloud provider specific details that we need to account for with our patch. We're doing normal IO here. The same code works everywhere. The results will vary based on disk latency and quotas, but imo, f

Re: Merging compaction improvements to 5.0

2025-02-13 Thread Abe Ratnofsky
Another +1 (nb) in favor of merging to 5.0. This patch has been thoroughly tested and reviewed, and will likely be a strong reason for users to upgrade.

Re: Merging compaction improvements to 5.0

2025-02-13 Thread Doug Rohrer
+1 - Thanks for doing the work to figure this out and find a good fix. Doug > On Feb 13, 2025, at 11:28 AM, Patrick McFadin wrote: > > I’ve been following this for a while and I think it’s just some solid > engineering based on real-world challenges. Probably one of the best types of > contri

Re: Merging compaction improvements to 5.0

2025-02-13 Thread Patrick McFadin
I’ve been following this for a while and I think it’s just some solid engineering based on real-world challenges. Probably one of the best types of contributions to have. I’m +1 on adding it to 5 Patrick On Thu, Feb 13, 2025 at 7:31 AM Dmitry Konstantinov wrote: > +1 (nb) from my side, I raised

Re: Merging compaction improvements to 5.0

2025-02-13 Thread Dmitry Konstantinov
+1 (nb) from my side, I raised a few comments for CASSANDRA-15452 some time ago and Jordan addressed them. I have also backported CASSANDRA-15452 changes to my internal 4.1 fork and got about 15% reduction in compaction time even for a node with a local SSD. On Thu, 13 Feb 2025 at 13:22, Jordan We

Re: Merging compaction improvements to 5.0

2025-02-13 Thread Jordan West
For 15452 that’s correct (and I believe also for 20092). For 15452, the trunk and 5.0 patch are basically identical. Jordan On Thu, Feb 13, 2025 at 01:06 C. Scott Andreas wrote: > Checking to confirm the specific patches proposed for backport – is it the > trunk commit for C-20092 and the open

Re: Merging compaction improvements to 5.0

2025-02-12 Thread C. Scott Andreas
Checking to confirm the specific patches proposed for backport – is it the trunk commit for C-20092 and the open GitHub PR against the 5.0 branch for C-15452 linked below? CASSANDRA-20092: Introduce SSTableSimpleScanner for compaction (committed to trunk) https://github.com/apache/cassandra/commi

Re: Merging compaction improvements to 5.0

2025-02-12 Thread guo Maxwell
Of course, I definitely hope to see it merged into 5.0.x as soon as possible. On Thu, Feb 13, 2025 at 10:48, Jordan West wrote: > Regarding the buffer size, it is configurable. My personal take is that > we’ve tested this on a variety of hardware (from laptops to large instance > sizes) already, as well as a fe

Re: Merging compaction improvements to 5.0

2025-02-12 Thread Caleb Rackliffe
+1 on making this available in 5.0.x. We don’t have to find a default that’s perfect for every hardware configuration. I could understand an argument around disabling read ahead by default in 5.0, but that’s about it. No reason to withhold the capability from users. On Feb 12, 2025, at 9:36 PM, Tolber

Re: Merging compaction improvements to 5.0

2025-02-12 Thread Tolbert, Andy
The data captured in https://issues.apache.org/jira/browse/CASSANDRA-15452 is really exciting. I would also be interested in seeing these changes brought into 5.0. Thanks, Andy On Wed, Feb 12, 2025 at 8:49 PM Jordan West wrote: > Regarding the buffer size, it is configurable. My personal take

Re: Merging compaction improvements to 5.0

2025-02-12 Thread Jordan West
Regarding the buffer size, it is configurable. My personal take is that we’ve tested this on a variety of hardware (from laptops to large instance sizes) already, as well as a few different disk configs (it’s also been run internally, in test, at a few places) and that it has been reviewed by four

Re: Merging compaction improvements to 5.0

2025-02-12 Thread guo Maxwell
What I understand is that there will be some differences in block storage among various cloud platforms. More intuitively, the default read-ahead size will be the same. For example, AWS EBS seems to be 256K, and Alibaba Cloud seems to be 512K (if I remember correctly). Just like 19488, give the tes

Re: Merging compaction improvements to 5.0

2025-02-12 Thread Paulo Motta
Thanks Jon for the additional feedback. I will take a look at the ticket more closely and try to reproduce the claimed improvements on my laptop. If there's no regression in performance, I'm +1 in including this improvement in 5.0. On Wed, Feb 12, 2025 at 7:28 PM Jon Haddad wrote: > > Hey Paulo,

Re: Merging compaction improvements to 5.0

2025-02-12 Thread Jon Haddad
Can you elaborate why? This would be several hundred hours of work and would cost me thousands of $$ to perform. Filesystems and block devices are well understood. Could you give me an example of what you think might be different here? This is already one of the most well tested and documented

Re: Merging compaction improvements to 5.0

2025-02-12 Thread Jon Haddad
Hey Paulo, Great questions. I've tested the patch fairly extensively across a wide variety of AWS hardware types both EBS and not. I believe Dave Capwell tested it using infra he had available. In every case I've looked at, it's been a win, or on NVMe barely a change. The reason for this is tha

Re: Merging compaction improvements to 5.0

2025-02-12 Thread guo Maxwell
I think it should be tested on most cloud platforms (at least AWS, Azure, GCP) before being merged into 5.0, just like CASSANDRA-19488. On Thu, Feb 13, 2025 at 6:10 AM, Paulo Motta wrote: > I'm looking forward to these improvements, compaction needs TLC. :-) > A couple of questions: > > Has this been tested only on EB

Re: Merging compaction improvements to 5.0

2025-02-12 Thread Paulo Motta
I'm looking forward to these improvements, compaction needs TLC. :-) A couple of questions: Has this been tested only on EBS, or also EC2/bare-metal/Azure/etc? My only concern is whether this is an optimization for EBS that can be a deoptimization for other environments. Are there reproducible scripts