Re: Merging compaction improvements to 5.0

Jordan West Thu, 13 Feb 2025 05:11:34 -0800

For 15452 that’s correct (and I believe also for 20092). For 15452, the
trunk and 5.0 patch are basically identical.


Jordan

On Thu, Feb 13, 2025 at 01:06 C. Scott Andreas <[email protected]> wrote:

> Checking to confirm the specific patches proposed for backport – is it the
> trunk commit for C-20092 and the open GitHub PR against the 5.0 branch for
> C-15452 linked below?
>
> CASSANDRA-20092: Introduce SSTableSimpleScanner for compaction (committed
> to trunk)
> https://github.com/apache/cassandra/commit/3078aea1cfc70092a185bab8ac5dc8a35627330f
>
> CASSANDRA-15452: Improve disk access patterns during compaction and range
> reads (PR available)
> https://github.com/apache/cassandra/pull/3606
>
> Thanks,
>
> – Scott
>
> On Feb 12, 2025, at 9:45 PM, guo Maxwell <[email protected]> wrote:
>
>
> Of course, I definitely hope to see it merged into 5.0.x as soon as
> possible.
>
> On Thu, Feb 13, 2025 at 10:48 Jordan West <[email protected]> wrote:
>
>> Regarding the buffer size, it is configurable. My personal take is that
>> we’ve tested this on a variety of hardware (from laptops to large instance
>> sizes) already, as well as a few different disk configs (it’s also been run
>> internally, in test, at a few places) and that it has been reviewed by four
>> committers and another contributor. Always love to see more numbers. If
>> folks want to take it for a spin on Alibaba Cloud, Azure, etc. and determine
>> the best buffer size, that’s awesome. We could document which is suggested
>> for the community. I don’t think it’s necessary to block on that, however.
>>
>> Also I am of course +1 to including this in 5.0.
>>
>> Jordan
>>
>> On Wed, Feb 12, 2025 at 19:50 guo Maxwell <[email protected]> wrote:
>>
>>> My understanding is that block storage differs across cloud platforms.
>>> Most visibly, the default read-ahead size is not the same everywhere.
>>> For example, AWS EBS seems to be 256K, and Alibaba Cloud seems to be
>>> 512K (if I remember correctly).
>>>
>>> As with 19488, we could share the test method, see who can help run the
>>> tests, and collect the results.
>>>
>>> On Thu, Feb 13, 2025 at 08:30 Jon Haddad <[email protected]> wrote:
>>>
>>>> Can you elaborate why?  This would be several hundred hours of work and
>>>> would cost me thousands of $$ to perform.
>>>>
>>>> Filesystems and block devices are well understood.  Could you give me
>>>> an example of what you think might be different here?  This is already one
>>>> of the most well tested and documented performance patches ever contributed
>>>> to the project.
>>>>
>>>> On Wed, Feb 12, 2025 at 4:26 PM guo Maxwell <[email protected]>
>>>> wrote:
>>>>
>>>>> I think it should be tested on most cloud platforms (at least AWS,
>>>>> Azure, GCP) before being merged into 5.0, just like CASSANDRA-19488.
>>>>>
>>>>> On Thu, Feb 13, 2025 at 6:10 AM Paulo Motta <[email protected]> wrote:
>>>>>
>>>>>> I'm looking forward to these improvements, compaction needs tlc. :-)
>>>>>> A couple of questions:
>>>>>>
>>>>>> Has this been tested only on EBS, or also EC2/bare-metal/Azure/etc? My
>>>>>> only concern is if this is an optimization for EBS that can be a
>>>>>> deoptimization for other environments.
>>>>>>
>>>>>> Are there reproducible scripts that anyone can run to verify the
>>>>>> improvements in their own environments? This could help alleviate any
>>>>>> concerns and gain confidence to introduce a perf. improvement in a
>>>>>> patch release.
>>>>>>
>>>>>> I have not read the ticket in detail, so apologies if this was already
>>>>>> discussed there or elsewhere.
>>>>>>
>>>>>> On Wed, Feb 12, 2025 at 3:01 PM Jon Haddad <[email protected]>
>>>>>> wrote:
>>>>>> >
>>>>>> > Hey folks,
>>>>>> >
>>>>>> > Over the last 9 months Jordan and I have worked on
>>>>>> > CASSANDRA-15452 [1].  The TL;DR is that we're internalizing a read
>>>>>> > ahead buffer to allow us to do fewer requests to disk during
>>>>>> > compaction and range reads.  This results in far fewer system calls
>>>>>> > (roughly 16x reduction) and on systems with higher read latency, a
>>>>>> > significant improvement in compaction throughput.  We've tested
>>>>>> > several different EBS configurations and found it delivers up to a
>>>>>> > 10x improvement when read ahead is optimized to minimize read
>>>>>> > latency.  I worked with AWS and the EBS team directly on this and
>>>>>> > the Best Practices for C* on EBS [2] I wrote for them.  I've
>>>>>> > performance tested this patch extensively with hundreds of billions
>>>>>> > of operations across several clusters and thousands of compactions.
>>>>>> > It has less of an impact on local NVMe, since the p99 latency is
>>>>>> > already 10-30x less than what you see on EBS (100micros vs 1-3ms),
>>>>>> > and you can do hundreds of thousands of IOPS vs a max of 16K.
>>>>>> >
>>>>>> > Related to this, Branimir wrote CASSANDRA-20092 [3], which
>>>>>> > significantly improves compaction by avoiding reading the partition
>>>>>> > index.  CASSANDRA-20092 has been merged to trunk already [4].
>>>>>> >
>>>>>> > I think we should merge both of these patches into 5.0, as the perf
>>>>>> > improvement should allow teams to increase density of EBS backed C*
>>>>>> > clusters by 2-5x, driving cost way down.  There's a lot of teams
>>>>>> > running C* on EBS now.  I'm currently working with one that's
>>>>>> > bottlenecked on maxed out EBS GP3 storage.  I propose we merge both,
>>>>>> > because without CASSANDRA-20092, we won't get the performance
>>>>>> > improvements in CASSANDRA-15452 with BTI, only BIG format.  I've
>>>>>> > tested BTI in other situations and found it to be far more
>>>>>> > performant than BIG.
>>>>>> >
>>>>>> > If we were looking at a small win, I wouldn't care much, but since
>>>>>> > these patches, combined with UCS, allow more teams to run C* on EBS
>>>>>> > at > 10TB / node, I think it's worth doing now.
>>>>>> >
>>>>>> > Thanks in advance,
>>>>>> > Jon
>>>>>> >
>>>>>> > [1] https://issues.apache.org/jira/browse/CASSANDRA-15452
>>>>>> > [2] https://aws.amazon.com/blogs/database/best-practices-for-running-apache-cassandra-with-amazon-ebs/
>>>>>> > [3] https://issues.apache.org/jira/browse/CASSANDRA-20092
>>>>>> > [4] https://github.com/apache/cassandra/commit/3078aea1cfc70092a185bab8ac5dc8a35627330f
>>>>>> >
>>>>>>
>>>>>
>
