Of course, I definitely hope to see it merged into 5.0.x as soon as possible.
Jordan West <jw...@apache.org> wrote on Thu, Feb 13, 2025 at 10:48:

> Regarding the buffer size, it is configurable. My personal take is that we've tested this on a variety of hardware (from laptops to large instance sizes) already, as well as a few different disk configs (it's also been run internally, in test, at a few places), and that it has been reviewed by four committers and another contributor. Always love to see more numbers. If folks want to take it for a spin on Alibaba Cloud, Azure, etc. and determine the best buffer size, that's awesome. We could document which is suggested for the community. I don't think it's necessary to block on that, however.
>
> Also, I am of course +1 to including this in 5.0.
>
> Jordan
>
> On Wed, Feb 12, 2025 at 19:50 guo Maxwell <cclive1...@gmail.com> wrote:
>
>> What I understand is that there will be some differences in block storage among the various cloud platforms. More concretely, the default read-ahead size will not be the same: for example, AWS EBS seems to be 256K and Alibaba Cloud seems to be 512K (if I remember correctly).
>>
>> Just like 19488, give the test method, see who can assist in the test, and provide the results.
>>
>> Jon Haddad <j...@rustyrazorblade.com> wrote on Thu, Feb 13, 2025 at 08:30:
>>
>>> Can you elaborate why? This would be several hundred hours of work and would cost me thousands of $$ to perform.
>>>
>>> Filesystems and block devices are well understood. Could you give me an example of what you think might be different here? This is already one of the most well-tested and documented performance patches ever contributed to the project.
>>>
>>> On Wed, Feb 12, 2025 at 4:26 PM guo Maxwell <cclive1...@gmail.com> wrote:
>>>
>>>> I think it should be tested on most cloud platforms (at least AWS, Azure, GCP) before being merged into 5.0, just like CASSANDRA-19488.
>>>>
>>>> Paulo Motta <pa...@apache.org> wrote on Thu, Feb 13, 2025 at 6:10 AM:
>>>>
>>>>> I'm looking forward to these improvements, compaction needs TLC. :-) A couple of questions:
>>>>>
>>>>> Has this been tested only on EBS, or also EC2 / bare metal / Azure / etc.? My only concern is if this is an optimization for EBS that can be a deoptimization for other environments.
>>>>>
>>>>> Are there reproducible scripts that anyone can run to verify the improvements in their own environments? This could help alleviate any concerns and gain confidence to introduce a perf improvement in a patch release.
>>>>>
>>>>> I have not read the ticket in detail, so apologies if this was already discussed there or elsewhere.
>>>>>
>>>>> On Wed, Feb 12, 2025 at 3:01 PM Jon Haddad <j...@rustyrazorblade.com> wrote:
>>>>> >
>>>>> > Hey folks,
>>>>> >
>>>>> > Over the last 9 months Jordan and I have worked on CASSANDRA-15452 [1]. The TL;DR is that we're internalizing a read-ahead buffer to allow us to do fewer requests to disk during compaction and range reads. This results in far fewer system calls (roughly a 16x reduction) and, on systems with higher read latency, a significant improvement in compaction throughput. We've tested several different EBS configurations and found it delivers up to a 10x improvement when read ahead is tuned to minimize read latency. I worked with AWS and the EBS team directly on this and on the Best Practices for C* on EBS [2] I wrote for them. I've performance tested this patch extensively with hundreds of billions of operations across several clusters and thousands of compactions. It has less of an impact on local NVMe, since the p99 latency is already 10-30x lower than what you see on EBS (~100 micros vs 1-3 ms), and you can do hundreds of thousands of IOPS vs a max of 16K.
>>>>> >
>>>>> > Related to this, Branimir wrote CASSANDRA-20092 [3], which significantly improves compaction by avoiding reading the partition index. CASSANDRA-20092 has already been merged to trunk [4].
>>>>> >
>>>>> > I think we should merge both of these patches into 5.0, as the perf improvement should allow teams to increase the density of EBS-backed C* clusters by 2-5x, driving cost way down. There are a lot of teams running C* on EBS now; I'm currently working with one that's bottlenecked on maxed-out EBS GP3 storage. I propose we merge both because, without CASSANDRA-20092, we won't get the performance improvements of CASSANDRA-15452 with the BTI format, only with BIG. I've tested BTI in other situations and found it to be far more performant than BIG.
>>>>> >
>>>>> > If we were looking at a small win, I wouldn't care much, but since these patches, combined with UCS, allow more teams to run C* on EBS at > 10TB / node, I think it's worth doing now.
>>>>> >
>>>>> > Thanks in advance,
>>>>> > Jon
>>>>> >
>>>>> > [1] https://issues.apache.org/jira/browse/CASSANDRA-15452
>>>>> > [2] https://aws.amazon.com/blogs/database/best-practices-for-running-apache-cassandra-with-amazon-ebs/
>>>>> > [3] https://issues.apache.org/jira/browse/CASSANDRA-20092
>>>>> > [4] https://github.com/apache/cassandra/commit/3078aea1cfc70092a185bab8ac5dc8a35627330f
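For anyone who wants to picture what "internalizing a read-ahead buffer" means in practice, here is a minimal, hypothetical Java sketch. It is not the CASSANDRA-15452 code; the class name, buffer size, and request size below are illustrative assumptions. The idea it shows is simply that many small sequential reads get served from one large buffered pread, so the number of system calls drops by roughly buffer size / request size (e.g. 256 KiB / 16 KiB is about 16x).

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/**
 * Illustrative read-ahead wrapper: sequential reads are served from one large
 * buffered pread instead of many small ones. With a 256 KiB buffer and 16 KiB
 * requests, roughly 16 application reads share a single system call.
 * Hypothetical sketch only, not the CASSANDRA-15452 implementation.
 */
final class ReadAheadReader implements AutoCloseable
{
    private final FileChannel channel;
    private final ByteBuffer buffer;   // read-ahead buffer (e.g. 256 KiB)
    private long bufferStart = 0;      // file offset of buffer index 0
    private long syscalls = 0;         // number of channel.read() calls issued

    ReadAheadReader(Path file, int readAheadBytes) throws IOException
    {
        channel = FileChannel.open(file, StandardOpenOption.READ);
        buffer = ByteBuffer.allocateDirect(readAheadBytes);
        buffer.limit(0);               // nothing cached yet
    }

    /** Copies dst.remaining() bytes starting at the given file position into dst. */
    void read(long position, ByteBuffer dst) throws IOException
    {
        while (dst.hasRemaining())
        {
            if (position < bufferStart || position >= bufferStart + buffer.limit())
                fill(position);        // cache miss: issue one large read

            buffer.position((int) (position - bufferStart));
            int n = Math.min(dst.remaining(), buffer.remaining());
            if (n == 0)
                throw new IOException("EOF at " + position);
            ByteBuffer slice = buffer.slice();
            slice.limit(n);
            dst.put(slice);
            position += n;
        }
    }

    private void fill(long position) throws IOException
    {
        buffer.clear();
        bufferStart = position;
        channel.read(buffer, position); // single pread covering the whole buffer
        buffer.flip();
        syscalls++;
    }

    long syscallCount() { return syscalls; }

    @Override
    public void close() throws IOException { channel.close(); }
}

Reading a file sequentially in 16 KiB chunks through a wrapper like this with a 256 KiB read-ahead buffer issues roughly one channel.read() per 16 application-level reads, which is the shape of the syscall reduction described above. The real patch is more involved (buffer sizing, alignment, interaction with the OS page cache, and the configurability Jordan mentioned), so treat this purely as an illustration.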