I'm looking forward to these improvements; compaction needs some TLC. :-)
A couple of questions:

Has this been tested only on EBS, or also on EC2/bare-metal/Azure/etc.?
My only concern is whether an optimization for EBS could turn out to be
a deoptimization for other environments.

Are there reproducible scripts that anyone can run to verify the
improvements in their own environment? This could help alleviate
concerns and build confidence in introducing a performance improvement
in a patch release.

I have not read the ticket in detail, so apologies if this was already
discussed there or elsewhere.

On Wed, Feb 12, 2025 at 3:01 PM Jon Haddad <j...@rustyrazorblade.com> wrote:
>
> Hey folks,
>
> Over the last 9 months Jordan and I have worked on CASSANDRA-15452 [1].  The 
> TL;DR is that we're internalizing a read ahead buffer to allow us to do fewer 
> requests to disk during compaction and range reads.  This results in far 
> fewer system calls (roughly 16x reduction) and on systems with higher read 
> latency, a significant improvement in compaction throughput.  We've tested 
> several different EBS configurations and found it delivers up to a 10x 
> improvement when read ahead is optimized to minimize read latency.  I worked 
> with AWS and the EBS team directly on this and the Best Practices for C* on 
> EBS [2] I wrote for them.  I've performance tested this patch extensively 
> with hundreds of billions of operations across several clusters and thousands 
> of compactions.  It has less of an impact on local NVMe, since the p99 
> latency is already 10-30x less than what you see on EBS (100micros vs 1-3ms), 
> and you can do hundreds of thousands of IOPS vs a max of 16K.
>
> Related to this, Branimir wrote CASSANDRA-20092 [3], which significantly 
> improves compaction by avoiding reading the partition index.  CASSANDRA-20092 
> has been merged to trunk already [4].
>
> I think we should merge both of these patches into 5.0, as the perf 
> improvement should allow teams to increase density of EBS backed C* clusters 
> by 2-5x, driving cost way down.  There's a lot of teams running C* on EBS 
> now.  I'm currently working with one that's bottlenecked on maxed out EBS GP3 
> storage.  I propose we merge both, because without CASSANDRA-20092, we won't 
> get the performance improvements in CASSANDRA-15452 with BTI, only BIG 
> format.  I've tested BTI in other situations and found it to be far more 
> performant than BIG.
>
> If we were looking at a small win, I wouldn't care much, but since these 
> patches, combined with UCS, allows more teams to run C* on EBS at > 10TB / 
> node, I think it's worth doing now.
>
> Thanks in advance,
> Jon
>
> [1] https://issues.apache.org/jira/browse/CASSANDRA-15452
> [2] https://aws.amazon.com/blogs/database/best-practices-for-running-apache-cassandra-with-amazon-ebs/
> [3] https://issues.apache.org/jira/browse/CASSANDRA-20092
> [4] https://github.com/apache/cassandra/commit/3078aea1cfc70092a185bab8ac5dc8a35627330f
>
