Re: Merging compaction improvements to 5.0

Jon Haddad Wed, 12 Feb 2025 16:28:38 -0800

Hey Paulo,

Great questions.  I've tested the patch fairly extensively across a wide
variety of AWS hardware types, both EBS-backed and not.  I believe Dave
Capwell also tested it using infra he had available.


In every case I've looked at, it's been a win, or on NVMe barely a change.
The reason for this is that we're fetching data from a local byte array
instead of the page cache.  There's no circumstance where it can be faster
to get data out of page cache than sequentially fetching bytes out of a
byte array.
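
To illustrate the point above, here is a minimal sketch of the idea. It is
not the CASSANDRA-15452 code. All class names are hypothetical, and the
sizes (16 KiB chunks, a 256 KiB buffer) are assumptions picked so the ratio
matches the roughly 16x reduction in reads mentioned in the thread. A
counting in-memory source stands in for the disk, so this shows only the
reduction in read requests, not real latency.

```java
import java.util.Arrays;

public class ReadAheadSketch {

    /** Stand-in for a file on slow storage; counts each read request it serves. */
    static final class CountingSource {
        private final byte[] data;
        long reads = 0;

        CountingSource(byte[] data) { this.data = data; }

        /** Copies up to length bytes from position into dst; returns bytes copied, or -1 at EOF. */
        int read(long position, byte[] dst, int length) {
            if (position >= data.length) return -1;
            int n = (int) Math.min(length, data.length - position);
            System.arraycopy(data, (int) position, dst, 0, n);
            reads++;
            return n;
        }
    }

    /** Serves small sequential reads from a larger local byte array. */
    static final class ReadAheadReader {
        private final CountingSource source;
        private final byte[] buffer;
        private long bufferStart = 0; // file offset of buffer[0]
        private int bufferLen = 0;    // valid bytes currently in the buffer

        ReadAheadReader(CountingSource source, int bufferSize) {
            this.source = source;
            this.buffer = new byte[bufferSize];
        }

        int read(long position, byte[] dst, int length) {
            int copied = 0;
            while (copied < length) {
                long pos = position + copied;
                // Refill only when the requested offset is outside the buffer.
                if (pos < bufferStart || pos >= bufferStart + bufferLen) {
                    int n = source.read(pos, buffer, buffer.length);
                    if (n <= 0) break;
                    bufferStart = pos;
                    bufferLen = n;
                }
                int off = (int) (pos - bufferStart);
                int n = Math.min(length - copied, bufferLen - off);
                System.arraycopy(buffer, off, dst, copied, n);
                copied += n;
            }
            return copied;
        }
    }

    /** Scans the data in chunkSize pieces; returns {direct reads, buffered reads}. */
    static long[] compare(int totalBytes, int chunkSize, int bufferSize) {
        byte[] data = new byte[totalBytes];
        for (int i = 0; i < data.length; i++) data[i] = (byte) (i * 31);

        CountingSource direct = new CountingSource(data);
        CountingSource behind = new CountingSource(data);
        ReadAheadReader buffered = new ReadAheadReader(behind, bufferSize);

        byte[] a = new byte[chunkSize];
        byte[] b = new byte[chunkSize];
        for (long pos = 0; pos < totalBytes; pos += chunkSize) {
            int n1 = direct.read(pos, a, chunkSize);
            int n2 = buffered.read(pos, b, chunkSize);
            // Both paths must return identical bytes.
            if (n1 != n2 || !Arrays.equals(a, b)) {
                throw new IllegalStateException("mismatch at offset " + pos);
            }
        }
        return new long[] { direct.reads, behind.reads };
    }

    public static void main(String[] args) {
        long[] r = compare(4 * 1024 * 1024, 16 * 1024, 256 * 1024);
        System.out.println("direct reads:   " + r[0]); // 256
        System.out.println("buffered reads: " + r[1]); // 16
    }
}
```

With these assumed sizes, a 4 MiB sequential scan issues 256 reads against
the source when read chunk by chunk, and 16 when served through the buffer.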

In the ticket I've provided extensive documentation showing how to
reproduce this using easy-cass-lab and easy-cass-stress.  I've shown how
to watch the filesystem and block device for individual reads (xfsslower
& biosnoop), so you can see each filesystem access, how many bytes were
fetched, and how long it took.  I've included what I think is a fairly
comprehensive analysis of the effects of the patch.  I accounted for
differences in instance types by switching the C* version from stock 5.0
to the 15452-patched build.

I've tried to use this as an opportunity to demonstrate what I think is the
level of detail that a patch like this should have, so I hope you get a
chance to take the time to check out the JIRA.  There's 100x more detail
there than I've provided in this email.

Jon


On Wed, Feb 12, 2025 at 2:10 PM Paulo Motta <[email protected]> wrote:

> I'm looking forward to these improvements, compaction needs TLC. :-)
> A couple of questions:
>
> Has this been tested only on EBS, or also EC2/bare-metal/Azure/etc? My
> only concern is if this is an optimization for EBS that can be a
> deoptimization for other environments.
>
> Are there reproducible scripts that anyone can run to verify the
> improvements in their own environments? This could help alleviate any
> concerns and gain confidence to introduce a perf. improvement in a
> patch release.
>
> I have not read the ticket in detail, so apologies if this was already
> discussed there or elsewhere.
>
> On Wed, Feb 12, 2025 at 3:01 PM Jon Haddad <[email protected]>
> wrote:
> >
> > Hey folks,
> >
> > Over the last 9 months Jordan and I have worked on CASSANDRA-15452 [1].
> The TL;DR is that we're internalizing a read ahead buffer to allow us to do
> fewer requests to disk during compaction and range reads.  This results in
> far fewer system calls (roughly 16x reduction) and on systems with higher
> read latency, a significant improvement in compaction throughput.  We've
> tested several different EBS configurations and found it delivers up to a
> 10x improvement when read ahead is optimized to minimize read latency.  I
> worked with AWS and the EBS team directly on this and the Best Practices
> for C* on EBS [2] I wrote for them.  I've performance tested this patch
> extensively with hundreds of billions of operations across several clusters
> and thousands of compactions.  It has less of an impact on local NVMe,
> since the p99 latency is already 10-30x less than what you see on EBS
> (100micros vs 1-3ms), and you can do hundreds of thousands of IOPS vs a max
> of 16K.
> >
> > Related to this, Branimir wrote CASSANDRA-20092 [3], which significantly
> improves compaction by avoiding reading the partition index.
> CASSANDRA-20092 has been merged to trunk already [4].
> >
> > I think we should merge both of these patches into 5.0, as the perf
> improvement should allow teams to increase density of EBS backed C*
> clusters by 2-5x, driving cost way down.  There's a lot of teams running C*
> on EBS now.  I'm currently working with one that's bottlenecked on maxed
> out EBS GP3 storage.  I propose we merge both, because without
> CASSANDRA-20092, we won't get the performance improvements in
> CASSANDRA-15452 with BTI, only BIG format.  I've tested BTI in other
> situations and found it to be far more performant than BIG.
> >
> > If we were looking at a small win, I wouldn't care much, but since these
> patches, combined with UCS, allow more teams to run C* on EBS at > 10TB /
> node, I think it's worth doing now.
> >
> > Thanks in advance,
> > Jon
> >
> > [1] https://issues.apache.org/jira/browse/CASSANDRA-15452
> > [2]
> https://aws.amazon.com/blogs/database/best-practices-for-running-apache-cassandra-with-amazon-ebs/
> > [3] https://issues.apache.org/jira/browse/CASSANDRA-20092
> > [4]
> https://github.com/apache/cassandra/commit/3078aea1cfc70092a185bab8ac5dc8a35627330f
> >
>
