I think it should be tested on most cloud platforms (at least AWS, Azure, and GCP) before being merged into 5.0, just like CASSANDRA-19488.
On Thu, Feb 13, 2025 at 6:10 AM, Paulo Motta <pa...@apache.org> wrote:

> I'm looking forward to these improvements, compaction needs tlc. :-)
> A couple of questions:
>
> Has this been tested only on EBS, or also EC2/bare-metal/Azure/etc? My
> only concern is if this is an optimization for EBS that can be a
> deoptimization for other environments.
>
> Are there reproducible scripts that anyone can run to verify the
> improvements in their own environments? This could help alleviate any
> concerns and gain confidence to introduce a perf. improvement in a
> patch release.
>
> I have not read the ticket in detail, so apologies if this was already
> discussed there or elsewhere.
>
> On Wed, Feb 12, 2025 at 3:01 PM Jon Haddad <j...@rustyrazorblade.com>
> wrote:
> >
> > Hey folks,
> >
> > Over the last 9 months Jordan and I have worked on CASSANDRA-15452 [1].
> > The TL;DR is that we're internalizing a read ahead buffer to allow us
> > to do fewer requests to disk during compaction and range reads. This
> > results in far fewer system calls (roughly 16x reduction) and on
> > systems with higher read latency, a significant improvement in
> > compaction throughput. We've tested several different EBS
> > configurations and found it delivers up to a 10x improvement when read
> > ahead is optimized to minimize read latency. I worked with AWS and the
> > EBS team directly on this and the Best Practices for C* on EBS [2] I
> > wrote for them. I've performance tested this patch extensively with
> > hundreds of billions of operations across several clusters and
> > thousands of compactions. It has less of an impact on local NVMe,
> > since the p99 latency is already 10-30x less than what you see on EBS
> > (100micros vs 1-3ms), and you can do hundreds of thousands of IOPS vs
> > a max of 16K.
> >
> > Related to this, Branimir wrote CASSANDRA-20092 [3], which
> > significantly improves compaction by avoiding reading the partition
> > index. CASSANDRA-20092 has been merged to trunk already [4].
> >
> > I think we should merge both of these patches into 5.0, as the perf
> > improvement should allow teams to increase density of EBS backed C*
> > clusters by 2-5x, driving cost way down. There's a lot of teams
> > running C* on EBS now. I'm currently working with one that's
> > bottlenecked on maxed out EBS GP3 storage. I propose we merge both,
> > because without CASSANDRA-20092, we won't get the performance
> > improvements in CASSANDRA-15452 with BTI, only BIG format. I've tested
> > BTI in other situations and found it to be far more performant than
> > BIG.
> >
> > If we were looking at a small win, I wouldn't care much, but since
> > these patches, combined with UCS, allows more teams to run C* on EBS
> > at > 10TB / node, I think it's worth doing now.
> >
> > Thanks in advance,
> > Jon
> >
> > [1] https://issues.apache.org/jira/browse/CASSANDRA-15452
> > [2] https://aws.amazon.com/blogs/database/best-practices-for-running-apache-cassandra-with-amazon-ebs/
> > [3] https://issues.apache.org/jira/browse/CASSANDRA-20092
> > [4] https://github.com/apache/cassandra/commit/3078aea1cfc70092a185bab8ac5dc8a35627330f