[jira] [Commented] (CASSANDRA-20918) Add cursor-based low allocation optimized compaction implementation

David Capwell (Jira) Fri, 12 Dec 2025 10:17:17 -0800


    [ 
https://issues.apache.org/jira/browse/CASSANDRA-20918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18044746#comment-18044746
 ]


David Capwell commented on CASSANDRA-20918:
-------------------------------------------

I ran a benchmark that isn't too compaction heavy, and client latencies were 
not statistically impacted (we didn't expect them to be).  Looking at the CPU 
profiles, compaction previously took about 15% of CPU and now takes about 
3.6%, I think.

I don't have a good stress test for compaction, so I would rely on all the 
prior work, but our benchmarks look promising. I still need to get Nitsan 
access to the baseline GC logs, but I can likely do that next week.

For the most part, looking at the benchmark, I am +1, but I have not reviewed 
the code (I trust the others who are actually doing the review).

> Add cursor-based low allocation optimized compaction implementation
> -------------------------------------------------------------------
>
>                 Key: CASSANDRA-20918
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-20918
>             Project: Apache Cassandra
>          Issue Type: New Feature
>          Components: Local/Compaction, Local/SSTable
>            Reporter: Josh McKenzie
>            Assignee: Nitsan Wakart
>            Priority: Normal
>         Attachments: 7_100m_100kr_100r.png, compact_after_t1.zip, 
> compact_after_t1_profiles.zip, compact_before_t1.zip
>
>          Time Spent: 5h 40m
>  Remaining Estimate: 0h
>
> Compaction does a ton of allocation and burns a lot of CPU in the process; we 
> can move away from allocation with some fairly simple and straightforward 
> reusable objects and infrastructure that make use of that, reducing 
> allocation and thus CPU usage during compaction. Heap allocation on all 
> test-cases holds steady at 20MB while regular compaction grows past 5GB.
> This patch introduces a collection of reusable objects:
>  * Reusable key/token implementations for Murmur and Local partitioners
>  * ReusableLivenessInfo
>  * ReusableDeleteTime
>  * Partition/Clustering/UnfilteredDescriptor
> And new compaction structures that make use of those objects:
>  * CursorCompactionPipeline
>  * CursorCompactor
>  * SSTableCursorReader
>  * SSTableCursorWriter
> There's quite a bit of test code added, benchmarks, etc. on the linked branch.
> ~13k lines added, 405 lines deleted
> ~8.3k lines delta are non-test code
> ~5k lines delta are test code
> Attaching a screenshot of the "messiest" benchmark case with mixed size rows 
> and full merge; across various data and compaction mixes the highlight is 
> that compaction as implemented here is roughly 3-5x faster in most scenarios 
> and uses 20MB on heap vs. multiple GB.
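
To make the reusable-object idea in the description concrete, here is a 
minimal sketch of a cursor that overwrites one mutable holder per row instead 
of allocating a new object per row, plus a two-way merge driven by such 
cursors. Every name in it (ReusableCursorSketch, ReusableLiveness, RunCursor, 
merge) is hypothetical and is not taken from the patch. It also assumes keys 
are unique within each sorted run and that the newest timestamp wins on a key 
collision, which is a simplification of real reconciliation.

```java
/** Hypothetical illustration only; these are not the classes from the patch. */
class ReusableCursorSketch {

    /** Mutable holder that is overwritten for each row rather than reallocated. */
    static final class ReusableLiveness {
        long timestamp;

        ReusableLiveness reset(long timestamp) {
            this.timestamp = timestamp;
            return this;
        }
    }

    /** Cursor over one sorted run of (key, timestamp) pairs kept in primitive arrays. */
    static final class RunCursor {
        private final long[] keys;
        private final long[] timestamps;
        private int pos = -1;
        // Allocated once per cursor and reused for every row the cursor visits.
        final ReusableLiveness liveness = new ReusableLiveness();

        RunCursor(long[] keys, long[] timestamps) {
            this.keys = keys;
            this.timestamps = timestamps;
        }

        /** Moves to the next row and refreshes the reusable holder; false when exhausted. */
        boolean advance() {
            if (++pos >= keys.length) {
                return false;
            }
            liveness.reset(timestamps[pos]);
            return true;
        }

        long key() {
            return keys[pos];
        }
    }

    /**
     * Merges two sorted runs into the caller-supplied output arrays and returns
     * the number of rows written. No objects are allocated inside the loop.
     */
    static int merge(RunCursor a, RunCursor b, long[] outKeys, long[] outTimestamps) {
        boolean hasA = a.advance();
        boolean hasB = b.advance();
        int n = 0;
        while (hasA || hasB) {
            if (hasA && (!hasB || a.key() < b.key())) {
                outKeys[n] = a.key();
                outTimestamps[n] = a.liveness.timestamp;
                hasA = a.advance();
            } else if (hasB && (!hasA || b.key() < a.key())) {
                outKeys[n] = b.key();
                outTimestamps[n] = b.liveness.timestamp;
                hasB = b.advance();
            } else {
                // Same key in both runs: keep the newest timestamp (assumed rule).
                outKeys[n] = a.key();
                outTimestamps[n] = Math.max(a.liveness.timestamp, b.liveness.timestamp);
                hasA = a.advance();
                hasB = b.advance();
            }
            n++;
        }
        return n;
    }
}
```

The caller sizes the output arrays to the sum of the two run lengths, so the 
only allocations happen up front, outside the merge loop.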



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

