[jira] [Commented] (CASSANDRA-20918) Add cursor-based low allocation optimized compaction implementation

Benedict Elliott Smith (Jira) Wed, 03 Dec 2025 23:28:42 -0800


    [ 
https://issues.apache.org/jira/browse/CASSANDRA-20918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18042718#comment-18042718
 ]


Benedict Elliott Smith commented on CASSANDRA-20918:
----------------------------------------------------

Once Branimir completes his review and supplies his +1, I can be counted as the 
second committer +1. I am familiar with (and endorse) the general approach 
taken, and trust Nitsan and Branimir to do a good job on this together.

> Add cursor-based low allocation optimized compaction implementation
> -------------------------------------------------------------------
>
>                 Key: CASSANDRA-20918
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-20918
>             Project: Apache Cassandra
>          Issue Type: New Feature
>          Components: Local/Compaction, Local/SSTable
>            Reporter: Josh McKenzie
>            Assignee: Nitsan Wakart
>            Priority: Normal
>         Attachments: 7_100m_100kr_100r.png
>
>          Time Spent: 5h
>  Remaining Estimate: 0h
>
> Compaction does a ton of allocation and burns a lot of CPU in the process; we 
> can move away from that with some fairly simple, straightforward reusable 
> objects and infrastructure built on them, reducing allocation and thus CPU 
> usage during compaction. Heap allocation in all test cases holds steady at 
> 20MB, while regular compaction grows past 5GB.
> This patch introduces a collection of reusable objects:
>  * ReusableLivenessInfo
>  * ReusableDecoratedKey
>  * ReusableLongToken
> And new compaction structures that make use of those objects:
>  * CompactionCursor
>  * CursorCompactionPipeline
>  * SSTableCursorReader
>  * SSTableCursorWriter
> There's quite a bit of test code, benchmarks, etc. added on the linked branch.
> ~13k lines added, 405 lines deleted
> ~8.3k lines of the delta are non-test code
> ~5k lines of the delta are test code
> Attaching a screenshot of the "messiest" benchmark case with mixed size rows 
> and full merge; across various data and compaction mixes the highlight is 
> that compaction as implemented here is roughly 3-5x faster in most scenarios 
> and uses 20MB on heap vs. multiple GB.
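
For readers unfamiliar with the pattern the description refers to, here is a
minimal sketch of a cursor-style merge that reuses its objects instead of
allocating per element. It is not taken from the linked branch: LongCursor and
CursorMergeSketch are hypothetical names, plain long keys stand in for real
partition keys, and each input is assumed to be sorted with no internal
duplicates.

```java
/** Hypothetical reusable cursor over a sorted long[]; reset() rebinds it without allocating. */
final class LongCursor {
    private long[] data;
    private int pos;

    LongCursor reset(long[] data) {
        this.data = data;
        this.pos = 0;
        return this;
    }

    boolean hasCurrent() { return pos < data.length; }
    long current()       { return data[pos]; }
    void advance()       { pos++; }
}

/** Hypothetical merger: both cursors are allocated once and reused for every merge() call. */
final class CursorMergeSketch {
    private final LongCursor left = new LongCursor();
    private final LongCursor right = new LongCursor();

    /**
     * Merges two sorted inputs into the caller-supplied buffer, collapsing keys
     * that appear in both inputs into one entry. Returns the number of keys
     * written. The buffer must hold at least a.length + b.length entries.
     */
    int merge(long[] a, long[] b, long[] out) {
        left.reset(a);
        right.reset(b);
        int n = 0;
        while (left.hasCurrent() || right.hasCurrent()) {
            long next;
            if (!right.hasCurrent()) {
                next = left.current();
                left.advance();
            } else if (!left.hasCurrent()) {
                next = right.current();
                right.advance();
            } else {
                long l = left.current();
                long r = right.current();
                if (l < r) {
                    next = l;
                    left.advance();
                } else if (l > r) {
                    next = r;
                    right.advance();
                } else {
                    // Same key in both inputs: emit once, advance both cursors.
                    next = l;
                    left.advance();
                    right.advance();
                }
            }
            out[n++] = next;
        }
        return n;
    }
}
```

After construction, merge() performs no heap allocation of its own, so its
footprint stays flat regardless of input size. That is the general idea behind
the flat heap numbers quoted above, though the actual patch is far more
involved.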



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
