[jira] [Comment Edited] (CASSANDRA-20918) Add cursor-based low allocation optimized compaction implementation

Dmitry Konstantinov (Jira) Thu, 04 Dec 2025 09:39:42 -0800


    [ https://issues.apache.org/jira/browse/CASSANDRA-20918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18042864#comment-18042864 ]


Dmitry Konstantinov edited comment on CASSANDRA-20918 at 12/4/25 5:38 PM:
--------------------------------------------------------------------------

A summary of limitations for the current implementation ("not supported" means
the original iterator implementation is used instead):
* complex columns are not supported
* tables with secondary indexes are not supported
* only the BIG SSTable format is supported (BTI is not supported)
* counters are not supported
* only Murmur3Partitioner and LocalPartitioner are supported
* nodetool garbagecollect is not supported
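The fallback rule above amounts to a single eligibility check per compaction. The sketch below illustrates that idea only; the class, the `CompactionInputs` record, the `useCursorPath` method and the `"big"`/`"bti"` format strings are hypothetical names chosen for illustration, not the patch's actual code or Cassandra's API.

```java
import java.util.Set;

// Hypothetical sketch: decide whether a compaction may take the cursor-based path
// or must fall back to the original iterator-based implementation. All names here
// are illustrative assumptions, not Cassandra's actual API.
final class CursorCompactionEligibility {

    // Partitioners the cursor path supports, per the limitations list.
    static final Set<String> SUPPORTED_PARTITIONERS =
            Set.of("Murmur3Partitioner", "LocalPartitioner");

    // Hypothetical summary of the properties of one compaction task.
    record CompactionInputs(boolean hasComplexColumns,
                            boolean hasSecondaryIndexes,
                            String sstableFormat,      // assumed identifiers, e.g. "big" or "bti"
                            boolean hasCounters,
                            String partitionerName,
                            boolean isGarbageCollect) {}

    // True only when none of the listed limitations applies; otherwise the
    // caller would use the original iterator implementation.
    static boolean useCursorPath(CompactionInputs in) {
        return !in.hasComplexColumns()
                && !in.hasSecondaryIndexes()
                && "big".equals(in.sstableFormat())
                && !in.hasCounters()
                && SUPPORTED_PARTITIONERS.contains(in.partitionerName())
                && !in.isGarbageCollect();
    }

    public static void main(String[] args) {
        CompactionInputs simple = new CompactionInputs(
                false, false, "big", false, "Murmur3Partitioner", false);
        CompactionInputs bti = new CompactionInputs(
                false, false, "bti", false, "Murmur3Partitioner", false);
        System.out.println(useCursorPath(simple)); // true
        System.out.println(useCursorPath(bti));    // false
    }
}
```

Keeping the check as one conjunction makes it easy to drop a condition as each limitation is lifted.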


was (Author: dnk):
A summary of limitations for the current implementation ("not supported" means
the original iterator implementation is used instead):
* complex columns are not supported
* counters are not supported
* only Murmur3Partitioner and LocalPartitioner are supported
* tables with secondary indexes are not supported
* only the BIG SSTable format is supported (BTI is not supported)
* nodetool garbagecollect is not supported

> Add cursor-based low allocation optimized compaction implementation
> -------------------------------------------------------------------
>
>                 Key: CASSANDRA-20918
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-20918
>             Project: Apache Cassandra
>          Issue Type: New Feature
>          Components: Local/Compaction, Local/SSTable
>            Reporter: Josh McKenzie
>            Assignee: Nitsan Wakart
>            Priority: Normal
>         Attachments: 7_100m_100kr_100r.png
>
>          Time Spent: 5h
>  Remaining Estimate: 0h
>
> Compaction allocates heavily and burns a lot of CPU in the process; we can
> avoid most of that allocation with fairly simple reusable objects and the
> infrastructure that uses them, reducing allocation and thus CPU usage during
> compaction. Heap allocation holds steady at 20MB across all test cases, while
> regular compaction grows past 5GB.
> This patch introduces a collection of reusable objects:
>  * ReusableLivenessInfo
>  * ReusableDecoratedKey
>  * ReusableLongToken
> And new compaction structures that make use of those objects:
>  * CompactionCursor
>  * CursorCompactionPipeline
>  * SSTableCursorReader
>  * SSTableCursorWriter
> The linked branch also adds a substantial amount of test code, benchmarks, etc.
> ~13k lines added, 405 lines deleted
> ~8.3k lines of the delta are non-test code
> ~5k lines of the delta are test code
> Attached is a screenshot of the "messiest" benchmark case, with mixed-size rows
> and a full merge. Across various data and compaction mixes, the highlight is
> that compaction as implemented here is roughly 3-5x faster in most scenarios
> and uses 20MB of heap vs. multiple GB.
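The allocation savings described above come from reusing mutable objects instead of creating one per key or row. The sketch below shows that general pattern only, under stated assumptions: `MutableLongToken`, `TokenCursor` and `sumTokens` are hypothetical names, not the patch's `ReusableLongToken` or `SSTableCursorReader`, and the "file" is simply a buffer of 8-byte tokens.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch of the reuse pattern: one mutable holder is refilled for
// every element instead of allocating a new object each time. Names are
// illustrative; they are not the classes introduced by the patch.
final class ReusableTokenSketch {

    // Mutable stand-in for an otherwise immutable token object.
    static final class MutableLongToken {
        private long value;
        void set(long value) { this.value = value; }
        long get() { return value; }
    }

    // Minimal cursor over a buffer of 8-byte tokens; advance() overwrites
    // the same holder rather than returning a fresh object.
    static final class TokenCursor {
        private final ByteBuffer buffer;
        private final MutableLongToken current = new MutableLongToken();

        TokenCursor(ByteBuffer buffer) { this.buffer = buffer; }

        boolean advance() {
            if (buffer.remaining() < Long.BYTES) return false;
            current.set(buffer.getLong());
            return true;
        }

        // Callers must copy out anything they need before the next advance().
        MutableLongToken current() { return current; }
    }

    // Sums all tokens without allocating anything per element.
    static long sumTokens(ByteBuffer buffer) {
        TokenCursor cursor = new TokenCursor(buffer);
        long sum = 0;
        while (cursor.advance()) {
            sum += cursor.current().get();
        }
        return sum;
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(3 * Long.BYTES);
        buf.putLong(10).putLong(-3).putLong(7).flip();
        System.out.println(sumTokens(buf)); // 14
    }
}
```

The trade-off is the usual one for cursors: the holder's contents are only valid until the next `advance()`, so any value that must outlive the current position has to be copied.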



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

