[ https://issues.apache.org/jira/browse/CASSANDRA-20918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18042718#comment-18042718 ]
Benedict Elliott Smith commented on CASSANDRA-20918:
----------------------------------------------------
Once Branimir completes his review and supplies his +1, I can be counted as the
second committer +1. I am familiar with (and endorse) the general approach
taken, and trust Nitsan and Branimir to do a good job on this together.
> Add cursor-based low allocation optimized compaction implementation
> -------------------------------------------------------------------
>
> Key: CASSANDRA-20918
> URL: https://issues.apache.org/jira/browse/CASSANDRA-20918
> Project: Apache Cassandra
> Issue Type: New Feature
> Components: Local/Compaction, Local/SSTable
> Reporter: Josh McKenzie
> Assignee: Nitsan Wakart
> Priority: Normal
> Attachments: 7_100m_100kr_100r.png
>
> Time Spent: 5h
> Remaining Estimate: 0h
>
> Compaction does a ton of allocation and burns a lot of CPU in the process; we
> can move away from allocation with some fairly simple reusable objects and
> infrastructure that makes use of them, reducing allocation and thus CPU usage
> during compaction. Heap allocation on all test cases holds steady at 20MB
> while regular compaction grows past 5GB.
> This patch introduces a collection of reusable objects:
> * ReusableLivenessInfo
> * ReusableDecoratedKey
> * ReusableLongToken
> And new compaction structures that make use of those objects:
> * CompactionCursor
> * CursorCompactionPipeline
> * SSTableCursorReader
> * SSTableCursorWriter
> There's quite a bit of test code, benchmarks, etc. added on the linked branch:
> ~13k lines added, 405 lines deleted
> ~8.3k lines of the delta are non-test code
> ~5k lines of the delta are test code
> Attaching a screenshot of the "messiest" benchmark case with mixed-size rows
> and full merge; across various data and compaction mixes the highlight is
> that compaction as implemented here is roughly 3-5x faster in most scenarios
> and uses 20MB on heap vs. multiple GB.
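
For readers unfamiliar with the reusable-object idea described in the quoted issue, here is a minimal sketch of the general pattern, not code from the patch. The names ReusableLongHolder, ArrayCursor, and merge are hypothetical and stand in only loosely for things like ReusableLongToken and CompactionCursor; the inputs are assumed to be strictly increasing arrays of longs. Each cursor owns one mutable holder that is reset in place on every step, so the merge loop itself allocates no per-element objects.

```java
import java.util.Arrays;

class ReusableCursorSketch {

    /** Hypothetical mutable holder: overwritten in place instead of reallocated. */
    static final class ReusableLongHolder {
        private long value;

        void reset(long newValue) { value = newValue; }
        long value() { return value; }
    }

    /** Hypothetical cursor over a sorted array; exposes its position via one shared holder. */
    static final class ArrayCursor {
        private final long[] sorted;
        private final ReusableLongHolder current = new ReusableLongHolder();
        private int next = 0;

        ArrayCursor(long[] sorted) { this.sorted = sorted; }

        /** Moves to the next element, reusing the holder; returns false when exhausted. */
        boolean advance() {
            if (next >= sorted.length) return false;
            current.reset(sorted[next++]);
            return true;
        }

        long current() { return current.value(); }
    }

    /** Merges two strictly increasing inputs, collapsing values present in both. */
    static long[] merge(long[] a, long[] b) {
        ArrayCursor left = new ArrayCursor(a);
        ArrayCursor right = new ArrayCursor(b);
        long[] out = new long[a.length + b.length];
        int n = 0;
        boolean hasLeft = left.advance();
        boolean hasRight = right.advance();
        while (hasLeft || hasRight) {
            long pick;
            if (!hasRight || (hasLeft && left.current() < right.current())) {
                pick = left.current();
                hasLeft = left.advance();
            } else if (!hasLeft || right.current() < left.current()) {
                pick = right.current();
                hasRight = right.advance();
            } else {
                // Same value on both sides: emit once and advance both cursors.
                pick = left.current();
                hasLeft = left.advance();
                hasRight = right.advance();
            }
            out[n++] = pick;
        }
        return Arrays.copyOf(out, n); // trim unused tail
    }

    public static void main(String[] args) {
        long[] merged = merge(new long[] {1, 3, 5}, new long[] {2, 3, 6});
        System.out.println(Arrays.toString(merged));
    }
}
```

The only allocations here are the two cursors, their holders, and the output arrays, all of which happen outside the per-element loop. The structures in the patch are of course far more involved; this only illustrates why resetting an object in place keeps heap use flat regardless of input size.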