Josh McKenzie created CASSANDRA-20918:
-----------------------------------------
Summary: Add cursor-based low allocation optimized compaction implementation
Key: CASSANDRA-20918
URL: https://issues.apache.org/jira/browse/CASSANDRA-20918
Project: Apache Cassandra
Issue Type: New Feature
Components: Local/Compaction, Local/SSTable
Reporter: Josh McKenzie
Assignee: Nitsan Wakart
Attachments: 7_100m_100kr_100r.png
Compaction does a lot of allocation and burns a lot of CPU in the process. We
can move away from most of that allocation with fairly simple reusable objects
and the infrastructure built around them, which reduces allocation, and
therefore CPU usage, during compaction. Heap usage on all test cases holds
steady at 20MB, while regular compaction grows to more than 5GB.
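The general allocation-avoidance idea can be illustrated with a small sketch. It assumes a simple mutable holder that is overwritten for each row rather than a new object per row. `ReusableObjectSketch`, `MutableLiveness`, and its fields are hypothetical names chosen for illustration; this is not the patch's `ReusableLivenessInfo` or any Cassandra API.

```java
/**
 * Hypothetical sketch, not Cassandra code: one mutable holder is reset and
 * refilled for every row instead of allocating a new object per row.
 */
public class ReusableObjectSketch {

    /** Mutable per-row holder; its fields are overwritten on each set() call. */
    public static final class MutableLiveness {
        long timestamp;
        int ttl;
        int localDeletionTime;

        MutableLiveness set(long timestamp, int ttl, int localDeletionTime) {
            this.timestamp = timestamp;
            this.ttl = ttl;
            this.localDeletionTime = localDeletionTime;
            return this;
        }

        boolean isExpiring() {
            return ttl > 0;
        }
    }

    /** Counts rows with a TTL, reusing a single holder for the whole scan. */
    public static int countExpiring(long[] timestamps, int[] ttls, int[] deletionTimes) {
        MutableLiveness reused = new MutableLiveness(); // allocated once per scan
        int expiring = 0;
        for (int i = 0; i < timestamps.length; i++) {
            // Overwrite the holder in place: no new object is created per row.
            reused.set(timestamps[i], ttls[i], deletionTimes[i]);
            if (reused.isExpiring()) {
                expiring++;
            }
        }
        return expiring;
    }

    public static void main(String[] args) {
        long[] timestamps = {100L, 200L, 300L};
        int[] ttls = {0, 3600, 60};
        int[] deletionTimes = {Integer.MAX_VALUE, 5000, 6000};
        System.out.println(countExpiring(timestamps, ttls, deletionTimes)); // prints 2
    }
}
```

The trade-off with this style is that a reused holder must never be retained by the caller: anything that needs to outlive the current row has to be copied out first.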
This patch introduces a collection of reusable objects:
* ReusableLivenessInfo
* ReusableDecoratedKey
* ReusableLongToken
It also adds new compaction structures that use those objects:
* CompactionCursor
* CursorCompactionPipeline
* SSTableCursorReader
* SSTableCursorWriter
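As an illustration of what a cursor-based merge can look like, the sketch below performs a k-way merge over sorted inputs. Each input is a forward-only cursor that is advanced in place, so the merge loop creates no object per entry. It assumes each input holds unique, sorted keys, and it uses "keep the highest timestamp" as a simplified stand-in for reconciliation. `CursorMergeSketch`, `ArrayCursor`, and `merge` are hypothetical and do not reflect the actual `CompactionCursor` or `SSTableCursorReader` interfaces.

```java
import java.util.PriorityQueue;

/**
 * Hypothetical sketch, not Cassandra code: k-way merge over sorted cursors.
 * Cursors expose their current entry through primitive accessors and are
 * advanced in place instead of producing a new object per entry.
 */
public class CursorMergeSketch {

    /** Forward-only cursor over one sorted input (a stand-in for an SSTable). */
    public static final class ArrayCursor {
        private final long[] keys;
        private final long[] timestamps;
        private int pos = 0;

        public ArrayCursor(long[] keys, long[] timestamps) {
            this.keys = keys;
            this.timestamps = timestamps;
        }

        boolean hasCurrent() { return pos < keys.length; }
        long key() { return keys[pos]; }
        long timestamp() { return timestamps[pos]; }
        void advance() { pos++; }
    }

    /** Merges all cursors into the output arrays; returns the number of merged entries. */
    public static int merge(ArrayCursor[] cursors, long[] outKeys, long[] outTimestamps) {
        // Array-backed heap ordered by each cursor's current key; the cursors themselves are reused.
        PriorityQueue<ArrayCursor> heap =
                new PriorityQueue<>((a, b) -> Long.compare(a.key(), b.key()));
        for (ArrayCursor c : cursors) {
            if (c.hasCurrent()) heap.add(c);
        }
        int n = 0;
        while (!heap.isEmpty()) {
            ArrayCursor first = heap.poll();
            long key = first.key();
            long bestTimestamp = first.timestamp();
            advanceAndRequeue(heap, first);
            // Reconcile every other cursor positioned on the same key.
            while (!heap.isEmpty() && heap.peek().key() == key) {
                ArrayCursor same = heap.poll();
                bestTimestamp = Math.max(bestTimestamp, same.timestamp());
                advanceAndRequeue(heap, same);
            }
            outKeys[n] = key;
            outTimestamps[n] = bestTimestamp;
            n++;
        }
        return n;
    }

    /** Advances a cursor that is NOT in the heap, then re-adds it if it has more entries. */
    private static void advanceAndRequeue(PriorityQueue<ArrayCursor> heap, ArrayCursor c) {
        c.advance();
        if (c.hasCurrent()) heap.add(c);
    }

    public static void main(String[] args) {
        ArrayCursor a = new ArrayCursor(new long[]{1, 3, 5}, new long[]{10, 10, 10});
        ArrayCursor b = new ArrayCursor(new long[]{2, 3, 6}, new long[]{20, 20, 20});
        long[] keys = new long[6];
        long[] timestamps = new long[6];
        int n = merge(new ArrayCursor[]{a, b}, keys, timestamps);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            if (i > 0) sb.append(' ');
            sb.append(keys[i]).append('@').append(timestamps[i]);
        }
        System.out.println(sb); // prints 1@10 2@20 3@20 5@10 6@20
    }
}
```

A cursor is always polled out of the heap before it is advanced and re-added. Mutating a cursor's key while it is still inside the `PriorityQueue` would silently break the heap ordering.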
The linked branch also includes substantial test code, benchmarks, etc.:
* ~13k lines added, 405 lines deleted
* ~8.3k lines of the delta are non-test code
* ~5k lines of the delta are test code
Attached is a screenshot of the "messiest" benchmark case, with mixed-size rows
and a full merge. Across various data and compaction mixes, the highlight is
that compaction as implemented here is roughly 3-5x faster in most scenarios and
uses 20MB of heap vs. multiple GB.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)