Josh McKenzie created CASSANDRA-20918:
-----------------------------------------

             Summary: Add cursor-based low allocation optimized compaction implementation
                 Key: CASSANDRA-20918
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-20918
             Project: Apache Cassandra
          Issue Type: New Feature
          Components: Local/Compaction, Local/SSTable
            Reporter: Josh McKenzie
            Assignee: Nitsan Wakart
         Attachments: 7_100m_100kr_100r.png

Compaction does a lot of allocation and burns a lot of CPU in the process. We
can move away from most of that allocation with some fairly simple reusable
objects and the infrastructure that makes use of them, reducing allocation and
therefore CPU usage during compaction. Heap allocation holds steady at 20MB in
all test cases, while regular compaction grows past 5GB.

This patch introduces a collection of reusable objects:
 * ReusableLivenessInfo
 * ReusableDecoratedKey
 * ReusableLongToken
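To illustrate the reusable-object idea, here is a minimal Java sketch. It
assumes a key made of a long token plus raw key bytes. The class and method
names (ReusableKey, reset, copyInto) are hypothetical and are not the API of
ReusableDecoratedKey or ReusableLongToken in the patch. The point is only that
one mutable instance is overwritten per partition instead of allocating a new
key object each time.

```java
import java.util.Arrays;

// Hypothetical sketch of a reusable key; not Cassandra's ReusableDecoratedKey API.
public final class ReusableKeySketch {

    /** Mutable key: a long token plus key bytes held in a growable buffer. */
    static final class ReusableKey {
        private long token;
        private byte[] keyBytes = new byte[16];
        private int keyLength;

        /** Overwrites this instance in place; allocates only if the buffer is too small. */
        void reset(long newToken, byte[] src, int offset, int length) {
            if (keyBytes.length < length) {
                keyBytes = new byte[Math.max(length, keyBytes.length * 2)];
            }
            System.arraycopy(src, offset, keyBytes, 0, length);
            keyLength = length;
            token = newToken;
        }

        long token() { return token; }

        int length() { return keyLength; }

        /** Orders by token, then by unsigned key bytes, without allocating. */
        int compareTo(ReusableKey other) {
            int c = Long.compare(token, other.token);
            if (c != 0) {
                return c;
            }
            return Arrays.compareUnsigned(keyBytes, 0, keyLength,
                                          other.keyBytes, 0, other.keyLength);
        }

        /** Copies this key's state into another reusable instance. */
        void copyInto(ReusableKey target) {
            target.reset(token, keyBytes, 0, keyLength);
        }
    }

    public static void main(String[] args) {
        // Simulated sorted input rows: (token, key bytes).
        long[] tokens = {1, 3, 3, 7};
        byte[][] keys = {{10}, {1, 2}, {1, 3}, {9}};

        // Two instances are reused for every row instead of one new key per row.
        ReusableKey current = new ReusableKey();
        ReusableKey previous = new ReusableKey();
        boolean sorted = true;
        for (int i = 0; i < tokens.length; i++) {
            current.reset(tokens[i], keys[i], 0, keys[i].length);
            if (i > 0 && previous.compareTo(current) > 0) {
                sorted = false;
            }
            current.copyInto(previous);
        }
        System.out.println("sorted = " + sorted); // prints "sorted = true"
    }
}
```

The buffer only grows when a longer key arrives, so steady-state reads allocate
nothing. The trade-off is that any caller holding a reference must copy the key
before the next reset() overwrites it.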


And new compaction structures that make use of those objects:
 * CompactionCursor
 * CursorCompactionPipeline
 * SSTableCursorReader
 * SSTableCursorWriter
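The cursor idea can be sketched in the same spirit. This is a hypothetical,
simplified merge, not CompactionCursor or CursorCompactionPipeline. Each cursor
exposes its current row as primitive fields and advances in place, and rows
with the same key are reconciled by keeping the highest timestamp (a stand-in
for real cell reconciliation). It assumes each input is sorted by key with no
duplicate keys within a single input, and it finds the minimum key with a
linear scan over the cursors for clarity.

```java
// Hypothetical sketch of a cursor-style merge; not CompactionCursor or
// CursorCompactionPipeline from the patch.
public final class CursorMergeSketch {

    /** Cursor over one sorted input; the current row is exposed as primitives. */
    static final class Cursor {
        private final long[] keys;
        private final long[] timestamps;
        private final long[] values;
        private int pos;

        Cursor(long[] keys, long[] timestamps, long[] values) {
            this.keys = keys;
            this.timestamps = timestamps;
            this.values = values;
        }

        boolean hasRow() { return pos < keys.length; }
        long key() { return keys[pos]; }
        long timestamp() { return timestamps[pos]; }
        long value() { return values[pos]; }
        void advance() { pos++; }
    }

    /** Sink for merged rows; receives primitives rather than row objects. */
    interface RowWriter {
        void write(long key, long timestamp, long value);
    }

    /** Emits one row per distinct key, keeping the highest-timestamp version. */
    static void merge(Cursor[] cursors, RowWriter out) {
        while (true) {
            // Linear scan for the smallest current key among live cursors.
            boolean any = false;
            long minKey = 0;
            for (Cursor c : cursors) {
                if (c.hasRow() && (!any || c.key() < minKey)) {
                    minKey = c.key();
                    any = true;
                }
            }
            if (!any) {
                return; // every cursor is exhausted
            }
            // Reconcile all cursors positioned at minKey, advancing each one.
            boolean found = false;
            long bestTs = 0;
            long bestValue = 0;
            for (Cursor c : cursors) {
                if (c.hasRow() && c.key() == minKey) {
                    if (!found || c.timestamp() > bestTs) {
                        bestTs = c.timestamp();
                        bestValue = c.value();
                        found = true;
                    }
                    c.advance();
                }
            }
            out.write(minKey, bestTs, bestValue);
        }
    }

    public static void main(String[] args) {
        Cursor a = new Cursor(new long[] {1, 3, 5}, new long[] {10, 10, 10},
                              new long[] {100, 300, 500});
        Cursor b = new Cursor(new long[] {3, 4}, new long[] {20, 20},
                              new long[] {333, 400});
        StringBuilder sb = new StringBuilder();
        merge(new Cursor[] {a, b},
              (k, ts, v) -> sb.append(k).append('=').append(v).append(' '));
        System.out.println(sb.toString().trim()); // prints "1=100 3=333 4=400 5=500"
    }
}
```

A heap of cursor indices would scale better with many inputs; the linear scan
keeps the sketch short and still creates no per-row objects.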

The linked branch also adds quite a bit of test code, benchmarks, etc.

~13k lines added, 405 lines deleted:
 * ~8.3k lines of the delta are non-test code
 * ~5k lines of the delta are test code

Attached is a screenshot (7_100m_100kr_100r.png) of the "messiest" benchmark
case, with mixed-size rows and a full merge. Across various data and compaction
mixes, the highlight is that compaction as implemented here is roughly 3-5x
faster in most scenarios and uses 20MB of heap vs. multiple GB.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
