[
https://issues.apache.org/jira/browse/CASSANDRA-20173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dmitry Konstantinov updated CASSANDRA-20173:
--------------------------------------------
Description:
Currently, for memtable_allocation_type: offheap_objects, when we flush a
memtable we allocate a new ByteBuffer object for each NativeCell and
NativeClustering to write it to disk. It is one of the main contributors
(together with CASSANDRA-20162) to memory allocation for the MemtableFlushWriter
thread:
!image-2024-12-29-13-06-13-115.png|width=570!
!native_clustering_byte_buffer_alloc.png|width=570!
Instead of retrieving the value() as a ByteBuffer, we can introduce a
NativeValueAccessor that does not expose a ByteBuffer, and keep the buffer
inside NativeCell as a thread-local for re-use. The same idea is applicable to
NativeClustering.
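A minimal sketch of the thread-local re-use idea above, not the actual patch. It assumes a direct ByteBuffer (REGION) as a stand-in for the off-heap memtable memory and a heap buffer as a stand-in for the flush output; the class and method names (ReusableViewSketch, viewOf, demo) are hypothetical and are not Cassandra APIs. Instead of creating a new ByteBuffer per value, each thread keeps one view and only resets its position and limit:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class ReusableViewSketch {
    // Hypothetical stand-in for the off-heap region that holds cell values.
    static final ByteBuffer REGION = ByteBuffer.allocateDirect(64);

    // One view per thread, created once and re-pointed for every value.
    static final ThreadLocal<ByteBuffer> VIEW = ThreadLocal.withInitial(REGION::duplicate);

    // Returns the calling thread's view, positioned over [offset, offset + length).
    // No new ByteBuffer object is allocated per call.
    static ByteBuffer viewOf(int offset, int length) {
        ByteBuffer view = VIEW.get();
        view.clear();                 // reset position/limit left over from the previous value
        view.limit(offset + length);
        view.position(offset);
        return view;
    }

    // Stores two "cell values" in the region and copies them to a sink
    // through the single re-used view.
    static String demo() {
        ByteBuffer writer = REGION.duplicate();
        writer.put("helloworld".getBytes(StandardCharsets.US_ASCII));

        ByteBuffer sink = ByteBuffer.allocate(16); // stand-in for the flush output
        sink.put(viewOf(0, 5));
        sink.put(viewOf(5, 5));
        return new String(sink.array(), 0, sink.position(), StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        System.out.println(demo());                        // prints "helloworld"
        System.out.println(viewOf(0, 5) == viewOf(5, 5));  // prints "true": same object re-used
    }
}
```

The returned view is only valid until the same thread calls viewOf again, so a caller must consume it before requesting the next value. That constraint is presumably why the description suggests hiding the buffer behind an accessor rather than exposing it.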
Note: for Cassandra 4.x the situation is even worse because we clone every
BTreeRow from off-heap to heap, since flushing uses the same iterator as usual
reads (where we need to protect against memtable lifecycle changes). When
flushing, such protection is not needed. For TrieMemtable we skip such cloning
(CASSANDRA-17240,
[https://github.com/apache/cassandra/blob/cassandra-5.0/src/java/org/apache/cassandra/db/memtable/TrieMemtable.java#L393]).
It makes sense to consider skipping it for other types of memtable too.
> Avoid new ByteBuffer allocation for each NativeCell/NativeClustering during
> flushing of offheap_objects memtable
> ----------------------------------------------------------------------------------------------------------------
>
> Key: CASSANDRA-20173
> URL: https://issues.apache.org/jira/browse/CASSANDRA-20173
> Project: Apache Cassandra
> Issue Type: Improvement
> Components: Local/Memtable
> Reporter: Dmitry Konstantinov
> Assignee: Dmitry Konstantinov
> Priority: Normal
> Fix For: 5.0.x
>
> Attachments: image-2024-12-29-13-06-13-115.png,
> native_clustering_byte_buffer_alloc.png
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Currently, for memtable_allocation_type: offheap_objects, when we flush a
> memtable we allocate a new ByteBuffer object for each NativeCell and
> NativeClustering to write it to disk. It is one of the main contributors
> (together with CASSANDRA-20162) to memory allocation for the
> MemtableFlushWriter thread:
> !image-2024-12-29-13-06-13-115.png|width=570!
> !native_clustering_byte_buffer_alloc.png|width=570!
> Instead of retrieving the value() as a ByteBuffer, we can introduce a
> NativeValueAccessor that does not expose a ByteBuffer, and keep the buffer
> inside NativeCell as a thread-local for re-use. The same idea is applicable
> to NativeClustering.
> Note: for Cassandra 4.x the situation is even worse because we clone every
> BTreeRow from off-heap to heap, since flushing uses the same iterator as usual
> reads (where we need to protect against memtable lifecycle changes). When
> flushing, such protection is not needed. For TrieMemtable we skip such cloning
> (CASSANDRA-17240,
> [https://github.com/apache/cassandra/blob/cassandra-5.0/src/java/org/apache/cassandra/db/memtable/TrieMemtable.java#L393]).
> It makes sense to consider skipping it for other types of memtable too.