[ 
https://issues.apache.org/jira/browse/CASSANDRA-20173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitry Konstantinov updated CASSANDRA-20173:
--------------------------------------------
    Description: 
Currently, with memtable_allocation_type: offheap_objects, flushing a 
memtable allocates a new ByteBuffer object for each NativeCell and 
NativeClustering in order to write it to disk. This is one of the main 
contributors (together with CASSANDRA-20162) to memory allocation on the 
MemtableFlushWriter thread:

!image-2024-12-29-13-06-13-115.png|width=570!

 !native_clustering_byte_buffer_alloc.png|width=570!

Instead of retrieving value() as a ByteBuffer, we can introduce a 
NativeValueAccessor that does not expose a ByteBuffer and keeps one inside 
NativeCell as a thread-local for reuse. The same idea applies to 
NativeClustering.
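
A minimal sketch of the thread-local reuse idea, using only standard java.nio calls. It is not Cassandra code: OffHeapSlab, freshView and sharedView are hypothetical names, and a single direct buffer stands in for the native memory behind NativeCell. freshView models the current behaviour (one new ByteBuffer object per value), while sharedView re-points one per-thread view instead.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Hypothetical stand-in for an off-heap region that stores cell values back to back. */
final class OffHeapSlab {
    private final ByteBuffer region;
    // One reusable view per thread: duplicate() shares the bytes but has its own position/limit.
    private final ThreadLocal<ByteBuffer> reusableView;

    OffHeapSlab(int capacity) {
        region = ByteBuffer.allocateDirect(capacity);
        reusableView = ThreadLocal.withInitial(region::duplicate);
    }

    /** Appends a value and returns the offset at which it was stored. */
    int append(byte[] value) {
        int offset = region.position();
        region.put(value);
        return offset;
    }

    /** Allocating variant: creates a new ByteBuffer object on every call. */
    ByteBuffer freshView(int offset, int length) {
        ByteBuffer view = region.duplicate();
        view.limit(offset + length).position(offset);
        return view;
    }

    /** Reusing variant: re-points the calling thread's view, no new object per call. */
    ByteBuffer sharedView(int offset, int length) {
        ByteBuffer view = reusableView.get();
        view.clear(); // position = 0, limit = capacity, so the new bounds are always valid
        view.limit(offset + length).position(offset);
        return view;
    }

    public static void main(String[] args) {
        OffHeapSlab slab = new OffHeapSlab(64);
        int first = slab.append("abc".getBytes(StandardCharsets.UTF_8));
        int second = slab.append("hello".getBytes(StandardCharsets.UTF_8));

        ByteBuffer v1 = slab.sharedView(first, 3);
        ByteBuffer v2 = slab.sharedView(second, 5);
        // Same thread, same view object: v1 now also points at "hello".
        System.out.println(v1 == v2);
        System.out.println(StandardCharsets.UTF_8.decode(v2));
    }
}
```

The trade-off is that a shared view is only valid until the same thread asks for the next one, so the caller has to consume each value before moving on. That is why hiding the buffer behind an accessor, rather than handing it out, fits this approach.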

Note: for Cassandra 4.x the situation is even worse, because we clone every 
BTreeRow from off-heap to heap: flushing uses the same iterator as regular 
reads (where we need to protect against memtable lifecycle changes). During 
flushing such protection is not needed. For TrieMemtable we skip this cloning 
(CASSANDRA-17240, 
[https://github.com/apache/cassandra/blob/cassandra-5.0/src/java/org/apache/cassandra/db/memtable/TrieMemtable.java#L393]).
 It makes sense to consider skipping it for other types of memtable too.
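
A rough sketch of the difference between the two paths, again with hypothetical names (OffHeapPartition, rowsForRead, rowsForFlush) rather than actual Cassandra classes. The read path copies each row to the heap so the result stays valid if the memtable's memory is released. The flush path is assumed to finish before that memory is released, so it can use the off-heap data directly.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical partition whose rows live in direct (off-heap) buffers. */
final class OffHeapPartition {
    private final List<ByteBuffer> rows = new ArrayList<>();

    void add(byte[] row) {
        ByteBuffer direct = ByteBuffer.allocateDirect(row.length);
        direct.put(row).flip();
        rows.add(direct);
    }

    /** Read path: copy every row to the heap so it outlives the off-heap memory. */
    List<ByteBuffer> rowsForRead() {
        List<ByteBuffer> copies = new ArrayList<>(rows.size());
        for (ByteBuffer row : rows) {
            ByteBuffer heap = ByteBuffer.allocate(row.remaining());
            heap.put(row.duplicate()).flip(); // duplicate() keeps the stored row's position intact
            copies.add(heap);
        }
        return copies;
    }

    /**
     * Flush path: no copy. Assumes the caller finishes writing before the
     * off-heap memory is released and reads with absolute gets only.
     */
    List<ByteBuffer> rowsForFlush() {
        return Collections.unmodifiableList(rows);
    }

    public static void main(String[] args) {
        OffHeapPartition partition = new OffHeapPartition();
        partition.add(new byte[] {1, 2, 3});
        System.out.println(partition.rowsForRead().get(0).isDirect());
        System.out.println(partition.rowsForFlush().get(0).isDirect());
    }
}
```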


> Avoid new ByteBuffer allocation for each NativeCell/NativeClustering during 
> flushing of offheap_objects memtable
> ----------------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-20173
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-20173
>             Project: Apache Cassandra
>          Issue Type: Improvement
>          Components: Local/Memtable
>            Reporter: Dmitry Konstantinov
>            Assignee: Dmitry Konstantinov
>            Priority: Normal
>             Fix For: 5.0.x
>
>         Attachments: image-2024-12-29-13-06-13-115.png, 
> native_clustering_byte_buffer_alloc.png
>
>          Time Spent: 10m
>  Remaining Estimate: 0h


