[ 
https://issues.apache.org/jira/browse/CASSANDRA-20166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18042880#comment-18042880
 ] 

Dmitry Konstantinov edited comment on CASSANDRA-20166 at 12/4/25 7:34 PM:
--------------------------------------------------------------------------

h3. Load

Table: 1 partition text column, 1 clustering text column, 5 value text columns. 
Inserts are done using 10-row batches.
cassandra-stress "user profile=./batch_profile.yaml no-warmup 
ops(insert=1,partition-select=0) n=10m" -rate threads=100 -node <IP>

[^batch_profile.yaml]
h3. Test environment

1 cassandra server node = m8i.4xlarge (16 vCPU, x86_64, 64 GiB RAM, EBS)
cassandra-stress = c5.9xlarge
h3. Profiling data

The memory allocation profile was collected with async-profiler (-e alloc).
Allocation profile before: [^CASSANDRA-20166_before_alloc.html]
HeapByteBuffer, as the allocated class, accounts for 11.17% of allocations.
!heap_allocations_profile_before.png|width=540!

Allocation profile after: [^CASSANDRA-20166_after_alloc.html]
HeapByteBuffer drops to 3.67% of allocations (it is still allocated 
for partition and clustering keys; adjusting their parsing logic to use 
byte[] would be too complicated)
!image-2025-12-04-19-25-48-624.png|width=540!



> Avoid ByteBuffer allocation during decoding of prepared CQL write requests
> --------------------------------------------------------------------------
>
>                 Key: CASSANDRA-20166
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-20166
>             Project: Apache Cassandra
>          Issue Type: Improvement
>          Components: CQL/Interpreter
>            Reporter: Dmitry Konstantinov
>            Assignee: Dmitry Konstantinov
>            Priority: Normal
>             Fix For: 5.x
>
>         Attachments: CASSANDRA-20166-trunk_ci_summary.htm, 
> CASSANDRA-20166-trunk_results_details.tar.xz, 
> CASSANDRA-20166_after_alloc.html, CASSANDRA-20166_before_alloc.html, 
> async_profiler_alloc.png, batch_profile.yaml, 
> heap_allocations_profile_before.png, image-2024-12-26-17-33-39-031.jpg, 
> image-2024-12-26-17-33-39-031.png, image-2024-12-26-17-35-05-485.png, 
> image-2025-12-04-19-25-48-624.png
>
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> Many ByteBuffer objects are allocated when we decode CQL queries; 
> frequently the space taken by these objects is larger than the actual amount 
> of data received.
> A similar optimization (using byte[] directly and wrapping it into 
> ArrayCell instead of BufferCell) was done some time ago for the place where a 
> Mutation object is deserialized during Cassandra cross-node communication 
> or when reading from disk: CASSANDRA-15393
> While a complete replacement of ByteBuffer with byte[] in the CQL decoding 
> step looks like a very complex task (ByteBuffer is a part of too many 
> entities involved in CQL parsing), we can optimize 20% of the logic to get 80% 
> of the benefit by focusing only on batch and modification statements when 
> prepared statements are used and cell values are provided as bind variables.
> !image-2024-12-26-17-33-39-031.jpg|width=570!
> For the 10-character String values I used in the test, the wrapping ByteBuffer 
> objects cost more than the inner byte[] holding the data:
>  
> !image-2024-12-26-17-35-05-485.png|width=570!
> Async profiler (-e alloc) view:
> !async_profiler_alloc.png|width=570!
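
The idea in the issue description above can be illustrated with a minimal, self-contained sketch. It is not Cassandra's actual decoding code: the class and method names are hypothetical, and the input is assumed to be a sequence of values, each encoded as a 4-byte length followed by that many bytes (loosely modeled on how CQL bind values are framed). The buffer-based path allocates two objects per value (the byte[] plus a wrapping ByteBuffer), while the array-based path allocates only the byte[].

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; none of these names exist in Cassandra.
class ValueDecodingSketch {

    // Array-based decoding: one allocation per value (the byte[] itself).
    // For simplicity this sketch treats any negative length as "no value".
    static byte[] readValueAsArray(ByteBuffer in) {
        int length = in.getInt();
        if (length < 0) {
            return null;
        }
        byte[] value = new byte[length];
        in.get(value); // copies 'length' bytes and advances the input position
        return value;
    }

    // Buffer-based decoding: the same byte[] plus an extra wrapper object.
    // For short values the wrapper can cost more memory than the data.
    static ByteBuffer readValueAsBuffer(ByteBuffer in) {
        byte[] value = readValueAsArray(in);
        return value == null ? null : ByteBuffer.wrap(value);
    }

    // Test helper: encodes strings as [int length][UTF-8 bytes] pairs.
    static ByteBuffer encode(String... values) {
        byte[][] raw = new byte[values.length][];
        int size = 0;
        for (int i = 0; i < values.length; i++) {
            raw[i] = values[i].getBytes(StandardCharsets.UTF_8);
            size += Integer.BYTES + raw[i].length;
        }
        ByteBuffer out = ByteBuffer.allocate(size);
        for (byte[] r : raw) {
            out.putInt(r.length);
            out.put(r);
        }
        out.flip(); // switch from writing to reading
        return out;
    }

    public static void main(String[] args) {
        ByteBuffer in = encode("0123456789", "abcdefghij");
        byte[] asArray = readValueAsArray(in);      // byte[] only
        ByteBuffer asBuffer = readValueAsBuffer(in); // byte[] + wrapper
        System.out.println("array length: " + asArray.length);
        System.out.println("buffer remaining: " + asBuffer.remaining());
    }
}
```

In this sketch the caller of `readValueAsArray` would hand the byte[] to an array-backed cell directly, which is the kind of change the description refers to with ArrayCell versus BufferCell.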


