[ 
https://issues.apache.org/jira/browse/CASSANDRA-20226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18036651#comment-18036651
 ] 

Dmitry Konstantinov edited comment on CASSANDRA-20226 at 11/10/25 12:08 AM:
----------------------------------------------------------------------------

Note: the dips in the graphs for the new version come from 2 new bottlenecks 
that we hit once we were able to send a higher load:
 # The default 2 flushing threads cannot keep up with the write pace, so writer 
threads are periodically blocked by the backpressure logic when allocating in 
org.apache.cassandra.utils.memory.MemtableAllocator.SubAllocator#allocate
 # The heap allocation rate is higher, so concurrent GC threads consume more 
CPU. Combined with high total CPU usage (near 90-100%) and the fact that by 
default the number of concurrent GC threads equals the number of CPU cores, 
the concurrent GC threads steal too much CPU from the processing logic and 
heavily impact it.

To check this, I configured 4 flushing threads and reduced -XX:ConcGCThreads 
to 4 (-XX:ParallelGCThreads was left unchanged at 16). With these changes the 
graph is smoother:

!image-2025-11-10-00-04-57-497.png|width=1000!
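
When tuning these flags it can help to confirm the values a JVM is actually running with. The following is a minimal sketch, assuming a HotSpot-based JVM (HotSpotDiagnosticMXBean is HotSpot-specific); the class name GcThreadFlags is made up for illustration and is not part of Cassandra:

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper, not part of Cassandra: reads effective HotSpot flag values.
final class GcThreadFlags {
    /** Returns the current value of a HotSpot VM flag, e.g. "ConcGCThreads". */
    static String flagValue(String name) {
        HotSpotDiagnosticMXBean bean =
            ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        // getVMOption throws IllegalArgumentException if the flag does not exist
        return bean.getVMOption(name).getValue();
    }

    public static void main(String[] args) {
        System.out.println("ConcGCThreads=" + flagValue("ConcGCThreads"));
        System.out.println("ParallelGCThreads=" + flagValue("ParallelGCThreads"));
        System.out.println("availableProcessors="
                           + Runtime.getRuntime().availableProcessors());
    }
}
```

Note that this reports the flags of the JVM that runs the snippet, not of a running Cassandra process; for an already running process, jcmd <pid> VM.flags prints the same information.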


> Reduce contention in MemtableAllocator.allocate
> -----------------------------------------------
>
>                 Key: CASSANDRA-20226
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-20226
>             Project: Apache Cassandra
>          Issue Type: Improvement
>          Components: Local/Memtable
>            Reporter: Dmitry Konstantinov
>            Assignee: Dmitry Konstantinov
>            Priority: Normal
>             Fix For: 5.x
>
>         Attachments: 5.1_batch_LongAdder.html, 5.1_batch_addAndGet.html, 
> 5.1_batch_alloc_batching.html, 5.1_batch_baseline.html, 
> 5.1_batch_pad_allocated.html, CASSANDRA-20226_ci_summary.htm, 
> CASSANDRA-20226_results_details.tar.xz, 
> ci_summary_netudima_CASSANDRA-20226-trunk_52.html, cpu_profile_batch.html, 
> image-2025-01-20-23-38-58-896.png, image-2025-11-10-00-04-57-497.png, 
> profile.yaml, results_details_netudima_CASSANDRA-20226-trunk_52.tar.xz, 
> test_results_m8i.4xlarge_heap_buffers.html, 
> test_results_m8i.4xlarge_heap_buffers.png, 
> test_results_m8i.4xlarge_offheap_objects.html, 
> test_results_m8i.4xlarge_offheap_objects.png
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> At a high batch insert rate, there appears to be a bottleneck in 
> NativeAllocator.allocate, probably caused by contention within the logic.
> !image-2025-01-20-23-38-58-896.png|width=300!
> [^cpu_profile_batch.html]
> The logic has at least the following 2 places worth assessing:
>  # The allocation loop in MemtablePool.SubPool#tryAllocate. This logic uses a 
> while loop with a CAS, which can be inefficient under high contention. 
> Similar to CASSANDRA-15922, we can try to replace it with addAndGet (we need 
> to check that this does not break the allocator logic).
>  # The region swap logic in NativeAllocator.trySwapRegion (under a high insert 
> rate, 1MiB regions can be swapped quite frequently).
> Reproducing test details:
>  * test logic
> {code:java}
> ./tools/bin/cassandra-stress "user profile=./profile.yaml no-warmup ops(insert=1) n=10m" -rate threads=100 -node somenode
> {code}
>  * Cassandra version: 5.0.3
>  * configuration changes compared to default:
> {code:java}
> memtable_allocation_type: offheap_objects
> memtable:
>   configurations:
>     skiplist:
>       class_name: SkipListMemtable
>     trie:
>       class_name: TrieMemtable
>       parameters:
>         shards: 32
>     default:
>       inherits: trie
> {code}
>  * 1 node cluster
>  * OpenJDK jdk-17.0.12+7
>  * Linux kernel: 4.18.0-240.el8.x86_64
>  * CPU: 16 cores, Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz
>  * RAM: 46GiB
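
The first item in the issue description (replacing a CAS retry loop with addAndGet) can be illustrated with a simplified bounded counter. This is only a sketch under stated assumptions: BoundedCounterSketch, tryAllocateCas and tryAllocateAdd are hypothetical names, and the code is neither Cassandra's MemtablePool.SubPool nor the patch attached to this ticket.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical simplified counter used only to contrast the two update styles.
final class BoundedCounterSketch {
    private final long limit;
    private final AtomicLong allocated = new AtomicLong();

    BoundedCounterSketch(long limit) {
        this.limit = limit;
    }

    /** CAS-loop variant: re-reads and retries whenever another thread wins the race. */
    boolean tryAllocateCas(long size) {
        while (true) {
            long current = allocated.get();
            long next = current + size;
            if (next > limit)
                return false;                 // would exceed the limit; nothing changed
            if (allocated.compareAndSet(current, next))
                return true;
            // CAS failed because another thread updated the counter; retry
        }
    }

    /** addAndGet variant: a single atomic add, rolled back if it overshoots. */
    boolean tryAllocateAdd(long size) {
        long next = allocated.addAndGet(size);
        if (next <= limit)
            return true;
        allocated.addAndGet(-size);           // undo; the counter briefly exceeded the limit
        return false;
    }

    long allocated() {
        return allocated.get();
    }
}
```

The addAndGet variant never retries, but between the add and the rollback other threads can observe a value above the limit and fail spuriously. That transient overshoot is one reason the description says the change needs to be checked against the allocator logic.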



