From my past experience, the bottleneck for an insert-heavy workload is
likely to be compaction, not the commit log. You may initially see the
commit log as the bottleneck while the table is still relatively small,
but as the table grows, compaction will likely take its place and become
the new bottleneck.
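A quick way to check this during a run is nodetool compactionstats: once
compaction becomes the bottleneck, the pending compaction task count it
reports keeps growing over the course of the test.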
On 20/07/2022 11:11, Pawar, Amit wrote:
Hi all,
(My previous mail did not appear on the mailing list, so I am resending
it after 2 days.)
I am Amit, working at AMD Bangalore, India. I am new to Cassandra and
need to test Cassandra on systems with large core counts. Ideally this
should be done on a multi-node cluster, but I started with single-node
testing to understand how Cassandra scales with increasing core counts.
Test details:
Operation: Insert > 90% (insert heavy)
Operation: Scan < 10%
Cassandra: 3.11.10 and trunk
Benchmark: TPCx-IOT (similar to YCSB)
Results show that scaling is almost linear up to 16 cores but poor
beyond that. The following common settings helped to get better scores:
1. Memtable heap allocation: offheap_objects
2. memtable_flush_writers > 4
3. Java heap: 8-32GB with survivor ratio tuning
4. Separate storage space for Commitlog and Data.
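For reference, the corresponding cassandra.yaml entries in my test setup
look roughly like the following (the directory paths and the flush writer
count are just my values, not recommendations); the heap size and
survivor ratio are set in jvm.options:

    memtable_allocation_type: offheap_objects
    memtable_flush_writers: 8
    commitlog_directory: /commitlog_disk/cassandra/commitlog
    data_file_directories:
        - /data_disk/cassandra/data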
Many online blogs suggest adding a new Cassandra node when a cluster
cannot keep up with high write rates. But a large system with many cores
should be able to absorb high write rates on its own, and the goal here
was to improve scaling with more cores, so this suggestion did not help.
After many rounds of testing it was observed that the current
implementation uses a single thread for commit log syncing. Commitlog
files are mapped with the mmap system call and the changes are written
back with msync. Observing the periodic sync thread with the JVisualVM
tool shows:
1. With a ramdisk used for commit log storage, the thread is not 100%
busy and scaling improves on large systems. Ramdisk scores are more than
2x the NVMe scores.
2. With an NVMe disk used for the commit log, the thread becomes 100%
busy and the score does not improve much beyond 16 cores.
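To make the mmap/msync write path concrete, it is roughly equivalent to
the following standalone Java sketch (simplified, not actual Cassandra
code; the file name and sizes are made up): a single thread maps a
segment, appends into the mapped buffer, and periodically calls
MappedByteBuffer.force(), which is what issues the msync.

import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.FileChannel.MapMode;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class MmapSyncSketch
{
    public static void main(String[] args) throws IOException, InterruptedException
    {
        Path segment = Paths.get("/tmp/commitlog-segment.log"); // hypothetical segment file
        int segmentSize = 32 * 1024 * 1024;

        try (FileChannel channel = FileChannel.open(segment,
                StandardOpenOption.CREATE, StandardOpenOption.READ, StandardOpenOption.WRITE))
        {
            // mmap the segment; writes go into the page cache through this mapping
            MappedByteBuffer buffer = channel.map(MapMode.READ_WRITE, 0, segmentSize);

            for (int i = 0; i < 10; i++)
            {
                buffer.put(("mutation " + i + "\n").getBytes());
                // Periodic sync: force() is the msync() call, and today a single
                // thread does this for all segments.
                buffer.force();
                Thread.sleep(100);
            }
        }
    }
}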
The Linux kernel uses 4K pages for memory mapped with the mmap system
call. So, to understand this further, disk I/O testing was done with the
FIO tool (example job below), and the results show:
1. NVMe 4K random R/W throughput is very low with a single thread and
improves with multiple threads.
2. Ramdisk 4K random R/W throughput is good even with a single thread
and also improves with multiple threads.
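For reference, the FIO runs were along these lines (the file path, size
and run time are just from my setup; the multi-threaded runs only change
numjobs):

fio --name=cl4k --filename=/nvme_mount/fio-test --size=4G \
    --ioengine=libaio --direct=1 --rw=randrw --bs=4k \
    --iodepth=1 --numjobs=1 --runtime=60 --time_based --group_reporting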
Based on the FIO test results, the following two ideas were tested for
commit log files against the Cassandra-3.11.10 sources.
1. Enable Direct IO feature for Commitlog files (similar to
[CASSANDRA-14466] Enable Direct I/O - ASF JIRA (apache.org)
<https://issues.apache.org/jira/browse/CASSANDRA-14466> )
2. Enable Multi-threaded syncing for Commitlog files.
The first one needs to be retested. Interestingly, the second one helped
improve the score with the NVMe disk: the NVMe configuration score is
within 80-90% of the ramdisk score and about 2x that of the
single-threaded implementation. Multithreading was enabled by adding a
new thread pool in the "AbstractCommitLogSegmentManager" class and
changing the existing syncing thread into a manager thread for this new
pool, which takes care of synchronization (rough sketch below). This was
only tested with Cassandra-3.11.10 and needs complete testing, but the
change works in my test environment. I tried these few experiments so
that I could discuss them here and seek your valuable suggestions on the
right fix for insert-heavy workloads.
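To make the idea concrete, here is a rough standalone sketch of the
pattern (not the actual patch; the class and method names here are made
up for illustration): the former single syncer thread becomes a
coordinator that fans the per-segment syncs out to a small pool and
waits for them.

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class MultiThreadedSyncSketch
{
    // Stand-in for a commit log segment that knows how to sync its own
    // dirty region, e.g. by calling MappedByteBuffer.force()
    interface Segment
    {
        void sync();
    }

    private final ExecutorService syncPool = Executors.newFixedThreadPool(4); // pool size is a guess

    // Called by the manager (former syncer) thread
    public void syncAll(List<Segment> dirtySegments) throws Exception
    {
        List<Future<?>> pending = new ArrayList<>();
        for (Segment segment : dirtySegments)
            pending.add(syncPool.submit(segment::sync));

        // Wait for all syncs so the "synced up to" bookkeeping stays correct
        for (Future<?> f : pending)
            f.get();
    }
}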
1. Is it a good idea to convert the single-threaded syncing into a
multi-threaded implementation to improve disk I/O?
2. Direct I/O throughput is high even with a single thread, and it seems
a good fit for the commit log case given the file sizes; it should
improve writes on small and large systems alike. Would it be good to
bring this support to commit log files (see the sketch right below)?
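For question 2, a minimal sketch of what I have in mind is below
(illustration only: it needs JDK 10+ for ExtendedOpenOption.DIRECT,
which is newer than what Cassandra 3.11 builds against, and the path is
made up). The key constraint with O_DIRECT is that the buffer address,
file position and length all have to be aligned to the block size.

import com.sun.nio.file.ExtendedOpenOption;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class DirectIOSketch
{
    public static void main(String[] args) throws IOException
    {
        Path file = Paths.get("/nvme_mount/commitlog-direct.log"); // hypothetical path
        int blockSize = (int) Files.getFileStore(file.getParent()).getBlockSize();

        try (FileChannel channel = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE, ExtendedOpenOption.DIRECT))
        {
            // Allocate extra space and take an aligned slice so the buffer
            // address and length are multiples of the block size
            ByteBuffer buffer = ByteBuffer.allocateDirect(blockSize * 2).alignedSlice(blockSize);
            while (buffer.hasRemaining())
                buffer.put((byte) 'x');
            buffer.flip();

            channel.write(buffer, 0); // aligned-length write at an aligned position
        }
    }
}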
Please suggest.
Thanks,
Amit Pawar