[ 
https://issues.apache.org/jira/browse/CASSANDRA-20158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ariel Weisberg updated CASSANDRA-20158:
---------------------------------------
    Resolution: Duplicate
        Status: Resolved  (was: Open)

> IntervalTree should support copyAndReplace for checkpoint when ranges are 
> unchanged
> -----------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-20158
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-20158
>             Project: Apache Cassandra
>          Issue Type: Improvement
>          Components: Local/Compaction, Local/SSTable
>            Reporter: Yuqi Yan
>            Assignee: Yuqi Yan
>            Priority: Normal
>             Fix For: 4.1.x
>
>         Attachments: image-2024-12-20-02-39-53-420.png, 
> image-2024-12-20-02-41-06-544.png, image-2024-12-20-02-42-52-003.png
>
>          Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> We observed very slow compaction, and sometimes stuck memtable flushes that 
> caused write latency spikes, when the cluster has a large number of SSTables 
> (~20K), similar to what was observed in CASSANDRA-19596.
> Looking deeper into when the interval tree is rebuilt, there is actually 
> no need to rebuild it on every checkpoint() call.
>  
> updateLiveSet(toUpdate, staged.update) updates the current version 
> of each SSTableReader within the View. However, the update doesn't always 
> change the range (low, high) of the SSTableReader.
>  
> One example:
>  * IndexSummaryRedistribution.adjustSamplingLevels()
>  ** SSTableReader replacement = sstable.cloneWithNewSummarySamplingLevel(cfs, 
> entry.newSamplingLevel);
>  ** This changes only the metadata; the ranges are unchanged
>  
> Given this, rebuilding the entire IntervalTree is not always required; 
> instead, IntervalTree should support replacing these SSTableReaders.
>  
> Rebuilding the tree is O(n(logn)^2) in current trunk, because the O(nlogn) 
> sort is repeated on every node creation; after CASSANDRA-19596 this will be 
> O(nlogn). With replacement supported, some of the updateLiveSet calls can be 
> reduced to O(m(logn)^2), where m is the number of SSTableReaders we attempt 
> to replace and n is the number of SSTables; m << n in most cases.
>  
> This is achieved by
>  # finding the node containing the SSTable (logn)
>  # binary searching for the SSTableReader within the node and replacing it 
> (logn)
>  # To support CAS updates, steps 1 and 2 need to be done by copying the path 
> and re-creating the affected nodes on the path
>  
> The experiment used a two-ring setup, one on 4.1+CASSANDRA-19596 
> (marked as trunk) and the other on 4.1+CASSANDRA-19596+this patch (marked 
> as new), with ~15K SSTables (LCS, sstable size 50MB, single_uplevel 
> enabled) and a stress test with a 1:1 read/write ratio.
> The results show that ~15% of the checkpoint calls don't need to 
> rebuild the tree.
> !image-2024-12-20-02-41-06-544.png|width=803,height=133!
> Compaction throughput (scanner read throughput) increased from 130MB/s to 
> 200MB/s
> !image-2024-12-20-02-39-53-420.png|width=1378,height=136!
> Mean checkpoint finish time was reduced from ~1.5s to ~800ms
> !image-2024-12-20-02-42-52-003.png|width=1100,height=186!
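
The three steps listed in the description (find the node, binary search inside it, copy the path so the new root can be installed with CAS) can be illustrated with a minimal sketch. This is not the patch and not Cassandra's IntervalTree API: CowIntervalTree, Item, Node and every method name below are hypothetical, Item stands in for an SSTableReader, and each node keeps a single sorted array where a real tree may hold more per-node state.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

final class CowIntervalTree {
    /** Hypothetical stand-in for an SSTableReader: a [low, high] range plus a version tag. */
    static final class Item {
        final int low, high; final String version;
        Item(int low, int high, String version) { this.low = low; this.high = high; this.version = version; }
    }

    /** Immutable node: holds the items whose range contains center; the rest live in left/right. */
    static final class Node {
        final int center; final Item[] items; final Node left, right;
        Node(int center, Item[] items, Node left, Node right) {
            this.center = center; this.items = items; this.left = left; this.right = right;
        }
    }

    static final Comparator<Item> BY_RANGE =
        Comparator.<Item>comparingInt(i -> i.low).thenComparingInt(i -> i.high);

    /** Full rebuild: the expensive path that the replace operation tries to avoid. */
    static Node build(List<Item> all) {
        if (all.isEmpty()) return null;
        int[] points = new int[all.size() * 2];
        for (int i = 0; i < all.size(); i++) {
            points[2 * i] = all.get(i).low;
            points[2 * i + 1] = all.get(i).high;
        }
        Arrays.sort(points);
        int center = points[points.length / 2];   // median endpoint
        List<Item> left = new ArrayList<>(), here = new ArrayList<>(), right = new ArrayList<>();
        for (Item it : all) (it.high < center ? left : it.low > center ? right : here).add(it);
        here.sort(BY_RANGE);
        return new Node(center, here.toArray(new Item[0]), build(left), build(right));
    }

    /** Returns a new root that shares every subtree off the path from the root to old's node. */
    static Node copyAndReplace(Node node, Item old, Item replacement) {
        if (BY_RANGE.compare(old, replacement) != 0)
            throw new IllegalArgumentException("range changed; a rebuild is needed instead");
        if (node == null) throw new IllegalArgumentException("item not in tree");
        // Step 1: descend using the same routing rule as build(), copying each node on the way.
        if (old.high < node.center)
            return new Node(node.center, node.items, copyAndReplace(node.left, old, replacement), node.right);
        if (old.low > node.center)
            return new Node(node.center, node.items, node.left, copyAndReplace(node.right, old, replacement));
        // Step 2: binary search by range, then scan equal-range neighbours for the exact instance.
        int idx = Arrays.binarySearch(node.items, old, BY_RANGE);
        while (idx > 0 && BY_RANGE.compare(node.items[idx - 1], old) == 0) idx--;
        while (idx >= 0 && idx < node.items.length && node.items[idx] != old
               && BY_RANGE.compare(node.items[idx], old) == 0) idx++;
        if (idx < 0 || idx == node.items.length || node.items[idx] != old)
            throw new IllegalArgumentException("item not in tree");
        Item[] copy = node.items.clone();
        copy[idx] = replacement;
        return new Node(node.center, copy, node.left, node.right);
    }

    /** Step 3: publish the new root with CAS, retrying if another thread swapped it first. */
    static void replace(AtomicReference<Node> root, Item old, Item replacement) {
        Node current, updated;
        do {
            current = root.get();
            updated = copyAndReplace(current, old, replacement);
        } while (!root.compareAndSet(current, updated));
    }

    /** Point query used only to inspect the result: versions of all items containing point. */
    static List<String> versionsAt(Node root, int point) {
        List<String> out = new ArrayList<>();
        for (Node n = root; n != null; n = point < n.center ? n.left : n.right)
            for (Item it : n.items) if (it.low <= point && point <= it.high) out.add(it.version);
        return out;
    }

    public static void main(String[] args) {
        Item a = new Item(0, 10, "a-v1"), b = new Item(5, 20, "b-v1"), c = new Item(30, 40, "c-v1");
        AtomicReference<Node> root = new AtomicReference<>(build(Arrays.asList(a, b, c)));
        Node before = root.get();
        replace(root, a, new Item(0, 10, "a-v2"));
        System.out.println(versionsAt(before, 7));      // snapshot taken before the replace
        System.out.println(versionsAt(root.get(), 7));  // tree after the replace
    }
}
```

Because nodes are never mutated, readers holding the old root keep a consistent snapshot, and only the nodes on the path to the replaced item are reallocated. A replacement whose range differs is rejected in this sketch, which corresponds to the case where a full rebuild is still required.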



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org
