This is a significant enough performance problem *in normal operations* I'd 
consider it a bug and thus eligible for back-porting. A couple other thoughts:

> CASSANDRA-20158 and CASSANDRA-20164 are several orders of magnitudes faster 
> ... The only problem with this approach is that the tree is not rebalanced... 
> it would be trivial to force a full rebuild periodically or based on some 
> balance signal.
I'd also be strongly in support of these modifications landing in all currently 
supported branches if someone has the time and energy to step up and do the 
work given the combined current test coverage and robust property-based 
additions to the testing in 19596.

> Early open which is enabled by default
Is it fair to assume most current C* nodes will no longer be on spinning disk 
and thus we should change this default to disabled?

On Thu, Feb 27, 2025, at 12:35 PM, Ariel Weisberg wrote:
> Hi,
> 
> I want to discuss what versions we should backport IntervalTree improvements 
> to specifically 19596 which I think is the lower risk option because it 
> builds the same trees as before. I think we should at least backport to 5.0.
> 
> IntervalTree performance has shown up as a problematic bottleneck in a couple 
> of scenarios. If a node ends up with lots of small sstables it will fall over 
> trying to build IntervalTrees because they are very slow. With 100k sstables 
> which isn't unusual in the normal case for leveled compaction it takes 200+ 
> milliseconds to build one on my laptop.
> 
> When this happens compactions and flushing can't complete and the mutation 
> stage backs up along with usual problems associated with a large compaction 
> backlog. Early open which is enabled by default makes this much worse because 
> it causes compaction to rebuild IntervalTree many times more often. 
> 
> The problem is the tree is immutable and currently the entire tree is rebuilt 
> every time it is updated. The proper fix is to create a persistent tree, but 
> no one has had the time at the moment and it's not a great candidate for back 
> porting to existing releases.
> 
> CASSANDRA-19596 builds identical trees but aims to make the building process 
> significantly more efficient by removing a lot of different points of 
> overhead. The original tree does a lot of redundant sorting and reallocation 
> inside array lists. 19596 properly sizes the arrays when allocating and 
> reuses the existing sorted order to do a linear merge of additions and 
> removals. This is 2x faster to build a tree from scratch and 4x faster to 
> update.
> 
> CASSANDRA-20158 and CASSANDRA-20164 are several orders of magnitudes faster 
> for some common operations (replace interval, add interval) and work by 
> treating the tree as a persistent tree and only rebuilding the changed nodes. 
> The only problem with this approach is that the tree is not rebalanced, but 
> in practice they still get rebuilt pretty often and it would be trivial to 
> force a full rebuild periodically or based on some balance signal.
> 
> Ariel
> 

Reply via email to