I’ve encountered a handful of spinning platters, but not a lot. I think we should generally optimize for the common case, not the exception.
On Thu, Feb 27, 2025 at 9:51 AM Josh McKenzie <jmcken...@apache.org> wrote: > This is a significant enough performance problem *in normal operations* > I'd consider it a bug and thus eligible for back-porting. A couple other > thoughts: > > CASSANDRA-20158 and CASSANDRA-20164 are several orders of magnitudes > faster ... The only problem with this approach is that the tree is not > rebalanced... it would be trivial to force a full rebuild periodically or > based on some balance signal. > > I'd also be strongly in support of these modifications landing in all > currently supported branches if someone has the time and energy to step up > and do the work given the combined current test coverage and robust > property-based additions to the testing in 19596. > > Early open which is enabled by default > > Is it fair to assume most current C* nodes will no longer be on spinning > disk and thus we should change this default to disabled? > > On Thu, Feb 27, 2025, at 12:35 PM, Ariel Weisberg wrote: > > Hi, > > I want to discuss what versions we should backport IntervalTree > improvements to specifically 19596 which I think is the lower risk option > because it builds the same trees as before. I think we should at least > backport to 5.0. > > IntervalTree performance has shown up as a problematic bottleneck in a > couple of scenarios. If a node ends up with lots of small sstables it will > fall over trying to build IntervalTrees because they are very slow. With > 100k sstables which isn't unusual in the normal case for leveled compaction > it takes 200+ milliseconds to build one on my laptop. > > When this happens compactions and flushing can't complete and the mutation > stage backs up along with usual problems associated with a large compaction > backlog. Early open which is enabled by default makes this much worse > because it causes compaction to rebuild IntervalTree many times more often. > > The problem is the tree is immutable and currently the entire tree is > rebuilt every time it is updated. The proper fix is to create a persistent > tree, but no one has had the time at the moment and it's not a great > candidate for back porting to existing releases. > > CASSANDRA-19596 builds identical trees but aims to make the building > process significantly more efficient by removing a lot of different points > of overhead. The original tree does a lot of redundant sorting and > reallocation inside array lists. 19596 properly sizes the arrays when > allocating and reuses the existing sorted order to do a linear merge of > additions and removals. This is 2x faster to build a tree from scratch and > 4x faster to update. > > CASSANDRA-20158 and CASSANDRA-20164 are several orders of magnitudes > faster for some common operations (replace interval, add interval) and work > by treating the tree as a persistent tree and only rebuilding the changed > nodes. The only problem with this approach is that the tree is not > rebalanced, but in practice they still get rebuilt pretty often and it > would be trivial to force a full rebuild periodically or based on some > balance signal. > > Ariel > > >