[ 
https://issues.apache.org/jira/browse/CASSANDRA-20594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ariel Weisberg updated CASSANDRA-20594:
---------------------------------------
    Description: 
The next step for improving {{IntervalTree}} is probably to replace it with a 
proper persistent tree rather than to keep hacking on the existing one yielding 
incremental improvements in the < 10x range. A persistent tree could yield 
100-1000x improvements and allow us to stop worrying about IntervalTree 
performance indefinitely.

Right now I think we have bought ourselves ~5 years. Between the improved speed 
of the tree and increasing sstable size in leveled compaction (which is not a 
default!) to reduce the number of sstables an operator can probably limp along 
about that long.

There are still some improvements that could be done to the current tree. The 
current tree has both a sorted array it always maintains to make updating the 
tree faster and then an actual interval tree with nodes. It might be possible 
to search the tree using just the sorted arrays which means we can discard the 
actual tree nodes and save a lot of the time spent building the tree nodes. 
Right now we pay the cost of updating the arrays on all forms of updates so the 
tree is just additional overhead.

Continuing to improve the existing one is probably throwing good money after 
bad, but it’s tempting because implementing search on the arrays might actually 
be easy.

  was:
The next step for improving {{IntervalTree}} is probably to replace it with a 
proper persistent tree rather than to keep hacking on the existing one yielding 
incremental improvements in the < 10x range. A persistent tree could yield 
100-1000x improvements and allow us to stop worrying about IntervalTree 
performance indefinitely.

Right now I think we have bought ourselves a ~5 years. Between the improved 
speed of the tree and increasing sstable size in leveled compaction (which is 
not a default!) to reduce the number of sstables an operator can probably limp 
along about that long.

There are still some improvements that could be done to the current tree. The 
current tree has both a sorted array it always maintains to make updating the 
tree faster and then an actual interval tree with nodes. It might be possible 
to search the tree using just the sorted arrays which means we can discard the 
actual tree nodes and save a lot of the time spent building the tree nodes. 
Right now we pay the cost of updating the arrays on all forms of updates so the 
tree is just additional overhead.

Continuing to improve the existing one is probably throwing good money after 
bad, but it’s tempting because implementing search on the arrays might actually 
be easy.


> Potential IntervalTree followup improvements
> --------------------------------------------
>
>                 Key: CASSANDRA-20594
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-20594
>             Project: Apache Cassandra
>          Issue Type: Improvement
>          Components: Local/Compaction
>            Reporter: Ariel Weisberg
>            Priority: Normal
>
> The next step for improving {{IntervalTree}} is probably to replace it with a 
> proper persistent tree rather than to keep hacking on the existing one 
> yielding incremental improvements in the < 10x range. A persistent tree could 
> yield 100-1000x improvements and allow us to stop worrying about IntervalTree 
> performance indefinitely.
> Right now I think we have bought ourselves ~5 years. Between the improved 
> speed of the tree and increasing sstable size in leveled compaction (which is 
> not a default!) to reduce the number of sstables an operator can probably 
> limp along about that long.
> There are still some improvements that could be done to the current tree. The 
> current tree has both a sorted array it always maintains to make updating the 
> tree faster and then an actual interval tree with nodes. It might be possible 
> to search the tree using just the sorted arrays which means we can discard 
> the actual tree nodes and save a lot of the time spent building the tree 
> nodes. Right now we pay the cost of updating the arrays on all forms of 
> updates so the tree is just additional overhead.
> Continuing to improve the existing one is probably throwing good money after 
> bad, but it’s tempting because implementing search on the arrays might 
> actually be easy.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org

Reply via email to