I am trying to understand how sorted runs are picked for compaction with
STCS and the docs here don't have enough detail:
https://cassandra.apache.org/doc/4.0/cassandra/operating/compaction/stcs.html

These blog post have more detail but I still have a question:
* http://distributeddatastore.blogspot.com/2021/06/cassandra-compaction.html
*
https://shrikantbang.wordpress.com/2014/04/22/size-tiered-compaction-strategy-in-apache-cassandra

And the blog posts match what I see in the STCS source, especially
getBuckets:
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/compaction/SizeTieredCompactionStrategy.java#L251

My question is whether non-adjacent sorted runs can be merged. To be
precise assume there are 3 sorted runs, described by 5 attributes (min-key,
max-key, min-ts, max-ts, size) where min-ts and max-ts are the min and max
commit timestamps for mutations in that sorted run, min-key and max-key are
the min and max key for mutations in that sorted run, size is the size of
the sorted run. The sorted runs are:
* s1 - min-key=A, max-key=D, min-ts=1, max-ts=99, size=100
* s2 - min-key=A, max-key=D, min-ts=100, max-ts=199, size=10
* s3 - min-key=A, max-key=D, min-ts=200, max-ts=299, size=100

By "adjacent" I mean adjacent when sorted runs are ordered by commit
timestamps. From the example above s1 & s2 are adjacent, s2 & s3 are
adjacent but s1 and s3 are not adjacent.

>From what I saw in the STCS source code, s1 & s3 can end up in the same
bucket without s2 because s2 is much smaller. If s1 & s3 were merged then
the result might be:
* s1+s3: min-key=A, max-key=D, min-ts=1, max-ts=299, size likely >= 100
* s2 - min-key=A, max-key=D, min-ts=100, max-ts=199, size=10

A side-effect from merging non-adjacent sorted runs can be that on a point
query, something must be fetched from all sorted runs to determine the
value to return. This differs from what happens with RocksDB where the heap
code can read from the iterators in (commit timestamp) order and stop once
it gets a key (with a value or a tombstone).

-- 
Mark Callaghan
mdcal...@gmail.com

Reply via email to