Hi Mark, Non-adjacent files can be merged. For cassandra, it anyway needs to query all sorted runs as it's timestamps are not strictly ordered.
Thanks, Jay On Tue, Nov 23, 2021 at 9:42 AM MARK CALLAGHAN <mdcal...@gmail.com> wrote: > I am trying to understand how sorted runs are picked for compaction with > STCS and the docs here don't have enough detail: > > https://cassandra.apache.org/doc/4.0/cassandra/operating/compaction/stcs.html > > These blog post have more detail but I still have a question: > * > http://distributeddatastore.blogspot.com/2021/06/cassandra-compaction.html > * > https://shrikantbang.wordpress.com/2014/04/22/size-tiered-compaction-strategy-in-apache-cassandra > > And the blog posts match what I see in the STCS source, especially > getBuckets: > > https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/compaction/SizeTieredCompactionStrategy.java#L251 > > My question is whether non-adjacent sorted runs can be merged. To be > precise assume there are 3 sorted runs, described by 5 attributes (min-key, > max-key, min-ts, max-ts, size) where min-ts and max-ts are the min and max > commit timestamps for mutations in that sorted run, min-key and max-key are > the min and max key for mutations in that sorted run, size is the size of > the sorted run. The sorted runs are: > * s1 - min-key=A, max-key=D, min-ts=1, max-ts=99, size=100 > * s2 - min-key=A, max-key=D, min-ts=100, max-ts=199, size=10 > * s3 - min-key=A, max-key=D, min-ts=200, max-ts=299, size=100 > > By "adjacent" I mean adjacent when sorted runs are ordered by commit > timestamps. From the example above s1 & s2 are adjacent, s2 & s3 are > adjacent but s1 and s3 are not adjacent. > > From what I saw in the STCS source code, s1 & s3 can end up in the same > bucket without s2 because s2 is much smaller. If s1 & s3 were merged then > the result might be: > * s1+s3: min-key=A, max-key=D, min-ts=1, max-ts=299, size likely >= 100 > * s2 - min-key=A, max-key=D, min-ts=100, max-ts=199, size=10 > > A side-effect from merging non-adjacent sorted runs can be that on a point > query, something must be fetched from all sorted runs to determine the > value to return. This differs from what happens with RocksDB where the heap > code can read from the iterators in (commit timestamp) order and stop once > it gets a key (with a value or a tombstone). > > -- > Mark Callaghan > mdcal...@gmail.com >