Hi Mark,

Non-adjacent files can be merged. For cassandra, it anyway needs to query
all sorted runs as it's timestamps are not strictly ordered.

Thanks,
Jay

On Tue, Nov 23, 2021 at 9:42 AM MARK CALLAGHAN <mdcal...@gmail.com> wrote:

> I am trying to understand how sorted runs are picked for compaction with
> STCS and the docs here don't have enough detail:
>
> https://cassandra.apache.org/doc/4.0/cassandra/operating/compaction/stcs.html
>
> These blog post have more detail but I still have a question:
> *
> http://distributeddatastore.blogspot.com/2021/06/cassandra-compaction.html
> *
> https://shrikantbang.wordpress.com/2014/04/22/size-tiered-compaction-strategy-in-apache-cassandra
>
> And the blog posts match what I see in the STCS source, especially
> getBuckets:
>
> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/compaction/SizeTieredCompactionStrategy.java#L251
>
> My question is whether non-adjacent sorted runs can be merged. To be
> precise assume there are 3 sorted runs, described by 5 attributes (min-key,
> max-key, min-ts, max-ts, size) where min-ts and max-ts are the min and max
> commit timestamps for mutations in that sorted run, min-key and max-key are
> the min and max key for mutations in that sorted run, size is the size of
> the sorted run. The sorted runs are:
> * s1 - min-key=A, max-key=D, min-ts=1, max-ts=99, size=100
> * s2 - min-key=A, max-key=D, min-ts=100, max-ts=199, size=10
> * s3 - min-key=A, max-key=D, min-ts=200, max-ts=299, size=100
>
> By "adjacent" I mean adjacent when sorted runs are ordered by commit
> timestamps. From the example above s1 & s2 are adjacent, s2 & s3 are
> adjacent but s1 and s3 are not adjacent.
>
> From what I saw in the STCS source code, s1 & s3 can end up in the same
> bucket without s2 because s2 is much smaller. If s1 & s3 were merged then
> the result might be:
> * s1+s3: min-key=A, max-key=D, min-ts=1, max-ts=299, size likely >= 100
> * s2 - min-key=A, max-key=D, min-ts=100, max-ts=199, size=10
>
> A side-effect from merging non-adjacent sorted runs can be that on a point
> query, something must be fetched from all sorted runs to determine the
> value to return. This differs from what happens with RocksDB where the heap
> code can read from the iterators in (commit timestamp) order and stop once
> it gets a key (with a value or a tombstone).
>
> --
> Mark Callaghan
> mdcal...@gmail.com
>

Reply via email to