Re: [DISCUSS] CEP-26: Unified Compaction Strategy

Branimir Lambov Fri, 17 Mar 2023 06:54:16 -0700

The prototype of UCS can now be found in this pull request:
https://github.com/apache/cassandra/pull/2228


Its description is given in the included markdown documentation:
https://github.com/blambov/cassandra/blob/UCS-density/src/java/org/apache/cassandra/db/compaction/UnifiedCompactionStrategy.md

The latest code includes some new elements compared to the link Henrik
posted, including density levelling, bucketing based solely on overlap, and
output splitting by expected density. It goes a little further than what is
described in the CEP-26 proposal, as prototyping showed that we can make the
selection of sstables to compact and the sharding decisions independent of
each other. This makes the strategy more stable and better able to react to
changes in configuration and environment.

Regards,
Branimir

On Wed, Dec 21, 2022 at 10:01 AM Benedict <bened...@apache.org> wrote:

> I’m personally very excited by this work. Compaction could do with a
> spring clean and this feels like it formalises things much more cleanly, but
> density tiering in particular is something I’ve wanted to incorporate for
> years now, as it should significantly improve STCS behaviour (most
> importantly reducing read amplification and the amount of disk space
> required, narrowing the performance delta to LCS in these important
> dimensions), and simplifies re-levelling of LCS, making large streams much
> less painful.
>
> On 21 Dec 2022, at 07:19, Henrik Ingo <henrik.i...@datastax.com> wrote:
>
> 
> I noticed the CEP doesn't link to this, so it is worth mentioning
> that the UCS documentation is available here:
> https://github.com/datastax/cassandra/blob/ds-trunk/doc/unified_compaction.md
>
> Both of the above seem to do a poor job of referencing the literature we've
> been inspired by. I will link to Mark Callaghan's blog on the subject:
>
>
> http://smalldatum.blogspot.com/2018/07/tiered-or-leveled-compaction-why-not.html?m=1
>
> ...and lazily will also borrow from Mark a post that references a bunch of
> LSM (not just UCS related) academic papers:
> http://smalldatum.blogspot.com/2018/08/name-that-compaction-algorithm.html?m=1
>
> Finally, it's perhaps worth mentioning that UCS has been in production in
> our Astra Serverless cloud service since it was launched in March 2021. The
> version described by the CEP therefore already incorporates some
> improvements based on observed production behaviour.
>
> Henrik
>
> On Mon, 19 Dec 2022, 15:41 Branimir Lambov, <blam...@apache.org> wrote:
>
>> Hello everyone,
>>
>> I would like to open the discussion on our proposal for a unified
>> compaction strategy that aims to solve well-known problems with compaction
>> and improve parallelism to permit higher levels of sustained write
>> throughput.
>>
>> The proposal is here:
>> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-26%3A+Unified+Compaction+Strategy
>>
>> The strategy is based on two main observations:
>> - that tiered and levelled compaction can be generalized as the same
>> thing if one observes that both form exponentially-growing levels based on
>> the size of sstables (or non-overlapping sstable runs) and trigger a
>> compaction when more than a given number of sstables are present on one
>> level;
>> - that instead of "size" in the description above we can use "density",
>> i.e. the size of an sstable divided by the width of the token range it
>> covers, which permits sstables to be split at arbitrary points when the
>> output of a compaction is written and still produce a levelled hierarchy.
>>
>> The latter allows us to shard the compaction space into
>> progressively higher numbers of shards as data moves to the higher levels
>> of the hierarchy, improving parallelism, space requirements and the
>> duration of compactions, and the former allows us to cover the existing
>> strategies, as well as hybrid mixtures that can prove more efficient for
>> some workloads.
>>
>> Thank you,
>> Branimir
>>
>>
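
To illustrate the two observations in the original proposal quoted above, here is a minimal Python sketch. It is not taken from the UCS code. The SSTable class, the token range expressed as a fraction of the ring, the base_density, fanout and threshold parameters, and the assumption that data is spread evenly within an sstable are all hypothetical choices made for this example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class SSTable:
    """Hypothetical stand-in for an sstable; not a Cassandra class."""
    size_bytes: float
    token_start: float  # start of the covered token range, as a fraction of the ring
    token_end: float    # end of the covered token range, as a fraction of the ring

    @property
    def density(self) -> float:
        # Density as defined in the proposal: size divided by the
        # width of the token range the sstable covers.
        return self.size_bytes / (self.token_end - self.token_start)


def level_of(table: SSTable, base_density: float, fanout: float) -> int:
    """Place a table on an exponentially growing level by density.

    Level 0 holds everything below base_density * fanout; each further
    level covers densities `fanout` times higher (assumed scheme).
    """
    level = 0
    upper = base_density * fanout
    while table.density >= upper:
        level += 1
        upper *= fanout
    return level


def split(table: SSTable, at: float):
    """Split a table at token position `at`, assuming evenly spread data."""
    if not table.token_start < at < table.token_end:
        raise ValueError("split point must lie strictly inside the token range")
    frac = (at - table.token_start) / (table.token_end - table.token_start)
    left = SSTable(table.size_bytes * frac, table.token_start, at)
    right = SSTable(table.size_bytes * (1 - frac), at, table.token_end)
    return left, right


def levels_to_compact(tables, base_density: float, fanout: float, threshold: int):
    """Return the levels holding more than `threshold` sstables."""
    by_level = defaultdict(list)
    for t in tables:
        by_level[level_of(t, base_density, fanout)].append(t)
    return {lvl: ts for lvl, ts in by_level.items() if len(ts) > threshold}


if __name__ == "__main__":
    whole = SSTable(4 * 2**30, 0.0, 1.0)
    left, right = split(whole, 0.25)
    for t in (whole, left, right):
        print(t.size_bytes, t.density, level_of(t, 2**20, 4))
```

Under the even-distribution assumption, both pieces produced by split have the same density as the input and therefore land on the same level. This is the property that lets compaction output be split at arbitrary token positions without disturbing the levelled hierarchy. Grouping by size instead would move the smaller pieces down to lower levels.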
