[jira] [Comment Edited] (CASSANDRA-18397) CEP-26: Unified Compaction Strategy

Joey Lynch (Jira) Thu, 06 Apr 2023 09:35:05 -0700


    [ 
https://issues.apache.org/jira/browse/CASSANDRA-18397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17709456#comment-17709456
 ]


Joey Lynch edited comment on CASSANDRA-18397 at 4/6/23 4:34 PM:
----------------------------------------------------------------

[~blambov] overall looks awesome. I have a few suggestions from our testing of
BoundedRead
([patch|https://github.com/jolynch/cassandra/commit/b2516af983930b3fe5bb016b5039c289ea396fd6#diff-447488af7557e6bb324a058fa442f009624d00ef8b4d12a18dcbf4f94950c1ec],
[options|https://gist.github.com/jolynch/9118465f32ad5298b4e39d03ccd4370e#boundedreadcompactionstrategy])
that you might want to incorporate either now or later. I think the current
proposal is a huge step forward because of the sharding and density tiering,
and I definitely don't want to hold it up. If you agree with some of these, I
would love to collaborate on getting them into your patch, either now or after
merge.
 * target_consolidate_interval_in_seconds (1 hour): When tiering in the lower
levels (the tiered, or T, levels from your proposal), force a compaction if
this much time passes.
 * Dynamic tier for the lowest level: When data is being dumped into the lowest
tier fast enough that it would be compacted more often than the consolidate
interval (e.g. if you have a tier of 4 but get a flush every minute, you would
compact more than once per 1-hour interval), auto-range the tier up to ~4x the
input. Combined with target_consolidate_interval_in_seconds, this leads to
significantly less compaction during data loads.
 * max_level_age_in_seconds (10 days): If a table (in your case, a shard)
hasn't participated in compaction within this amount of time, do a full
vertical compaction so that tombstones are merged with the data they shadow.
This would finally let C* guarantee that tombstones find their data.
 * Breaking up full compaction: I think the sharding technique already does
this, but when an operator requests a full compaction, you could instead just
flip a timestamp to trigger normal background compactions of the shards. Then
you don't need double the disk space; you only need #compactions * shard size.
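
A minimal sketch of the consolidate-interval trigger from the first bullet, assuming the interval is measured from the oldest sstable still waiting in the level (the comment does not say when the clock starts). TierState, should_compact and fan_in are hypothetical names for illustration, not UCS or BoundedRead code.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class TierState:
    """Hypothetical view of one tiered (T) level in a shard; not a Cassandra class."""
    sstable_count: int       # sstables waiting to be compacted in this level
    oldest_flush_ts: float   # epoch seconds when the oldest waiting sstable was written


def should_compact(tier: TierState, fan_in: int,
                   target_consolidate_interval_s: float = 3600.0,
                   now: Optional[float] = None) -> bool:
    """Compact when the level holds fan_in sstables, or when its oldest sstable
    has waited at least target_consolidate_interval_s (default 1 hour)."""
    if now is None:
        now = time.time()
    if tier.sstable_count >= fan_in:
        return True  # normal tiering trigger
    waited = now - tier.oldest_flush_ts
    # Forced consolidation; a lone sstable has nothing to merge with (assumption).
    return tier.sstable_count >= 2 and waited >= target_consolidate_interval_s
```

The forced path only changes when a partially filled level gets merged; the normal fan-in trigger is untouched.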
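
One possible reading of the dynamic lowest tier, assuming "~4x" means the fan-in may grow to at most four times its configured value. dynamic_fan_in and its parameters are hypothetical. It picks a fan-in large enough that the lowest level fills about once per consolidate interval.

```python
import math


def dynamic_fan_in(base_fan_in: int, flush_interval_s: float,
                   target_consolidate_interval_s: float = 3600.0,
                   max_multiplier: int = 4) -> int:
    """Raise the lowest-level fan-in so it is compacted roughly once per
    consolidate interval instead of every base_fan_in flushes, capped at
    max_multiplier * base_fan_in (the "~4x" assumption)."""
    if flush_interval_s <= 0:
        raise ValueError("flush_interval_s must be positive")
    # Number of flushes that arrive during one consolidate interval.
    flushes_per_interval = math.ceil(target_consolidate_interval_s / flush_interval_s)
    wanted = max(base_fan_in, flushes_per_interval)
    return min(wanted, base_fan_in * max_multiplier)
```

With a tier of 4 and one flush per minute, a fixed fan-in compacts the lowest level 15 times an hour (60 flushes / 4). dynamic_fan_in(4, 60) returns 16, which brings that down to fewer than 4 times an hour.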
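
The max-age check from the third bullet could look roughly like this, assuming each shard records when it last took part in any compaction. Shard and shards_needing_vertical_compaction are hypothetical names, not existing Cassandra code.

```python
from dataclasses import dataclass
from typing import List

DAY_S = 24 * 3600


@dataclass
class Shard:
    """Hypothetical per-shard bookkeeping, not a Cassandra class."""
    name: str
    last_compacted_ts: float  # epoch seconds of the last compaction involving this shard


def shards_needing_vertical_compaction(shards: List[Shard], now: float,
                                       max_level_age_s: float = 10 * DAY_S) -> List[str]:
    """Names of shards idle for at least max_level_age_s (default 10 days).
    Each would get one compaction spanning all of its levels, so every
    tombstone is merged with any older data it shadows."""
    return [s.name for s in shards if now - s.last_compacted_ts >= max_level_age_s]
```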
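
A rough sketch of the last bullet. The operator request only records a timestamp. Shards whose last full compaction is older become eligible, and the ordinary background scheduler rewrites them a few at a time. shards_due and headroom_bytes are hypothetical helpers, and the headroom estimate assumes data is spread evenly across shards.

```python
from typing import Dict, List


def shards_due(last_full_ts: Dict[str, float], requested_ts: float) -> List[str]:
    """Shards whose last full compaction predates the operator's request; the
    background scheduler would pick these up as normal compactions."""
    return sorted(name for name, ts in last_full_ts.items() if ts < requested_ts)


def headroom_bytes(total_bytes: int, num_shards: int, concurrent: int) -> int:
    """Extra disk needed when shards are rewritten `concurrent` at a time:
    #compactions * shard size, assuming data is split evenly across shards."""
    shard_bytes = -(-total_bytes // num_shards)  # ceiling division
    return min(concurrent, num_shards) * shard_bytes
```

For example, 1 TB spread over 64 shards with 4 compactions running at once needs about 62.5 GB of headroom (headroom_bytes(10**12, 64, 4)). A single major compaction needs roughly the full 1 TB.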


> CEP-26: Unified Compaction Strategy
> -----------------------------------
>
>                 Key: CASSANDRA-18397
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-18397
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Local/Compaction
>            Reporter: Branimir Lambov
>            Assignee: Branimir Lambov
>            Priority: Normal
>
> Implementation of Unified Compaction Strategy per 
> [CEP-26|https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-26%3A+Unified+Compaction+Strategy].
> Further documentation of the most current state of the solution can be found
> in [the included markdown documentation|https://github.com/blambov/cassandra/blob/CASSANDRA-18397/src/java/org/apache/cassandra/db/compaction/UnifiedCompactionStrategy.md].



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
