[
https://issues.apache.org/jira/browse/CASSANDRA-18397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17709456#comment-17709456
]
Joey Lynch edited comment on CASSANDRA-18397 at 4/6/23 4:34 PM:
----------------------------------------------------------------
[~blambov] overall looks awesome. I have a few suggestions from our testing of
BoundedRead
([patch|https://github.com/jolynch/cassandra/commit/b2516af983930b3fe5bb016b5039c289ea396fd6#diff-447488af7557e6bb324a058fa442f009624d00ef8b4d12a18dcbf4f94950c1ec],
[options|https://gist.github.com/jolynch/9118465f32ad5298b4e39d03ccd4370e#boundedreadcompactionstrategy])
that you might want to incorporate either now or later. (I think the current
proposal is a huge step forward because of the sharding and density tiering,
and I definitely don't want to hold it up.) If you agree with some of these, I
would love to collaborate on getting them into your patch, either now or after
merge.
* target_consolidate_interval_in_seconds (1 hour): When tiering in the lower
levels (the T ones from your proposal), force a compaction if this amount of
time passes.
* Dynamic tier for lowest level: When data is being dumped into the lower tier
fast enough that it would trigger compaction more often than the consolidate
interval (e.g. if you have a tier of 4 but you are getting a flush every
minute, you'd compact more than once per 1 hour interval), autorange the tier
up to ~4x the input. The combination of this plus
target_consolidate_interval_in_seconds leads to significantly less compaction
during data loads.
* max_level_age_in_seconds (10 days): If a table (in your case a shard) hasn't
participated in compaction in this amount of time, do a full vertical
compaction to ensure tombstones find their data. This would finally let C*
guarantee that tombstones find their data.
* Breaking up full compaction: I think that the sharding technique does this,
but when an operator does a full compaction, if you instead just flip a
timestamp to trigger normal background compactions of the shards, then you
don't need double disk space (you only need #compactions * shard size).
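
To make the four suggestions above concrete, here is a rough Python sketch of
the decision rules. It is not the BoundedRead or UCS code: every function name
is hypothetical, only the two option names and their defaults come from the
list above, and the cap of 4x the base tier is just one reading of "autorange
the tier up to ~4x the input".

```python
HOUR = 3600
DAY = 24 * HOUR


def should_consolidate(oldest_age_s, target_consolidate_interval_in_seconds=HOUR):
    """Force a compaction in the lower levels once the interval has passed."""
    return oldest_age_s >= target_consolidate_interval_in_seconds


def compactions_per_interval(tier, flush_period_s, interval_s=HOUR):
    """How often a fixed tier fills up within one consolidate interval."""
    return interval_s / (tier * flush_period_s)


def dynamic_tier(base_tier, flush_period_s, interval_s=HOUR, max_scale=4):
    """Raise the lowest-level tier when flushes arrive faster than the interval.

    Assumption: the tier grows toward the number of flushes expected per
    interval, but never beyond max_scale * base_tier.
    """
    flushes_per_interval = interval_s // flush_period_s
    return max(base_tier, min(flushes_per_interval, base_tier * max_scale))


def needs_vertical_compaction(idle_s, max_level_age_in_seconds=10 * DAY):
    """A shard that has sat out of compaction this long gets a full compaction."""
    return idle_s >= max_level_age_in_seconds


def full_compaction_headroom(concurrent_compactions, shard_size_bytes):
    """Extra disk needed when a full compaction runs shard by shard."""
    return concurrent_compactions * shard_size_bytes


# The example from the list: tier of 4, one flush per minute, 1 hour interval.
print(compactions_per_interval(4, 60))  # 15.0 compactions per hour
print(dynamic_tier(4, 60))              # 16 under the 4x-cap assumption
```

With a fixed tier of 4 and a flush every minute, the lowest level would compact
15 times per hour; under the assumed cap the tier rises to 16, and the
consolidate interval still bounds how long uncompacted data can sit there.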
> CEP-26: Unified Compaction Strategy
> -----------------------------------
>
> Key: CASSANDRA-18397
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18397
> Project: Cassandra
> Issue Type: Improvement
> Components: Local/Compaction
> Reporter: Branimir Lambov
> Assignee: Branimir Lambov
> Priority: Normal
>
> Implementation of Unified Compaction Strategy per
> [CEP-26|https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-26%3A+Unified+Compaction+Strategy].
> Further documentation of the most current state of the solution can be found
> in [the included markdown
> documentation|https://github.com/blambov/cassandra/blob/CASSANDRA-18397/src/java/org/apache/cassandra/db/compaction/UnifiedCompactionStrategy.md].