[ https://issues.apache.org/jira/browse/CASSANDRA-18945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Branimir Lambov updated CASSANDRA-18945:
----------------------------------------
Attachment: key-value-oss.html
> Unified Compaction Strategy is creating too many sstables
> ---------------------------------------------------------
>
> Key: CASSANDRA-18945
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18945
> Project: Cassandra
> Issue Type: Bug
> Components: Local/Compaction
> Reporter: Branimir Lambov
> Assignee: Ethan Brown
> Priority: Normal
> Attachments: key-value-oss.html
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> The unified compaction strategy currently aims to create sstables of close
> to the same size, defaulting to 1 GiB. Unfortunately, tests show that
> Cassandra starts to have performance problems when the number of sstables
> grows to the order of a thousand. In particular, even 1 TiB of data with the
> default configuration (roughly 1024 sstables of 1 GiB each) creates too many
> sstables for efficient processing. This matters even more for SAI, where the
> number of sstables in the system can have a proportional effect on the
> complexity of operations.
> It is quite easy to create a configuration option that lets sstable size
> absorb part of the data growth, by adding a multiplier to the exponent in
> [the shard count calculation|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/compaction/UnifiedCompactionStrategy.md#sharding]
> formula, replacing
> {{2 ^ round(log2(d / (t * b))) * b}}
> with
> {{2 ^ round((1 - 𝜆) * log2(d / (t * b))) * b}},
> where d is the data density, t the target sstable size, b the base shard
> count, and 𝜆 is a parameter whose value is between 0 and 1.
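A minimal Python sketch of the modified formula, assuming d and t are given in the same unit (GiB here). The function and parameter names are hypothetical, not Cassandra's API. Clamping the exponent at zero (never fewer shards than b) is an assumption of this sketch, and Python's `round()` rounds exact halves to even, which may differ from Cassandra's implementation at exact .5 exponents.

```python
import math


def shard_count(d: float, t: float, b: int, lam: float = 0.0) -> int:
    """Return 2^round((1 - lam) * log2(d / (t * b))) * b.

    d: data density, t: target sstable size (same unit as d),
    b: base shard count, lam: the proposed lambda in [0, 1].
    lam = 0 reproduces the current formula.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lambda must be between 0 and 1")
    exponent = (1.0 - lam) * math.log2(d / (t * b))
    # Assumption for this sketch: never drop below the base shard count.
    return 2 ** max(0, round(exponent)) * b


# 1 TiB of data, 1 GiB target size, base shard count 4 (sizes in GiB).
for lam in (0.0, 1 / 3, 0.5, 1.0):
    n = shard_count(1024, 1, 4, lam)
    print(f"lambda={lam:.2f}: {n} shards of {1024 / n:g} GiB")
```

For 1 TiB this gives 1024 shards of 1 GiB with 𝜆 = 0, 128 shards of 8 GiB with 𝜆 = 1/3, 64 shards of 16 GiB with 𝜆 = 0.5, and 4 shards of 256 GiB with 𝜆 = 1.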
> With this, a 𝜆 of 0.5 would mean that shard count and sstable size both grow
> with the square root of the data size growth. A 𝜆 of 0 would result in no
> sstable size growth (the current formula), and 1 in always using the same
> number of shards.
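The square-root claim follows directly from the formula. Writing x for d / (t * b) and ignoring the rounding to a power of two, the shard count S and the resulting sstable size d / S are:

```latex
\begin{aligned}
x &= \frac{d}{t\,b}, \qquad S = b\,x^{1-\lambda} \quad \text{(shard count)} \\
\frac{d}{S} &= \frac{t\,b\,x}{b\,x^{1-\lambda}} = t\,x^{\lambda} \quad \text{(sstable size)} \\
\lambda = \tfrac{1}{2}:&\quad S = b\sqrt{x}, \qquad \frac{d}{S} = t\sqrt{x}
\end{aligned}
```

With 𝜆 = 0 the sstable size stays at t while S grows linearly with x; with 𝜆 = 1 the shard count stays at b and the sstable size is d / b.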
> It may also be valuable to introduce a threshold for engaging the base shard
> count to avoid splitting lowest-level sstables into fragments that are too
> small.
> Once both of these are in place, we can set defaults that better suit all
> node densities, including 10 TiB and beyond, for example:
> - target size of 1 GiB
> - 𝜆 of 1/3
> - base shard count of 4
> - minimum size 100 MiB
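The proposed defaults can be explored with a small sketch that adds a minimum-size threshold to the formula. Sizes are in MiB and all names are hypothetical. The behaviour below the threshold is an assumption made for illustration, since the report only suggests that such a threshold may be valuable: under `min_size * base_shards` the sketch uses the largest power-of-two shard count that keeps each fragment at least `min_size`, capped at the base shard count.

```python
import math

# Sizes in MiB. Defaults follow the values proposed above.
TARGET_SIZE = 1024.0  # target sstable size t: 1 GiB
BASE_SHARDS = 4       # base shard count b
GROWTH = 1 / 3        # lambda
MIN_SIZE = 100.0      # minimum sstable size: 100 MiB


def shard_count(density: float, target_size: float = TARGET_SIZE,
                base_shards: int = BASE_SHARDS, growth: float = GROWTH,
                min_size: float = MIN_SIZE) -> int:
    """Hypothetical shard count for a data density, with a minimum-size threshold."""
    if density < min_size:
        # Too little data to split at all.
        return 1
    if density < min_size * base_shards:
        # Assumption: engage the base shard count only partially, keeping
        # every fragment at least min_size.
        return min(2 ** math.floor(math.log2(density / min_size)), base_shards)
    exponent = (1.0 - growth) * math.log2(density / (target_size * base_shards))
    # Assumption: above the threshold, never fewer shards than the base count.
    return 2 ** max(0, round(exponent)) * base_shards


for label, density in (("50 MiB", 50.0), ("1 GiB", 1024.0),
                       ("1 TiB", 1024.0 ** 2), ("10 TiB", 10 * 1024.0 ** 2)):
    n = shard_count(density)
    print(f"{label:>6}: {n:5d} shards of {density / n:g} MiB")
```

Under these assumptions, 10 TiB of data is split into 1024 sstables of 10 GiB. With `growth=0.0` the same sketch yields 8192 sstables of 1.25 GiB, which is the kind of count the report identifies as problematic.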
--
This message was sent by Atlassian Jira
(v8.20.10#820010)