If there isn’t a TTL and timestamp on the data, I’m not sure of the benefits of TWCS for this use case. I would stick with size-tiered. At some point you will end up with large sstables (like 1 TB) that won’t compact because there are not 4 similar-sized ones available to compact together (assuming default parameters for STCS). And if your data is ever-growing and never deleted, you will be adding nodes to handle the extra data as time goes by (and running cleanup on the existing nodes). For me, the backup strategy shouldn’t drive the rest.
Sean R. Durity

From: Paul Chandler <p...@redshots.com>
Sent: Thursday, December 29, 2022 4:51 AM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Re: Best compaction strategy for rarely used data

Hi Lapo,

Take a look at TWCS; I think it could help your use case: https://thelastpickle.com/blog/2016/12/08/TWCS-part1.html

Regards
Paul Chandler

On 29 Dec 2022, at 08:55, Lapo Luchini <l...@lapo.it> wrote:

Hi,

I have a table which gets (a lot of) data that is written once and very rarely read (it is used for data that is mandatory for regulatory reasons), and almost never deleted.

I'm using the default STCS, as at the time I didn't know any better, but SSTable sizes are getting huge. This is a problem both because they are approaching the size of the available disk and because I'm using a snapshot-based system to back up the node (so compacting a huge SSTable into an even bigger one generates a lot of traffic for mostly-old data).

I'm thinking about switching to LCS (mainly to solve the size issue), but I read that it is "optimized for read heavy workloads […] not a good choice for immutable time series data". Given that I don't really care about either write or read speed, but would like SSTable size to have an upper limit, would this strategy still be the best?

PS: Googling around, a strategy called "incremental compaction" (ICS) keeps showing up in results, but that's only available in ScyllaDB, right?

--
Lapo Luchini
l...@lapo.it
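To make the size-tiered behaviour discussed above a bit more concrete (why a very large SSTable under default STCS settings can sit uncompacted until enough similar-sized SSTables exist), here is a simplified Python sketch of STCS bucketing. The thresholds are the documented STCS defaults (bucket_low=0.5, bucket_high=1.5, min_sstable_size=50 MB, min_threshold=4). The function names and example sizes are hypothetical, and Cassandra's real implementation has extra logic (bucket prioritisation, max_threshold, tombstone compactions) that is omitted here.

```python
# Simplified sketch of Size-Tiered Compaction Strategy (STCS) bucketing.
# Thresholds mirror the documented STCS defaults; the grouping logic is a
# simplification for illustration, not Cassandra's actual code.

MB = 1024 * 1024
GB = 1024 * MB
TB = 1024 * GB
MIN_THRESHOLD = 4  # STCS default min_threshold


def stcs_buckets(sizes, bucket_low=0.5, bucket_high=1.5, min_sstable_size=50 * MB):
    """Group SSTable sizes (in bytes) into buckets of 'similar' size."""
    buckets = []  # each entry: [average_size, [member sizes]]
    for size in sorted(sizes):
        placed = False
        for bucket in buckets:
            avg = bucket[0]
            # Similar if within (avg * bucket_low, avg * bucket_high) ...
            similar = avg * bucket_low < size < avg * bucket_high
            # ... or if both the SSTable and the bucket are "small".
            both_small = size < min_sstable_size and avg < min_sstable_size
            if similar or both_small:
                bucket[1].append(size)
                bucket[0] = sum(bucket[1]) / len(bucket[1])  # update average
                placed = True
                break
        if not placed:
            buckets.append([size, [size]])  # start a new bucket
    return [members for _, members in buckets]


def compaction_candidates(sizes, min_threshold=MIN_THRESHOLD):
    """Return only the buckets with enough members to be compacted."""
    return [b for b in stcs_buckets(sizes) if len(b) >= min_threshold]


if __name__ == "__main__":
    # Hypothetical node: one 1 TB SSTable, three ~250 GB, four 64 GB.
    sizes = [1 * TB, 260 * GB, 250 * GB, 240 * GB] + [64 * GB] * 4
    for bucket in stcs_buckets(sizes):
        status = "compactable" if len(bucket) >= MIN_THRESHOLD else "waiting"
        avg_gb = sum(bucket) / len(bucket) / GB
        print(f"{len(bucket)} x ~{avg_gb:.0f} GB: {status}")
```

With these example sizes, only the four 64 GB SSTables form a bucket large enough to compact. The 1 TB SSTable sits alone in its bucket and, under these defaults, would need roughly three more SSTables of comparable size (somewhere around 0.5 TB to 1.5 TB) before STCS would consider it.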