So when I wrote TWCS, I wrote it for a use case that had 24h TTLs and 30
days of retention. In that application, we had tested 12h windows, 24h
windows, and 7 day windows, and eventually settled on 24h windows because
that balanced factors like sstable size, sstables-per-read, and expired
data waiting to be dropped (about 3%, 1/30th, on any given day). That's
where that recommendation came from - it was mostly about how much expired
data will sit around waiting to be dropped. That doesn't change with
multiple data directories.
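
For concreteness, the compaction settings for that 24h-window setup would look
roughly like this (the table name and schema here are just an illustration,
not the actual application; the compaction map is the relevant part):

    CREATE TABLE metrics_by_sensor (
        sensor_id uuid,
        ts timestamp,
        value double,
        PRIMARY KEY (sensor_id, ts)
    ) WITH compaction = {
        'class': 'TimeWindowCompactionStrategy',
        'compaction_window_unit': 'DAYS',
        'compaction_window_size': '1'
    };
    -- with 30 days of retention, 1-day windows mean roughly 30 windows,
    -- which is where the 20-30 guideline lands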

If you go with fewer windows, you'll expire larger chunks at a time, which
means you'll retain larger chunks waiting on expiration.
If you go with more windows, you'll potentially touch more sstables on read.
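
To put rough numbers on that tradeoff (same hypothetical table as above, still
assuming 30 days of retention):

    -- ~60 windows: only ~1.7% (1/60) of the data sits expired on a given day,
    -- but a read that spans time can touch more sstables
    ALTER TABLE metrics_by_sensor WITH compaction = {
        'class': 'TimeWindowCompactionStrategy',
        'compaction_window_unit': 'HOURS',
        'compaction_window_size': '12'
    };

    -- ~4-5 windows: fewer sstables, but up to ~23% (7/30) of the data can
    -- sit around expired, waiting for its window to be dropped
    ALTER TABLE metrics_by_sensor WITH compaction = {
        'class': 'TimeWindowCompactionStrategy',
        'compaction_window_unit': 'DAYS',
        'compaction_window_size': '7'
    };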

Realistically, if you can model your data to align with chunks (so each
read only touches one window), the actual number of sstables shouldn't
really matter much - the sstable timestamps and bloom filters will let the
read path skip most of them anyway. If your data model doesn't have a
timestamp component to it and you're touching lots of sstables on read,
even 30 sstables is probably going to hurt you, and 210 would be really,
really bad.
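
To make "align with chunks" concrete, here's a hedged sketch (again a
hypothetical schema, not something from this thread): bucket the partition key
by day so a time-bounded read only ever needs sstables from one window:

    CREATE TABLE metrics_by_sensor_and_day (
        sensor_id uuid,
        day date,            -- daily bucket matching the 24h compaction window
        ts timestamp,
        value double,
        PRIMARY KEY ((sensor_id, day), ts)
    );

    -- A read for one sensor within one day stays inside a single window;
    -- min/max timestamps and bloom filters let the rest be skipped.
    SELECT ts, value FROM metrics_by_sensor_and_day
    WHERE sensor_id = ?
      AND day = '2022-09-28'
      AND ts >= '2022-09-28 00:00:00'
      AND ts <  '2022-09-28 06:00:00';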

On Wed, Sep 28, 2022 at 7:00 AM Grzegorz Pietrusza <gpietru...@gmail.com>
wrote:

> Hi All!
>
> According to TWCS documentation (
> https://cassandra.apache.org/doc/latest/cassandra/operating/compaction/twcs.html)
> the operator should choose compaction window parameters to select a
> compaction_window_unit and compaction_window_size pair that produces
> approximately 20-30 windows.
>
> I'm curious where this recommendation comes from? Also should the number
> of windows be changed when more than one data directory is used? In my
> example there are 7 data directories (partitions) and it seems that all of
> them store 20-30 windows. Effectively this gives 140-210 sstables in total.
> Is that an optimal configuration?
>
> Running on Cassandra 3.11
>
> Regards
> Grzegorz
>
