With a 10-year retention, just ignore the target sstable count (I should remove 
that guidance, to be honest) and go for a 1-week window to match your 
partition size. 520 sstables on disk (about 10 years of 52 weekly windows) 
isn't going to hurt you as long as you're not reading from all of them, and 
with a partition per week the bloom filter is going to make things nice and 
easy for you.
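
As a rough check on the 520 figure, and to show what a 1-week window might look like as table options, here is a minimal Python sketch. The helper name `expected_windows`, the 52-weeks-per-year approximation, and the `twcs_options` dict are illustrative assumptions, not part of TWCS. The option keys are the TWCS `compaction_window_unit` and `compaction_window_size` settings; the class name may differ depending on how TWCS is installed.

```python
# Rough count of TWCS windows (and so fully compacted sstables) over the
# retention period. All names here are hypothetical, for illustration only.
RETENTION_YEARS = 10
WEEKS_PER_YEAR = 52   # approximation; a year is slightly longer than 52 weeks
WINDOW_DAYS = 7


def expected_windows(retention_years: int, window_days: int) -> int:
    """Approximate number of time windows covering the retention period."""
    retained_days = retention_years * WEEKS_PER_YEAR * 7
    return retained_days // window_days


# Compaction options as they might appear in a table definition.
# Assumption: the class name may need adjusting for your installation.
twcs_options = {
    "class": "TimeWindowCompactionStrategy",
    "compaction_window_unit": "DAYS",
    "compaction_window_size": str(WINDOW_DAYS),
}

if __name__ == "__main__":
    print(expected_windows(RETENTION_YEARS, WINDOW_DAYS))  # 520
```

The count is well above the 50-bucket guidance, which is why the reply leans on reads touching only a few windows rather than on the total number of sstables.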

 

- Jeff


From: Voytek Jarnot <voytek.jar...@gmail.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Friday, December 16, 2016 at 10:37 AM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Choosing a compaction strategy (TWCS)

 

Scenario: 

Converting an Oracle table to Cassandra, one Oracle table to 4 Cassandra 
tables, basically time-series - think log or auditing.  Retention is 10 years, 
but greater than 95% of reads will occur on data written within the last year. 
7 day TTL used on a small percentage of the records, majority do not use TTL. 
Other than the aforementioned TTL, and the 10-year purge, no updates or deletes 
are done.

 

Seems like TWCS is the right choice, but I have a few questions/concerns:

 

1) I'll be bulk loading a few years of existing data upon deployment - any 
issues with that?  I assume setting an explicit write timestamp ("USING 
TIMESTAMP") when inserting this data will be mandatory if I choose TWCS?

 

2) I read here (https://github.com/jeffjirsa/twcs/) that "You should target 
fewer than 50 buckets per table based on your TTL." That's going to be a tough 
goal with a 10 year retention ... can anyone speak to how important this target 
really is?

 

3) If I'm bucketing my data with week/year (i.e., partition on year, week - so 
today would be in 2016, 50), it seems like a natural fit for 
compaction_window_size would be 7 days, but I'm thinking my calendar-based 
weeks will never align with TWCS 7-day-period weeks anyway - am I missing 
something there?
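
The alignment concern in question 3 can be sketched in Python. This assumes that TWCS rounds each timestamp down to a multiple of the window size counted from the Unix epoch (UTC), and that the partition key uses ISO weeks starting on Monday; both are assumptions made for illustration, and the function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

SECONDS_PER_DAY = 86400


def window_start(ts: datetime, window_days: int = 7) -> datetime:
    """Round ts down to a multiple of window_days since the Unix epoch (UTC).

    Assumption: this mirrors how TWCS picks window boundaries.
    """
    secs = int(ts.timestamp())
    lower = secs - (secs % (SECONDS_PER_DAY * window_days))
    return datetime.fromtimestamp(lower, tz=timezone.utc)


def iso_week_start(ts: datetime) -> datetime:
    """Monday 00:00 UTC of the ISO week containing ts."""
    day = ts.astimezone(timezone.utc).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    return day - timedelta(days=day.weekday())


if __name__ == "__main__":
    when = datetime(2016, 12, 16, 10, 37, tzinfo=timezone.utc)
    print("epoch-aligned window starts:", window_start(when).date())
    print("ISO week starts:            ", iso_week_start(when).date())
    print("offset in days:             ",
          (window_start(when) - iso_week_start(when)).days)
```

Under these assumptions the 7-day window containing 2016-12-16 starts on Thursday 2016-12-15, while ISO week 50 starts on Monday 2016-12-12, so a weekly partition would straddle two windows rather than line up with one.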

 

I'd appreciate any other thoughts on compaction and/or twcs.

 

Thanks
