With a 10-year retention, just ignore the target sstable count (I should remove that guidance, to be honest), and go for a 1-week window to match your partition size. Roughly 520 sstables on disk (10 years x 52 weekly windows) isn't going to hurt you as long as you're not reading from all of them, and with a partition per week the bloom filter is going to make things nice and easy for you.
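As a rough check on that number, here is a minimal Python sketch of the arithmetic. It assumes each closed window eventually compacts down to about one sstable, which is a simplification; the helper name `expected_windows` and the 365.25 days-per-year figure are illustrative choices, not anything from TWCS itself.

```python
import math


def expected_windows(retention_years, window_days, days_per_year=365.25):
    """Approximate number of time windows alive at once for a given retention.

    Assumption: one sstable per fully compacted window, so this is also a
    rough estimate of the steady-state sstable count per table.
    """
    return math.ceil(retention_years * days_per_year / window_days)


weekly = expected_windows(10, 7)  # close to the "520" quoted above
daily = expected_windows(10, 1)   # what a 1-day window would cost instead

print(f"7-day windows over 10 years: {weekly}")
print(f"1-day windows over 10 years: {daily}")
```

The weekly figure lands within a couple of windows of 520; the exact value depends on how leap days and the partially filled current window are counted.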
- Jeff

From: Voytek Jarnot <voytek.jar...@gmail.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Friday, December 16, 2016 at 10:37 AM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Choosing a compaction strategy (TWCS)

Scenario: Converting an Oracle table to Cassandra, one Oracle table to 4 Cassandra tables, basically time-series - think log or auditing. Retention is 10 years, but greater than 95% of reads will occur on data written within the last year. 7 day TTL used on a small percentage of the records, majority do not use TTL. Other than the aforementioned TTL, and the 10-year purge, no updates or deletes are done.

Seems like TWCS is the right choice, but I have a few questions/concerns:

1) I'll be bulk loading a few years of existing data upon deployment - any issues with that? I assume using "with timestamp" when inserting this data will be mandatory if I choose TWCS?

2) I read here (https://github.com/jeffjirsa/twcs/) that "You should target fewer than 50 buckets per table based on your TTL." That's going to be a tough goal with a 10 year retention ... can anyone speak to how important this target really is?

3) If I'm bucketing my data with week/year (i.e., partition on year, week - so today would be in 2016, 50), it seems like a natural fit for compaction_window_size would be 7 days, but I'm thinking my calendar-based weeks will never align with TWCS 7-day-period weeks anyway - am I missing something there?

I'd appreciate any other thoughts on compaction and/or twcs.

Thanks
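The alignment concern in question 3 can be illustrated with a small Python sketch. It assumes that TWCS cuts its 7-day windows by rounding the write timestamp down to a multiple of the window size counted from the Unix epoch; that is an assumption about the implementation, not something stated in this thread. The function names are hypothetical, and the sample timestamp is just the date of the message interpreted as UTC.

```python
from datetime import datetime, timedelta, timezone

WINDOW = timedelta(days=7)
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def epoch_aligned_window_start(ts):
    """Start of the 7-day window containing ts.

    Assumption: windows are multiples of the window size counted from the
    Unix epoch, so boundaries fall wherever the epoch arithmetic puts them.
    """
    whole_windows = (ts - EPOCH) // WINDOW  # integer number of full windows
    return EPOCH + whole_windows * WINDOW


def calendar_bucket(ts):
    """(ISO year, ISO week) partition bucket, as described in the question."""
    iso = ts.isocalendar()
    return (iso[0], iso[1])


ts = datetime(2016, 12, 16, 10, 37, tzinfo=timezone.utc)
window_start = epoch_aligned_window_start(ts)

print("partition bucket:", calendar_bucket(ts))
print("window start:    ", window_start.strftime("%A %Y-%m-%d"))
```

Under that assumption the window boundaries fall on Thursdays (1970-01-01 was a Thursday) while ISO weeks start on Mondays, so a weekly partition would straddle two windows rather than line up with one. That still keeps reads for any one partition confined to a small, fixed number of sstables.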