Scenario:
I'm converting one Oracle table into 4 Cassandra tables. The data is
basically time-series - think log or auditing.  Retention is 10 years, but
more than 95% of reads will hit data written within the last year. A 7-day
TTL is used on a small percentage of the records; the majority do not use
a TTL. Other than that TTL and the 10-year purge, no updates or deletes
are done.

Seems like TWCS is the right choice, but I have a few questions/concerns:

1) I'll be bulk loading a few years of existing data upon deployment - any
issues with that?  I assume that setting the write time explicitly with
"USING TIMESTAMP" when inserting this data will be mandatory if I choose
TWCS?

2) I read here (https://github.com/jeffjirsa/twcs/) that "You should target
fewer than 50 buckets per table based on your TTL." That's going to be a
tough goal with a 10 year retention ... can anyone speak to how important
this target really is?

3) If I'm bucketing my data with week/year (i.e., partition on year, week -
so today would be in 2016, 50), it seems like a natural fit for
compaction_window_size would be 7 days, but I'm thinking my calendar-based
weeks will never align with TWCS 7-day-period weeks anyway - am I missing
something there?
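
To make questions 2 and 3 concrete, here is a rough Python sketch of what I
have in mind. It assumes (I have not confirmed this) that TWCS floors each
write timestamp to a multiple of the window size counted from the Unix
epoch, that my year/week buckets follow ISO-8601 week numbering, and that
10 years is roughly 3650 days. The function names are made up for
illustration and are not part of Cassandra.

```python
from datetime import datetime, timezone

WINDOW_DAYS = 7
WINDOW_SECONDS = WINDOW_DAYS * 24 * 3600


def twcs_window_start(ts):
    """Floor a UTC datetime to an epoch-aligned window (ASSUMED TWCS behaviour)."""
    epoch_seconds = int(ts.timestamp())
    floored = epoch_seconds - (epoch_seconds % WINDOW_SECONDS)
    return datetime.fromtimestamp(floored, tz=timezone.utc)


def calendar_bucket(ts):
    """Return the (year, week) partition bucket, assuming ISO-8601 weeks."""
    iso = ts.isocalendar()
    return (iso[0], iso[1])


def window_count(retention_days, window_days):
    """Number of windows needed to cover the retention period (rounded up)."""
    return -(-retention_days // window_days)


if __name__ == "__main__":
    sample = datetime(2016, 12, 14, tzinfo=timezone.utc)
    start = twcs_window_start(sample)
    print("calendar bucket:", calendar_bucket(sample))
    print("assumed TWCS window start:", start.date(), start.strftime("%A"))
    print("windows over ~10 years:", window_count(3650, WINDOW_DAYS))
```

If the epoch-alignment assumption holds, 7-day windows would begin on
Thursdays (1970-01-01 was a Thursday) while ISO weeks begin on Mondays, so
each of my calendar-week partitions would span two TWCS windows. The 10-year
retention would also come to roughly 522 seven-day windows, well over the
suggested 50.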

I'd appreciate any other thoughts on compaction and/or twcs.

Thanks
