The issue is that your partitions will likely be in 2 sstables instead of “theoretically” 1. In practice, they’re probably going to bleed into 2 anyway (the memtable flush to sstable isn’t going to happen exactly when the window expires), so I’d bet there’s no meaningful impact.
- Jeff

From: Voytek Jarnot <voytek.jar...@gmail.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Friday, December 16, 2016 at 11:12 AM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: Choosing a compaction strategy (TWCS)

Thank you Jeff - always nice to hear straight from the source. Any issues you can see with 3 (my calendar-week bucket not aligning with the arbitrary 7-day window)? Or am I confused (I'd put money on this option, but I've been wrong once or twice before)?

On Fri, Dec 16, 2016 at 12:50 PM, Jeff Jirsa <jeff.ji...@crowdstrike.com> wrote:

I skipped over the more important question - loading data in. Two options:

1) Load data in order through the normal write path and use “USING TIMESTAMP” to set the timestamp, or
2) Use CQLSSTableWriter and “USING TIMESTAMP” to create sstables, then sstableloader them into the cluster.

Either way, try not to mix writes of old data and new data in the “normal” write path at the same time, even if you write “USING TIMESTAMP”, because it’ll get mixed in the memtable, and flushed into the same sstable – it won’t kill you, but if you can avoid it, avoid it.

- Jeff

From: Jeff Jirsa <jeff.ji...@crowdstrike.com>
Date: Friday, December 16, 2016 at 10:47 AM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: Choosing a compaction strategy (TWCS)

With a 10 year retention, just ignore the target sstable count (I should remove that guidance, to be honest), and go for a 1 week window to match your partition size. 520 sstables on disk isn’t going to hurt you as long as you’re not reading from all of them, and with a partition-per-week the bloom filter is going to make things nice and easy for you.
- Jeff

From: Voytek Jarnot <voytek.jar...@gmail.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Friday, December 16, 2016 at 10:37 AM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Choosing a compaction strategy (TWCS)

Scenario: Converting an Oracle table to Cassandra, one Oracle table to 4 Cassandra tables, basically time-series - think log or auditing. Retention is 10 years, but greater than 95% of reads will occur on data written within the last year. 7 day TTL used on a small percentage of the records, majority do not use TTL. Other than the aforementioned TTL, and the 10-year purge, no updates or deletes are done.

Seems like TWCS is the right choice, but I have a few questions/concerns:

1) I'll be bulk loading a few years of existing data upon deployment - any issues with that? I assume using "with timestamp" when inserting this data will be mandatory if I choose TWCS?
2) I read here (https://github.com/jeffjirsa/twcs/) that "You should target fewer than 50 buckets per table based on your TTL." That's going to be a tough goal with a 10 year retention ... can anyone speak to how important this target really is?
3) If I'm bucketing my data with week/year (i.e., partition on year, week - so today would be in 2016, 50), it seems like a natural fit for compaction_window_size would be 7 days, but I'm thinking my calendar-based weeks will never align with TWCS 7-day-period weeks anyway - am I missing something there?

I'd appreciate any other thoughts on compaction and/or twcs.

Thanks
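Question 3 and the "2 sstables instead of 1" answer can be illustrated with a small Python sketch. It assumes that a 7-day TWCS window is aligned to multiples of 7 days since the Unix epoch; that alignment is an assumption of this sketch, not something stated in the thread, and the function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumption: compaction_window_unit = DAYS, compaction_window_size = 7,
# with windows aligned to multiples of the window size since the Unix epoch.
WINDOW_SECONDS = 7 * 24 * 3600


def twcs_window_start(ts: datetime) -> datetime:
    """Start of the (assumed) epoch-aligned 7-day window containing ts."""
    seconds = int(ts.timestamp())
    return datetime.fromtimestamp(seconds - seconds % WINDOW_SECONDS, tz=timezone.utc)


def iso_week_start(ts: datetime) -> datetime:
    """Monday 00:00 UTC of the ISO calendar week containing ts."""
    midnight = ts.astimezone(timezone.utc).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    return midnight - timedelta(days=midnight.weekday())


def windows_touched_by_week(ts: datetime) -> list:
    """Distinct window starts overlapped by the calendar week containing ts."""
    first = iso_week_start(ts)
    last = first + timedelta(days=7) - timedelta(seconds=1)
    return sorted({twcs_window_start(first), twcs_window_start(last)})


if __name__ == "__main__":
    today = datetime(2016, 12, 16, tzinfo=timezone.utc)
    year, week, _ = today.isocalendar()
    print("partition bucket:", (year, week))
    print("calendar week starts:", iso_week_start(today).date())
    for start in windows_touched_by_week(today):
        print("overlaps window starting:", start.date())
```

Under that assumption the windows begin on Thursdays (1 January 1970 was a Thursday), so a Monday-based (year, week) partition straddles two windows and would typically end up in two sstables, which is the case described in the reply at the top of the thread.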
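For the first loading option mentioned in the thread (normal write path with USING TIMESTAMP), here is a minimal Python sketch of building a backfill statement. CQL write timestamps are conventionally microseconds since the Unix epoch. The table `audit_log`, its columns, and the `%s` placeholder style are hypothetical and would need to match the real schema and driver.

```python
from datetime import datetime, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def write_timestamp_micros(event_time: datetime) -> int:
    """Microseconds since the Unix epoch; event_time must be timezone-aware."""
    delta = event_time - EPOCH
    return (delta.days * 86400 + delta.seconds) * 1_000_000 + delta.microseconds


def backfill_insert(event_time: datetime, payload: str):
    """Return (cql, params) for one historical row.

    The write timestamp is set to the original event time so that the row
    sorts into the time window it belongs to rather than the load time.
    """
    year, week, _ = event_time.isocalendar()
    micros = write_timestamp_micros(event_time)
    cql = (
        "INSERT INTO audit_log (year, week, event_time, payload) "  # hypothetical table
        "VALUES (%s, %s, %s, %s) "
        f"USING TIMESTAMP {micros}"
    )
    return cql, (year, week, event_time, payload)


if __name__ == "__main__":
    cql, params = backfill_insert(
        datetime(2016, 12, 16, 11, 12, tzinfo=timezone.utc), "example"
    )
    print(cql)
    print(params)
```

As the thread advises, statements like this are best issued in time order and kept separate from live writes, so that old and new data do not share a memtable.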