Re: Choosing a compaction strategy (TWCS)

Jeff Jirsa Fri, 16 Dec 2016 20:41:32 -0800

With the caveat that tombstone compactions are disabled by default in TWCS (and DTCS).
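Here is a minimal Python sketch of that default. It assumes the 3.x behaviour as I understand it: the TWCS and DTCS constructors switch single-sstable tombstone compactions off unless the table's compaction options explicitly set `tombstone_threshold` or `tombstone_compaction_interval`. The function `tombstone_compactions_enabled` is hypothetical. It models the rule and is not Cassandra code, so check the strategy source for your version.

```python
# Hypothetical model (not Cassandra code): TWCS/DTCS only enable single-sstable
# tombstone compactions when a tombstone subproperty is set explicitly.
TOMBSTONE_OPTIONS = ("tombstone_threshold", "tombstone_compaction_interval")


def tombstone_compactions_enabled(compaction_options: dict) -> bool:
    """True if the compaction options explicitly opt in to tombstone compactions."""
    return any(key in compaction_options for key in TOMBSTONE_OPTIONS)


defaults_only = {
    "class": "TimeWindowCompactionStrategy",
    "compaction_window_unit": "DAYS",
    "compaction_window_size": "7",
}
opted_in = dict(defaults_only, tombstone_threshold="0.8")

print(tombstone_compactions_enabled(defaults_only))  # False
print(tombstone_compactions_enabled(opted_in))       # True
```

If this model is right, setting either subproperty, even to its default value, is what opts a TWCS table in.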


-- 
Jeff Jirsa


> On Dec 16, 2016, at 8:34 PM, Voytek Jarnot <voytek.jar...@gmail.com> wrote:
> 
> Gotcha. So "never compacted" has an implicit asterisk referencing
> tombstone_compaction_interval and tombstone_threshold, it sounds like: it means
> "never compacted" via the strategy's window selection, but still eligible for
> tombstone-triggered compaction.
> 
>> On Fri, Dec 16, 2016 at 10:07 PM, Jeff Jirsa <jeff.ji...@crowdstrike.com> 
>> wrote:
>> Tombstone compaction subproperties can handle tombstone removal for you.
>> You set a ratio of tombstones worth compacting away (tombstone_threshold,
>> for example 80%) and an interval to prevent continuous compaction
>> (tombstone_compaction_interval, for example 24 hours). Then, whenever
>> there’s no other work to do, any sstable over 24 hours old that’s at least
>> 80% tombstones will be compacted on its own in a single-sstable compaction.
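Those two knobs map onto `tombstone_threshold` and `tombstone_compaction_interval`. The Python sketch below models the eligibility test for a single-sstable tombstone compaction. The function name is hypothetical. The defaults (0.2 and 86400 seconds) are Cassandra's defaults. Two details are my assumptions: the ratio only counts *droppable* tombstones (those already past gc_grace_seconds), and the comparison is strictly greater-than. The overlap check governed by `unchecked_tombstone_compaction` is not modelled.

```python
from datetime import datetime, timedelta, timezone


def eligible_for_tombstone_compaction(
    created_at: datetime,
    droppable_tombstone_ratio: float,
    now: datetime,
    tombstone_threshold: float = 0.2,
    tombstone_compaction_interval: timedelta = timedelta(seconds=86400),
) -> bool:
    """Hypothetical model of the single-sstable tombstone compaction check.

    droppable_tombstone_ratio: estimated fraction of the sstable made up of
    tombstones that are already past gc_grace_seconds.
    """
    # The interval keeps a young sstable from being recompacted continuously.
    if now < created_at + tombstone_compaction_interval:
        return False
    # Strictly greater than the threshold (an assumption, see above).
    return droppable_tombstone_ratio > tombstone_threshold


now = datetime(2016, 12, 17, tzinfo=timezone.utc)
thirty_hours_old = now - timedelta(hours=30)
print(eligible_for_tombstone_compaction(thirty_hours_old, 0.85, now, tombstone_threshold=0.8))  # True
print(eligible_for_tombstone_compaction(thirty_hours_old, 0.50, now, tombstone_threshold=0.8))  # False
```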
>> 
>> - Jeff
>> 
>> From: Voytek Jarnot <voytek.jar...@gmail.com>
>> Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
>> Date: Friday, December 16, 2016 at 7:34 PM
>> To: "user@cassandra.apache.org" <user@cassandra.apache.org>
>> Subject: Re: Choosing a compaction strategy (TWCS)
>> 
>> Thanks again, Jeff.
>> 
>> Thinking about this some more, I'm wondering if I'm overthinking or if 
>> there's a potential issue:
>> 
>> If my compaction_window_size is 7 (DAYS), and I've got TTLs of 7 days on
>> some (relatively small percentage) of my records, am I going to be leaving
>> tombstones around all over the place? My noob-read on this is that TWCS
>> will not compact SSTables composed of records older than 7 days
>> (https://docs.datastax.com/en/cassandra/3.x/cassandra/dml/dmlHowDataMaintain.html#dmlHowDataMaintain__twcs),
>> but Cassandra will not evict my tombstones until the 7-day TTL plus
>> gc_grace_seconds have passed ... resulting in no tombstone removal (?).
>> 
>> On Fri, Dec 16, 2016 at 1:17 PM, Jeff Jirsa <jeff.ji...@crowdstrike.com> 
>> wrote:
>> 
>> The issue is that your partitions will likely be in 2 sstables instead of
>> “theoretically” 1. In practice, they’re probably going to bleed into 2
>> anyway (a memtable flush isn’t going to happen exactly when the window
>> expires, so some data crosses the boundary regardless), so I bet there’s no
>> meaningful impact.
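The Python sketch below shows why a calendar-week partition lands in two windows regardless of flush timing. It counts how many 7-day TWCS windows an ISO (Monday-start) week overlaps. It assumes windows are aligned to the Unix epoch in UTC, which would make every window start on a Thursday, since 1970-01-01 was a Thursday. `windows_touched_by_iso_week` is hypothetical, and `date.fromisocalendar` needs Python 3.8+.

```python
from datetime import date, timedelta

WINDOW_DAYS = 7
EPOCH = date(1970, 1, 1)  # assumed window alignment point (a Thursday)


def twcs_window_index(d: date, window_days: int = WINDOW_DAYS) -> int:
    """Index of the epoch-aligned window containing day d (UTC)."""
    return (d - EPOCH).days // window_days


def windows_touched_by_iso_week(year: int, week: int) -> set:
    """Window indexes overlapped by the Monday-to-Sunday ISO week."""
    monday = date.fromisocalendar(year, week, 1)
    return {twcs_window_index(monday + timedelta(days=i)) for i in range(7)}


print(len(windows_touched_by_iso_week(2016, 50)))  # 2
```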
>> 
>> - Jeff
>> 
>> From: Voytek Jarnot <voytek.jar...@gmail.com>
>> Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
>> Date: Friday, December 16, 2016 at 11:12 AM
>> To: "user@cassandra.apache.org" <user@cassandra.apache.org>
>> Subject: Re: Choosing a compaction strategy (TWCS)
>> 
>> Thank you Jeff - always nice to hear straight from the source.
>> 
>> Any issues you can see with question 3 (my calendar-week bucket not aligning
>> with the arbitrary 7-day window)? Or am I confused? (I'd put money on the
>> latter, but I've been wrong once or twice before.)
>> 
>> On Fri, Dec 16, 2016 at 12:50 PM, Jeff Jirsa <jeff.ji...@crowdstrike.com> 
>> wrote:
>> 
>> I skipped over the more important question: loading data in. Two options:
>> 
>> 1) Load data in order through the normal write path and use “USING
>> TIMESTAMP” to set the timestamp, or
>> 
>> 2) Use CQLSSTableWriter and “USING TIMESTAMP” to create sstables, then
>> sstableloader them into the cluster.
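Either way, the supplied timestamp is, as I understand it, what TWCS uses to place data in a window, so it should reflect when the event originally happened. Below is a Python sketch for computing it. It assumes the usual convention of microseconds since the Unix epoch for CQL write timestamps. The table `audit.events` and its columns in `insert_statement` are hypothetical.

```python
from datetime import datetime, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_write_timestamp(event_time: datetime) -> int:
    """Microseconds since the Unix epoch, computed exactly (no float rounding)."""
    if event_time.tzinfo is None:
        raise ValueError("pass a timezone-aware datetime, e.g. in UTC")
    delta = event_time - EPOCH
    return (delta.days * 86_400 + delta.seconds) * 1_000_000 + delta.microseconds


def insert_statement(event_time: datetime) -> str:
    """Hypothetical INSERT into a hypothetical table; bind values left as '?'."""
    return (
        "INSERT INTO audit.events (year, week, event_time, payload) "
        f"VALUES (?, ?, ?, ?) USING TIMESTAMP {to_write_timestamp(event_time)}"
    )


print(insert_statement(datetime(2016, 12, 16, tzinfo=timezone.utc)))
```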
>> 
>> Either way, try not to mix writes of old data and new data in the “normal”
>> write path at the same time, even if you write “USING TIMESTAMP”, because
>> they’ll get mixed in the memtable and flushed into the same sstable. It
>> won’t kill you, but if you can avoid it, avoid it.
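One way to follow that advice during a bulk load is to sort historical rows by TWCS window and load them one window at a time, oldest first. That way, a single memtable or a single CQLSSTableWriter output does not mix windows. The Python sketch below assumes epoch-aligned 7-day windows in UTC. `batches_by_window` is hypothetical, and how you flush between batches (for example with `nodetool flush`) is left to your loader.

```python
from collections import defaultdict
from datetime import datetime, timezone

WINDOW_SECONDS = 7 * 86400  # compaction_window_unit = DAYS, compaction_window_size = 7


def window_start(event_time: datetime) -> int:
    """Start (epoch seconds) of the window containing an aware datetime."""
    secs = int(event_time.timestamp())
    return secs - secs % WINDOW_SECONDS


def batches_by_window(rows):
    """Group (event_time, payload) rows by window, oldest window first,
    keeping each window's rows in time order."""
    groups = defaultdict(list)
    for event_time, payload in sorted(rows, key=lambda r: r[0]):
        groups[window_start(event_time)].append((event_time, payload))
    return [groups[start] for start in sorted(groups)]


utc = timezone.utc
rows = [
    (datetime(2016, 12, 16, tzinfo=utc), "b"),
    (datetime(2016, 12, 14, tzinfo=utc), "a"),
    (datetime(2016, 12, 17, tzinfo=utc), "c"),
]
for batch in batches_by_window(rows):
    print([payload for _, payload in batch])
# ['a']
# ['b', 'c']
```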
>> 
>> - Jeff
>> 
>> From: Jeff Jirsa <jeff.ji...@crowdstrike.com>
>> Date: Friday, December 16, 2016 at 10:47 AM
>> To: "user@cassandra.apache.org" <user@cassandra.apache.org>
>> Subject: Re: Choosing a compaction strategy (TWCS)
>> 
>> With a 10-year retention, just ignore the target sstable count (I should
>> remove that guidance, to be honest), and go for a 1-week window to match
>> your partition size. 520 sstables on disk (10 years of weekly windows) isn’t
>> going to hurt you as long as you’re not reading from all of them, and with a
>> partition-per-week the bloom filter is going to make things nice and easy for you.
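For reference, a 1-week TWCS window is set through the table's compaction options. The Python sketch below only builds the CQL string. `twcs_compaction_cql` and the table name `audit.events` are hypothetical. `compaction_window_unit`, `compaction_window_size`, `tombstone_threshold` and `tombstone_compaction_interval` are real compaction subproperties.

```python
def twcs_compaction_cql(table: str, window_days: int = 7,
                        tombstone_threshold=None,
                        tombstone_compaction_interval=None) -> str:
    """Build an ALTER TABLE statement enabling TWCS with day-sized windows."""
    options = {
        "class": "TimeWindowCompactionStrategy",
        "compaction_window_unit": "DAYS",
        "compaction_window_size": str(window_days),
    }
    # Optional tombstone subproperties (see the caveat at the top of the thread
    # about tombstone compactions being off by default in TWCS).
    if tombstone_threshold is not None:
        options["tombstone_threshold"] = str(tombstone_threshold)
    if tombstone_compaction_interval is not None:
        options["tombstone_compaction_interval"] = str(tombstone_compaction_interval)
    body = ", ".join(f"'{key}': '{value}'" for key, value in options.items())
    return f"ALTER TABLE {table} WITH compaction = {{{body}}};"


print(twcs_compaction_cql("audit.events"))
print(twcs_compaction_cql("audit.events", tombstone_threshold=0.8,
                          tombstone_compaction_interval=86400))
```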
>> 
>> - Jeff
>> 
>> From: Voytek Jarnot <voytek.jar...@gmail.com>
>> Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
>> Date: Friday, December 16, 2016 at 10:37 AM
>> To: "user@cassandra.apache.org" <user@cassandra.apache.org>
>> Subject: Choosing a compaction strategy (TWCS)
>> 
>> Scenario:
>> 
>> Converting an Oracle table to Cassandra (one Oracle table to 4 Cassandra
>> tables), basically time-series: think logging or auditing. Retention is 10
>> years, but more than 95% of reads will occur on data written within the
>> last year. A 7-day TTL is used on a small percentage of the records; the
>> majority do not use TTL. Other than the aforementioned TTL and the 10-year
>> purge, no updates or deletes are done.
>> 
>> Seems like TWCS is the right choice, but I have a few questions/concerns:
>> 
>> 1) I'll be bulk loading a few years of existing data upon deployment. Any
>> issues with that? I assume writing this data with USING TIMESTAMP will be
>> mandatory if I choose TWCS?
>> 
>> 2) I read here (https://github.com/jeffjirsa/twcs/) that "You should target 
>> fewer than 50 buckets per table based on your TTL." That's going to be a 
>> tough goal with a 10 year retention ... can anyone speak to how important 
>> this target really is?
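Here is the arithmetic behind that concern, as a small Python sketch. The helpers are hypothetical. 10 years is approximated as 3652.5 days, and the target is read as "at most 50 windows".

```python
import math


def bucket_count(retention_days: float, window_days: float) -> int:
    """Number of windows needed to cover the retention period."""
    return math.ceil(retention_days / window_days)


def min_window_for_target(retention_days: float, max_buckets: int) -> int:
    """Smallest whole-day window keeping the bucket count at or below max_buckets."""
    return math.ceil(retention_days / max_buckets)


TEN_YEARS = 10 * 365.25
print(bucket_count(TEN_YEARS, 7))            # 522
print(min_window_for_target(TEN_YEARS, 50))  # 74
```

Staying under the guideline would mean windows of roughly 74 days (about 10.5 weeks), compared with about 520 weekly windows when the window matches the partition size.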
>> 
>> 3) If I'm bucketing my data by year/week (i.e., partitioning on (year, week),
>> so today would be in (2016, 50)), the natural fit for compaction_window_size
>> seems to be 7 days, but I'm thinking my calendar-based weeks will never
>> align with TWCS 7-day-period weeks anyway. Am I missing something there?
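A Python sketch of where a 7-day window would start, assuming TWCS computes day-unit window bounds as `lower = ts - ts % (86400 * size)` on epoch seconds. This is my reading of the strategy, not a quote of its API. Under that assumption, windows are anchored to 1970-01-01 (a Thursday), so a 7-day window always starts on a Thursday at 00:00 UTC. `twcs_window_bounds` is a hypothetical name.

```python
from datetime import datetime, timezone


def twcs_window_bounds(ts_seconds: int, window_days: int = 7):
    """Inclusive (lower, upper) epoch-second bounds of the window holding ts_seconds."""
    size = 86400 * window_days
    lower = ts_seconds - ts_seconds % size
    return lower, lower + size - 1


today = int(datetime(2016, 12, 16, tzinfo=timezone.utc).timestamp())
lower, upper = twcs_window_bounds(today)
start = datetime.fromtimestamp(lower, tz=timezone.utc)
print(start.strftime("%A %Y-%m-%d"))  # Thursday 2016-12-15
```

With that alignment, a Monday-start or Sunday-start calendar week never matches a window exactly.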
>> 
>> I'd appreciate any other thoughts on compaction and/or twcs.
>> 
>> Thanks
>
