Thanks again. I swear I'd look this up instead, but my google-fu is failing me completely ... That said, I presume that they're enabled by setting values for tombstone_compaction_interval and tombstone_threshold? Or is there more to it?
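
To make the question concrete, here is a minimal Python sketch of what setting those two subproperties might look like, using the 80% / 24 hours example from Jeff's earlier message quoted below. The table name `audit.events_by_week`, the `SSTableInfo` class, and the helper functions are hypothetical, and the eligibility check is only a simplified reading of Jeff's description, not Cassandra's actual implementation.

```python
from dataclasses import dataclass


@dataclass
class SSTableInfo:
    """Hypothetical stand-in for the per-sstable facts the rule looks at."""
    age_seconds: int
    droppable_tombstone_ratio: float  # 0.0 .. 1.0


def twcs_compaction_options(window_days, threshold, interval_seconds):
    """Build the compaction map for ALTER TABLE ... WITH compaction = {...}."""
    return {
        "class": "TimeWindowCompactionStrategy",
        "compaction_window_unit": "DAYS",
        "compaction_window_size": str(window_days),
        "tombstone_threshold": str(threshold),
        "tombstone_compaction_interval": str(interval_seconds),
    }


def to_cql(table, options):
    """Render the options as a CQL statement (the table name is made up)."""
    body = ", ".join(f"'{k}': '{v}'" for k, v in options.items())
    return f"ALTER TABLE {table} WITH compaction = {{{body}}};"


def eligible_for_tombstone_compaction(sstable, options):
    """Simplified rule: older than the interval and at least `threshold` tombstones."""
    interval = int(options["tombstone_compaction_interval"])
    threshold = float(options["tombstone_threshold"])
    return (sstable.age_seconds > interval
            and sstable.droppable_tombstone_ratio >= threshold)


# 7-day windows, 80% tombstone ratio, 24-hour interval (values from the thread).
options = twcs_compaction_options(window_days=7, threshold=0.8, interval_seconds=86400)
print(to_cql("audit.events_by_week", options))
```

The printed statement is only a starting point; whether setting these two subproperties is all that is needed to turn tombstone compactions on for TWCS is exactly the open question above.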

On Fri, Dec 16, 2016 at 10:41 PM, Jeff Jirsa <jeff.ji...@crowdstrike.com> wrote:

> With the caveat that tombstone compactions are disabled by default in TWCS
> (and DTCS)
>
> --
> Jeff Jirsa
>
> On Dec 16, 2016, at 8:34 PM, Voytek Jarnot <voytek.jar...@gmail.com>
> wrote:
>
> Gotcha. "never compacted" has an implicit asterisk referencing
> tombstone_compaction_interval and tombstone_threshold, sounds like. More
> of a "never compacted" via strategy selection, but eligible for
> tombstone-triggered compaction.
>
> On Fri, Dec 16, 2016 at 10:07 PM, Jeff Jirsa <jeff.ji...@crowdstrike.com>
> wrote:
>
>> Tombstone compaction subproperties can handle tombstone removal for you
>> (you’ll set a ratio of tombstones worth compacting away – for example, 80%,
>> and set an interval to prevent continuous compaction – for example, 24
>> hours, and then anytime there’s no other work to do, if there’s an sstable
>> over 24 hours old that’s at least 80% tombstones, it’ll compact it in a
>> single sstable compaction).
>>
>> - Jeff
>>
>> From: Voytek Jarnot <voytek.jar...@gmail.com>
>> Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
>> Date: Friday, December 16, 2016 at 7:34 PM
>> To: "user@cassandra.apache.org" <user@cassandra.apache.org>
>> Subject: Re: Choosing a compaction strategy (TWCS)
>>
>> Thanks again, Jeff.
>>
>> Thinking about this some more, I'm wondering if I'm overthinking or if
>> there's a potential issue:
>>
>> If my compaction_window_size is 7 (DAYS), and I've got TTLs of 7 days on
>> some (relatively small percentage) of my records - am I going to be leaving
>> tombstones around all over the place? My noob-read on this is that TWCS
>> will not compact tables comprised of records older than 7 days
>> (https://docs.datastax.com/en/cassandra/3.x/cassandra/dml/dmlHowDataMaintain.html#dmlHowDataMaintain__twcs),
>> but Cassandra will not evict my tombstones until 7 days + consideration for
>> gc_grace_seconds have passed ... resulting in no tombstone removal (?).
>>
>> On Fri, Dec 16, 2016 at 1:17 PM, Jeff Jirsa <jeff.ji...@crowdstrike.com>
>> wrote:
>>
>> The issue is that your partitions will likely be in 2 sstables instead of
>> “theoretically” 1. In practice, they’re probably going to bleed into 2
>> anyway (memTable flush to sstable isn’t going to happen exactly when the
>> window expires, so it’ll bleed a bit anyway), so I bet no meaningful impact.
>>
>> - Jeff
>>
>> From: Voytek Jarnot <voytek.jar...@gmail.com>
>> Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
>> Date: Friday, December 16, 2016 at 11:12 AM
>> To: "user@cassandra.apache.org" <user@cassandra.apache.org>
>> Subject: Re: Choosing a compaction strategy (TWCS)
>>
>> Thank you Jeff - always nice to hear straight from the source.
>>
>> Any issues you can see with 3 (my calendar-week bucket not aligning with
>> the arbitrary 7-day window)? Or am I confused (I'd put money on this
>> option, but I've been wrong once or twice before)?
>>
>> On Fri, Dec 16, 2016 at 12:50 PM, Jeff Jirsa <jeff.ji...@crowdstrike.com>
>> wrote:
>>
>> I skipped over the more important question - loading data in. Two
>> options:
>>
>> 1) Load data in order through the normal writepath and use “USING
>> TIMESTAMP” to set the timestamp, or
>>
>> 2) Use CQLSSTableWriter and “USING TIMESTAMP” to create sstables,
>> then sstableloader them into the cluster.
>>
>> Either way, try not to mix writes of old data and new data in the
>> “normal” write path at the same time, even if you write “USING TIMESTAMP”,
>> because it’ll get mixed in the memTable, and flushed into the same sstable
>> – it won’t kill you, but if you can avoid it, avoid it.
>>
>> - Jeff
>>
>> From: Jeff Jirsa <jeff.ji...@crowdstrike.com>
>> Date: Friday, December 16, 2016 at 10:47 AM
>> To: "user@cassandra.apache.org" <user@cassandra.apache.org>
>> Subject: Re: Choosing a compaction strategy (TWCS)
>>
>> With a 10 year retention, just ignore the target sstable count (I should
>> remove that guidance, to be honest), and go for a 1 week window to match
>> your partition size. 520 sstables on disk isn’t going to hurt you as long
>> as you’re not reading from all of them, and with a partition-per-week the
>> bloom filter is going to make things nice and easy for you.
>>
>> - Jeff
>>
>> From: Voytek Jarnot <voytek.jar...@gmail.com>
>> Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
>> Date: Friday, December 16, 2016 at 10:37 AM
>> To: "user@cassandra.apache.org" <user@cassandra.apache.org>
>> Subject: Choosing a compaction strategy (TWCS)
>>
>> Scenario:
>>
>> Converting an Oracle table to Cassandra, one Oracle table to 4 Cassandra
>> tables, basically time-series - think log or auditing. Retention is 10
>> years, but greater than 95% of reads will occur on data written within the
>> last year. 7 day TTL used on a small percentage of the records, majority do
>> not use TTL. Other than the aforementioned TTL, and the 10-year purge, no
>> updates or deletes are done.
>>
>> Seems like TWCS is the right choice, but I have a few questions/concerns:
>>
>> 1) I'll be bulk loading a few years of existing data upon deployment -
>> any issues with that? I assume using "with timestamp" when inserting this
>> data will be mandatory if I choose TWCS?
>>
>> 2) I read here (https://github.com/jeffjirsa/twcs/)
>> that "You should target fewer than 50 buckets per table based on your TTL."
>> That's going to be a tough goal with a 10 year retention ... can anyone
>> speak to how important this target really is?
>>
>> 3) If I'm bucketing my data with week/year (i.e., partition on year, week
>> - so today would be in 2016, 50), it seems like a natural fit for
>> compaction_window_size would be 7 days, but I'm thinking my calendar-based
>> weeks will never align with TWCS 7-day-period weeks anyway - am I missing
>> something there?
>>
>> I'd appreciate any other thoughts on compaction and/or twcs.
>>
>> Thanks
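
On question 3 in the original post (calendar-week partitions versus 7-day compaction windows), a small Python sketch can show how far apart the two boundaries might sit. It rests on two assumptions that are not stated anywhere in the thread: that TWCS day-based windows are aligned to the Unix epoch in UTC, and that the (year, week) partition bucket uses ISO week numbering. Both function names are made up for illustration.

```python
from datetime import datetime, timedelta, timezone

DAY_SECONDS = 86400


def window_start(ts, window_days=7):
    """Start of the epoch-aligned window containing ts (assumed alignment)."""
    seconds = int(ts.timestamp())
    size = DAY_SECONDS * window_days
    return datetime.fromtimestamp(seconds - seconds % size, tz=timezone.utc)


def iso_week_start(ts):
    """Monday 00:00 UTC of the ISO calendar week containing ts."""
    day = ts.astimezone(timezone.utc).date()
    monday = day - timedelta(days=day.weekday())
    return datetime(monday.year, monday.month, monday.day, tzinfo=timezone.utc)


# The date of the thread, as an example write time.
ts = datetime(2016, 12, 16, 18, 0, tzinfo=timezone.utc)
year, week, _ = ts.isocalendar()
print("partition bucket:", (year, week))
print("calendar week starts:", iso_week_start(ts).date())
print("7-day window starts:", window_start(ts).date())
```

Under those assumptions, a write on 2016-12-16 lands in bucket (2016, 50), whose calendar week starts on Monday 2016-12-12, while the 7-day window containing it starts on Thursday 2016-12-15. That offset is consistent with Jeff's remark that a weekly partition will likely end up in 2 sstables rather than 1.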