Thanks again, Jeff.

Thinking about this some more, I'm wondering if I'm overthinking or if
there's a potential issue:

If my compaction_window_size is 7 (DAYS), and I've got TTLs of 7 days on
a relatively small percentage of my records, am I going to be leaving
tombstones around all over the place?  My noob-read on this is that TWCS
will not compact sstables made up of records older than 7 days
(https://docs.datastax.com/en/cassandra/3.x/cassandra/dml/dmlHowDataMaintain.html#dmlHowDataMaintain__twcs),
but Cassandra will not evict my tombstones until 7 days plus
gc_grace_seconds have passed ... resulting in no tombstone removal (?).
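
To make that timeline concrete, here is a minimal sketch of the arithmetic.
It assumes the default gc_grace_seconds of 864000 (10 days); the function
name purge_eligible_at is made up for illustration, and the sketch says
nothing about whether TWCS will actually run a compaction at that point.

```python
from datetime import datetime, timedelta, timezone

DAY = timedelta(days=1)


def purge_eligible_at(written_at, ttl, gc_grace):
    """Earliest moment an expired record could be dropped by a compaction.

    Hypothetical helper: the record expires at written_at + ttl, and the
    resulting tombstone must then outlive gc_grace before it can be purged.
    """
    return written_at + ttl + gc_grace


written = datetime(2016, 12, 16, tzinfo=timezone.utc)
ttl = 7 * DAY
gc_grace = timedelta(seconds=864000)  # assumed default gc_grace_seconds (10 days)

expires = written + ttl
purgeable = purge_eligible_at(written, ttl, gc_grace)

print("expires:       ", expires.date())
print("purge-eligible:", purgeable.date())
```

With these assumptions, a record written on 2016-12-16 expires on
2016-12-23 and only becomes purge-eligible on 2017-01-02, well after its
7-day window has closed.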



On Fri, Dec 16, 2016 at 1:17 PM, Jeff Jirsa <jeff.ji...@crowdstrike.com>
wrote:

> The issue is that your partitions will likely be in 2 sstables instead of
> “theoretically” 1. In practice, they’re probably going to bleed into 2
> anyway (the memtable flush to sstable isn’t going to happen exactly when the
> window expires), so I bet there’s no meaningful impact.
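
The misalignment being discussed can be sketched as follows. This assumes
TWCS aligns DAYS windows to the Unix epoch (timestamp minus timestamp modulo
the window length), which is an assumption about the implementation rather
than something stated in this thread; window_start is a hypothetical helper.

```python
from datetime import datetime, timezone

WINDOW_SECONDS = 7 * 86400  # compaction_window_size = 7, unit = DAYS


def window_start(ts):
    """Lower bound of the epoch-aligned 7-day window containing ts.

    Assumed behavior, not confirmed by the thread.
    """
    seconds = int(ts.timestamp())
    return datetime.fromtimestamp(seconds - seconds % WINDOW_SECONDS, tz=timezone.utc)


now = datetime(2016, 12, 16, 12, 0, tzinfo=timezone.utc)
start = window_start(now)
iso_year, iso_week, _ = now.isocalendar()

print("window starts:", start.strftime("%A %Y-%m-%d"))
print("calendar bucket:", iso_year, iso_week)
```

Under that assumption the windows begin on Thursdays (1970-01-01 was a
Thursday), so a Monday-based (2016, 50) partition would straddle two
windows, which matches the "2 sstables" point above.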
>
>
>
> -          Jeff
>
>
>
> *From: *Voytek Jarnot <voytek.jar...@gmail.com>
> *Reply-To: *"user@cassandra.apache.org" <user@cassandra.apache.org>
> *Date: *Friday, December 16, 2016 at 11:12 AM
>
> *To: *"user@cassandra.apache.org" <user@cassandra.apache.org>
> *Subject: *Re: Choosing a compaction strategy (TWCS)
>
>
>
> Thank you Jeff - always nice to hear straight from the source.
>
>
>
> Any issues you can see with 3 (my calendar-week bucket not aligning with
> the arbitrary 7-day window)? Or am I confused (I'd put money on this
> option, but I've been wrong once or twice before)?
>
>
>
> On Fri, Dec 16, 2016 at 12:50 PM, Jeff Jirsa <jeff.ji...@crowdstrike.com>
> wrote:
>
> I skipped over the more important question: loading data in. Two options:
>
> 1) Load data in order through the normal write path and use “USING
> TIMESTAMP” to set the timestamp, or
>
> 2) Use CQLSSTableWriter and “USING TIMESTAMP” to create sstables,
> then sstableloader them into the cluster.
>
>
>
> Either way, try not to mix writes of old data and new data in the “normal”
> write path at the same time, even if you write “USING TIMESTAMP”, because
> they’ll get mixed in the memtable and flushed into the same sstable. It
> won’t kill you, but if you can avoid it, avoid it.
>
>
>
> -                      Jeff
>
>
>
>
>
> *From: *Jeff Jirsa <jeff.ji...@crowdstrike.com>
> *Date: *Friday, December 16, 2016 at 10:47 AM
> *To: *"user@cassandra.apache.org" <user@cassandra.apache.org>
> *Subject: *Re: Choosing a compaction strategy (TWCS)
>
>
>
> With a 10 year retention, just ignore the target sstable count (I should
> remove that guidance, to be honest), and go for a 1 week window to match
> your partition size. 520 sstables on disk isn’t going to hurt you as long
> as you’re not reading from all of them, and with a partition-per-week the
> bloom filter is going to make things nice and easy for you.
>
>
>
> -          Jeff
>
>
>
>
>
> *From: *Voytek Jarnot <voytek.jar...@gmail.com>
> *Reply-To: *"user@cassandra.apache.org" <user@cassandra.apache.org>
> *Date: *Friday, December 16, 2016 at 10:37 AM
> *To: *"user@cassandra.apache.org" <user@cassandra.apache.org>
> *Subject: *Choosing a compaction strategy (TWCS)
>
>
>
> Scenario:
>
> Converting an Oracle table to Cassandra, one Oracle table to 4 Cassandra
> tables, basically time-series - think log or auditing.  Retention is 10
> years, but greater than 95% of reads will occur on data written within the
> last year. 7 day TTL used on a small percentage of the records, majority do
> not use TTL. Other than the aforementioned TTL, and the 10-year purge, no
> updates or deletes are done.
>
>
>
> Seems like TWCS is the right choice, but I have a few questions/concerns:
>
>
>
> 1) I'll be bulk loading a few years of existing data upon deployment - any
> issues with that?  I assume using "with timestamp" when inserting this data
> will be mandatory if I choose TWCS?
>
>
>
> 2) I read here (https://github.com/jeffjirsa/twcs/)
> that "You should target fewer than 50 buckets per table based on your TTL."
> That's going to be a tough goal with a 10 year retention ... can anyone
> speak to how important this target really is?
>
>
>
> 3) If I'm bucketing my data with week/year (i.e., partition on year, week
> - so today would be in 2016, 50), it seems like a natural fit for
> compaction_window_size would be 7 days, but I'm thinking my calendar-based
> weeks will never align with TWCS 7-day-period weeks anyway - am I missing
> something there?
>
>
>
> I'd appreciate any other thoughts on compaction and/or twcs.
>
>
>
> Thanks
>
>
>
