The issue is that your partitions will likely end up in 2 sstables instead of 
the theoretical 1. In practice, they're probably going to bleed into 2 anyway 
(the memtable flush to sstable isn't going to happen exactly when the window 
expires), so I'd bet there's no meaningful impact.

 

-          Jeff

 

From: Voytek Jarnot <voytek.jar...@gmail.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Friday, December 16, 2016 at 11:12 AM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: Choosing a compaction strategy (TWCS)

 

Thank you Jeff - always nice to hear straight from the source. 

 

Any issues you can see with 3 (my calendar-week bucket not aligning with the 
arbitrary 7-day window)? Or am I confused (I'd put money on this option, but 
I've been wrong once or twice before)?

 

On Fri, Dec 16, 2016 at 12:50 PM, Jeff Jirsa <jeff.ji...@crowdstrike.com> wrote:

I skipped over the more important question - loading data in. Two options:

1) Load the data in order through the normal write path and use "USING 
TIMESTAMP" to set the write timestamp, or

2) Use CQLSSTableWriter with "USING TIMESTAMP" to create sstables, then 
sstableloader them into the cluster.

 

Either way, try not to mix writes of old data and new data in the "normal" 
write path at the same time, even if you write "USING TIMESTAMP", because 
they'll get mixed in the memtable and flushed into the same sstable. It won't 
kill you, but if you can avoid it, avoid it.
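To make option 1 concrete, here's a hedged sketch of a backfill insert. The table and column names are made up for illustration; USING TIMESTAMP takes microseconds since the epoch.

```sql
-- Backfilling an old row with its original write time.
-- Table/columns are hypothetical; the timestamp is microseconds since epoch.
INSERT INTO audit_log (year, week, event_time, payload)
VALUES (2014, 23, '2014-06-05 14:00:00+0000', 'historical event')
USING TIMESTAMP 1401976800000000;
```

Setting the write timestamp to the event's real time lets TWCS place the row in the correct time window rather than the window that's open when the backfill runs.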

 

-          Jeff

 

 

From: Jeff Jirsa <jeff.ji...@crowdstrike.com>
Date: Friday, December 16, 2016 at 10:47 AM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: Choosing a compaction strategy (TWCS)

 

With a 10 year retention, just ignore the target sstable count (I should remove 
that guidance, to be honest), and go for a 1 week window to match your 
partition size. 520 sstables on disk isn’t going to hurt you as long as you’re 
not reading from all of them, and with a partition-per-week the bloom filter is 
going to make things nice and easy for you.
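For reference, a table matching that advice might look something like this. The schema is illustrative only, not taken from the thread; it assumes the weekly (year, week) partitioning described later in the conversation.

```sql
-- Hypothetical schema: weekly partitions, with 7-day TWCS windows to match.
CREATE TABLE audit_log (
  year       int,
  week       int,
  event_time timestamp,
  payload    text,
  PRIMARY KEY ((year, week), event_time)
) WITH compaction = {
    'class': 'TimeWindowCompactionStrategy',
    'compaction_window_unit': 'DAYS',
    'compaction_window_size': '7'
};
```

With 10 years of retention that works out to roughly 520 windows, which, per the advice above, is fine as long as reads don't touch all of them.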

 

-          Jeff

 

 

From: Voytek Jarnot <voytek.jar...@gmail.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Friday, December 16, 2016 at 10:37 AM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Choosing a compaction strategy (TWCS)

 

Scenario: 

Converting an Oracle table to Cassandra: one Oracle table becomes 4 Cassandra 
tables, basically time-series - think log or auditing.  Retention is 10 years, 
but greater than 95% of reads will occur on data written within the last year. 
A 7-day TTL is used on a small percentage of the records; the majority do not 
use TTL. Other than the aforementioned TTL and the 10-year purge, no updates 
or deletes are done.

 

Seems like TWCS is the right choice, but I have a few questions/concerns:

 

1) I'll be bulk loading a few years of existing data upon deployment - any 
issues with that?  I assume using "USING TIMESTAMP" when inserting this data 
will be mandatory if I choose TWCS?

 

2) I read here (https://github.com/jeffjirsa/twcs/) that "You should target 
fewer than 50 buckets per table based on your TTL." That's going to be a tough 
goal with a 10 year retention ... can anyone speak to how important this target 
really is?

 

3) If I'm bucketing my data with week/year (i.e., partition on year, week - so 
today would be in 2016, 50), it seems like a natural fit for 
compaction_window_size would be 7 days, but I'm thinking my calendar-based 
weeks will never align with TWCS 7-day-period weeks anyway - am I missing 
something there?
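For what it's worth, a read against such a weekly bucket would look something like this (the table and column names are illustrative, matching the hypothetical partition-on-(year, week) scheme described above):

```sql
-- A read pins a single (year, week) partition, so only the sstables
-- covering that time window need to be consulted.
SELECT * FROM audit_log WHERE year = 2016 AND week = 50;
```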

 

I'd appreciate any other thoughts on compaction and/or twcs.

 

Thanks

 
