Thanks for the answer. It gives me the insight I was looking for.

However, I'm also a bit confused, as your first paragraph seems to indicate that 
using a SCF is better, whereas the last sentence states just the opposite. Am I 
right in thinking that this is because compaction puts all the non-volatile 
data together in one sstable, which leads to compact sstables if the 
non-volatile data is put into a separate CF? Can this then be generalised into 
a rule of thumb to separate non-volatile data from volatile data into separate 
CFs, or am I taking it too far?

I will definitely try out both suggestions and post my findings.

Hugo.

Subject: Re: Super CF or two CFs?
From: aa...@thelastpickle.com
Date: Tue, 18 Jan 2011 21:54:25 +1300
To: user@cassandra.apache.org

With regard to overwrites, and assuming you always want to get all the data for 
a stock ticker: any read on the volatile data will potentially touch many 
sstables. This IO is unavoidable when reading that data, so we may as well read 
as many cols as possible at the same time. If you split the data into two CFs 
instead, you would incur all the IO for the volatile data plus the IO for the 
non-volatile data, and have to make two calls. (Or use different keys and make 
a multiget_slice call; the IO argument still stands.)
Thanks to compaction, less volatile data, say cols that are written once a day, 
week or month, will tend to accrete into fewer sstables. To that end it may 
make sense to schedule compactions to run after weekly bulk operations. Also 
take a look at the per-CF compaction thresholds.
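
To make the sstable argument concrete, here is a toy sketch in Python. It is 
not Cassandra code: sstables are modelled as plain dicts, the row key "ACME" 
and the column names are made up, and "newest sstable wins" stands in for the 
per-column timestamp resolution a real node performs. It only shows that a 
read has to merge every sstable holding the row, and that compaction collapses 
them into one.

```python
def read_row(sstables, row_key):
    """Merge a row from every sstable that holds it; later tables win."""
    merged, touched = {}, 0
    for table in sstables:  # ordered oldest to newest
        if row_key in table:
            touched += 1
            merged.update(table[row_key])
    return merged, touched


def compact(sstables):
    """Merge all sstables into one, keeping the newest value per column."""
    merged = {}
    for table in sstables:
        for row_key, columns in table.items():
            merged.setdefault(row_key, {}).update(columns)
    return [merged]


# One sstable per flush: a static column written once, then three price updates.
sstables = [{"ACME": {"name": "Acme Corp"}}]
for price in (10.0, 10.2, 10.1):
    sstables.append({"ACME": {"price": price}})

row_before, touched_before = read_row(sstables, "ACME")
row_after, touched_after = read_row(compact(sstables), "ACME")

print(touched_before, touched_after)  # 4 1
```

The row contents are identical before and after compaction; only the number of 
sstables the read has to consult changes.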
I'd recommend trying one standard CF (with the quotes packed as suggested) to 
start with; run some tests and let us know how you go. There are some small 
penalties to using super CFs; see the limitations page on the wiki.
Hope that helps.
Aaron


On 18/01/2011, at 9:29 PM, Steven Mac <ugs...@hotmail.com> wrote:


Some of the fields are indeed written in one shot, but others (such as label 
and categories) are added later, so I think the question still stands.

Hugo.

From: dri...@gmail.com
Date: Mon, 17 Jan 2011 18:47:28 -0600
Subject: Re: Super CF or two CFs?
To: user@cassandra.apache.org

On Mon, Jan 17, 2011 at 5:12 PM, Steven Mac <ugs...@hotmail.com> wrote:

I guess I was maybe trying to simplify the question too much. In reality I do 
not have one volatile part, but multiple ones (say, all trading data of a day). 
Each would be a supercolumn identified by the time slot, with the individual 
fields as subcolumns.



If you're always going to write these attributes in one shot, then just 
serialize them and use a simple CF; there's no need for a SCF.
-Brandon
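
A rough sketch of what "serialize them and use a simple CF" could look like, 
in Python. Everything here is an assumption made for illustration: the field 
names, the time-slot column names, and JSON as the encoding. The row is 
modelled as a plain dict (column name -> column value) rather than through 
any Cassandra client API.

```python
import json


def pack_quote(fields):
    """Serialize all fields of one time slot into a single column value."""
    return json.dumps(fields, sort_keys=True, separators=(",", ":")).encode("utf-8")


def unpack_quote(value):
    """Decode a column value back into its individual fields."""
    return json.loads(value.decode("utf-8"))


# One row of a standard CF: the time slot is the column name, and the packed
# fields are the column value (instead of one supercolumn with subcolumns).
row = {}
row["2011-01-17T09:30"] = pack_quote({"open": 10.5, "close": 10.7, "volume": 1200})
row["2011-01-17T09:31"] = pack_quote({"open": 10.7, "close": 10.6, "volume": 800})

quote = unpack_quote(row["2011-01-17T09:30"])
print(quote["close"])
```

The trade-off is that a single field can no longer be updated on its own: any 
change means rewriting the whole packed value for that time slot, which is why 
this only fits attributes that are written in one shot.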

                                          
                                          
