I would try to avoid hundreds of MBs per row; large rows take longer to compact and repair.
Tens of MBs is fine. Take a look at in_memory_compaction_limit_in_mb and thrift_framed_transport_size_in_mb in the yaml file for some guidance.
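For reference, those settings look roughly like this in cassandra.yaml (a sketch using the 1.x defaults as I remember them; check the comments in your own yaml for the exact values):

    # Rows larger than this fall back to a slower, two-pass on-disk
    # compaction path instead of being compacted entirely in memory,
    # so it is a rough ceiling for a comfortable row size.
    in_memory_compaction_limit_in_mb: 64

    # Maximum Thrift frame size; anything read over Thrift has to fit
    # in a frame, so reading very wide rows can also hit this limit.
    thrift_framed_transport_size_in_mb: 15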
Cheers

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 2/05/2012, at 6:00 AM, Aaron Turner wrote:

> On Tue, May 1, 2012 at 10:20 AM, Tim Wintle <timwin...@gmail.com> wrote:
>> I believe that the general design for time-series schemas looks
>> something like this (correct me if I'm wrong):
>>
>> (storing time series for X dimensions for Y different users)
>>
>> Row Keys: "{USER_ID}_{TIMESTAMP/BUCKETSIZE}"
>> Columns: "{DIMENSION_ID}_{TIMESTAMP%BUCKETSIZE}" -> {Counter}
>>
>> But I've not found much advice on calculating optimal bucket sizes (i.e.
>> the optimal number of columns per row), or on how that decision might be
>> affected by compression (or how significant the performance differences
>> between the two options might be).
>>
>> Are the calculations here still considered valid (proportionally) in
>> 1.X, with the changes to SSTables, or is it significantly different?
>>
>> <http://btoddb-cass-storage.blogspot.co.uk/2011/07/column-overhead-and-sizing-every-column.html>
>
> Tens or a few hundred MBs per row seems reasonable. You could go to
> thousands of MBs if you wanted to, but that can make things harder to
> manage.
>
> Depending on the size of your data, you may find that the overhead of
> each column becomes significant; far more than the per-row overhead.
> Since all of my data is just 64-bit integers, I ended up taking a day's
> worth of values (288/day @ 5min intervals) and storing it as a single
> column as a vector. Hence I have two CFs:
>
> StatsDaily -- each row == 1 day, each column == 1 stat @ 5min intervals
> StatsDailyVector -- each row == 1 year, each column == 288 stats @ 1
> day intervals
>
> Every night a job kicks off and converts each row's worth of
> StatsDaily into a column in StatsDailyVector. By doing it 1:1 this
> way, I also reduce the number of tombstones I need to write in
> StatsDaily, since I only need one tombstone for the row delete rather
> than 288 for the individual column deletes.
>
> I don't use compression.
>
> --
> Aaron Turner
> http://synfin.net/   Twitter: @synfinatic
> http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix & Windows
> Those who would give up essential Liberty, to purchase a little temporary
> Safety, deserve neither Liberty nor Safety.
>     -- Benjamin Franklin
> "carpe diem quam minimum credula postero"
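To make Tim's bucketing scheme above concrete, here is a minimal sketch in Python of how the row key and column name could be derived for one sample (the names user_id, dimension_id, and BUCKET_SIZE are illustrative, not from the thread):

    import time

    # Seconds covered by one row bucket (one day here); tune this so
    # rows stay in the tens-of-MBs range discussed above.
    BUCKET_SIZE = 86400

    def keys_for(user_id, dimension_id, ts=None):
        """Return (row_key, column_name) for a single counter sample."""
        ts = int(time.time() if ts is None else ts)
        row_key = "%s_%d" % (user_id, ts // BUCKET_SIZE)          # {USER_ID}_{TIMESTAMP/BUCKETSIZE}
        column_name = "%s_%d" % (dimension_id, ts % BUCKET_SIZE)  # {DIMENSION_ID}_{TIMESTAMP%BUCKETSIZE}
        return row_key, column_name

    # keys_for("42", "cpu", 1000000) -> ("42_11", "cpu_49600")

The bucket size is the knob that trades row width against row count: a bigger BUCKET_SIZE means fewer, wider rows, which is exactly what the row-size advice at the top of the thread is about.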
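And a sketch of Aaron's vector trick, again in Python. The motivation is the per-column overhead: the blog post linked above puts it at roughly 15 fixed bytes per column plus the column name, so 288 separate 8-byte counters with, say, 10-byte names cost on the order of 288 * (15 + 10 + 8) ~= 9.5 KB, versus about 2.3 KB plus one column's overhead as a single packed blob (the 15-byte figure is from the linked post; the 10-byte name is an assumption for illustration):

    import struct

    SAMPLES_PER_DAY = 288  # one value every 5 minutes

    def pack_day(values):
        """Pack one day of 64-bit counters into a single blob column value."""
        assert len(values) == SAMPLES_PER_DAY
        return struct.pack(">%dq" % SAMPLES_PER_DAY, *values)  # big-endian signed int64s

    def unpack_day(blob):
        """Inverse of pack_day: blob column value -> list of 288 ints."""
        return list(struct.unpack(">%dq" % SAMPLES_PER_DAY, blob))

The nightly job would then read a full StatsDaily row, write pack_day(...) as one column into that year's StatsDailyVector row, and delete the StatsDaily row in one go -- one row tombstone instead of 288 column tombstones, as described above.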