I'd avoid using super columns. I don't believe they're recommended anymore, and CQL3 (if you're interested in going that route) doesn't support them at all. I also think it's unlikely that you'll want a column family per company.

How many "ticker" entries do you plan on writing per company? You've got a lot of ellipses in there as well, which makes me wonder what other data you're looking to store.

To take a guess, I'd wager you'd be looking for a trades table, and another table that tracks the closing price per day. In the trades table, something along the lines of this CQL3 definition might be helpful:

    create table trades (
        company text,
        ts timeuuid,
        price decimal,
        primary key (company, ts)
    );

This would give you a single row per company in the traditional Cassandra sense, with its entries ordered by the timestamp you supply. You can use a timeuuid to avoid the duplicate timestamp problem.

This is about as far as I can go without knowing more about what you're actually trying to do. I think it's going to be difficult for anyone to give you helpful advice unless you can elaborate a bit on what your requirements are.

Jon

On Thu, Oct 17, 2013 at 10:51 PM, Rajith Siriwardana <rajithsiriward...@gmail.com> wrote:

> Hi all,
>
> I have a problem like this,
>
> I have stock transaction data, as follows.
> Ticker data:
> Company name:
> timestamp:
> closing price (N): (V)
> trades (N) : (V)
> ......
> .....
> ......
>
> In my model : I want to execute range queries on timestamps, (sorted
> order)
>
> approaches currently have in mind,
>
> 1. I can have ticker data : columnfamily, company name : rowkey,
> timestamp: super column, and other attributes as columns. In this way
> there will be around *100 rowkeys*, around *1M timestamps*, around *10
> columns under one super column.*
> Problems
>
> - Cassandra best practices are to use the RandomPartitioner - this
> gives you 'free' load balancing, as long as your tokens are evenly
> distributed. so the load balancing would happen on 100 row keys. is this
> acceptable approach?
> - and there is a possibility to have duplicates in timestamps. that
> will be a problem.
>
>
> 2. I can have ticker data : keyspace, company name : column family,
> timestamp: row key, and other attributes as columns. In this way there
> will be around *100 column families*, around *1M row keys*, around *10
> columns per one row.*
>
> Problems
>
> - In this way, range queries are not in sorted order.
> - and I guess there is also duplicate row key problem
>
> Any suggestions how I can overcome this?
>
> Cheers,
> Rajith

--
Jon Haddad
http://www.rustyrazorblade.com
skype: rustyrazorblade