Hi all,

I have a problem like this. I have stock transaction data, as follows:

Ticker data:
  Company name:
    timestamp:
      closing price (N): (V)
      trades (N): (V)
      ......

In my model I want to execute range queries on timestamps, in sorted order. Approaches I currently have in mind:

1. Ticker data: column family, company name: row key, timestamp: super column, and other attributes as columns. This way there will be around *100 row keys*, around *1M timestamps*, and around *10 columns under one super column*.

Problems:
- Cassandra best practice is to use the RandomPartitioner, which gives you 'free' load balancing as long as your tokens are evenly distributed. Here the load balancing would happen across only 100 row keys. Is this an acceptable approach?
- There is also the possibility of duplicate timestamps, and that will be a problem.

2. Ticker data: keyspace, company name: column family, timestamp: row key, and other attributes as columns. This way there will be around *100 column families*, around *1M row keys*, and around *10 columns per row*.

Problems:
- With this layout, range queries over row keys do not come back in sorted order (under the RandomPartitioner, rows are ordered by token, not by key).
- And I guess there is also the duplicate row key problem.

Any suggestions on how I can overcome this?

Cheers,
Rajith
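To make the access pattern in approach 1 concrete, here is a minimal in-memory sketch (plain Python, not Cassandra client code): one row per company, columns kept sorted by timestamp the way a column comparator would keep them, so a range query is just a slice over the sorted column names. The class and field names are made up for illustration.

```python
import bisect

class TickerRow:
    """Sketch of one wide row: column names are timestamps, kept sorted,
    column values are the per-trade attribute dicts."""

    def __init__(self):
        self.keys = []  # sorted timestamps (column names)
        self.cols = []  # parallel list of attribute dicts (column values)

    def insert(self, timestamp, attrs):
        i = bisect.bisect_left(self.keys, timestamp)
        self.keys.insert(i, timestamp)
        self.cols.insert(i, attrs)

    def range_query(self, start, end):
        # Inclusive [start, end] slice over sorted column names,
        # analogous to a get_slice with a start/finish column.
        lo = bisect.bisect_left(self.keys, start)
        hi = bisect.bisect_right(self.keys, end)
        return list(zip(self.keys[lo:hi], self.cols[lo:hi]))

# Row key = company name, e.g. "ACME" (hypothetical data).
row = TickerRow()
row.insert(1005, {"close": 10.2, "trades": 300})
row.insert(1001, {"close": 10.0, "trades": 120})
row.insert(1003, {"close": 10.1, "trades": 250})

print(row.range_query(1001, 1003))
```

Note that `bisect.insort`-style insertion means two trades with the same timestamp would sit side by side rather than overwrite each other here, but in Cassandra equal column names collide, which is exactly the duplicate-timestamp problem described above.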