There are some issues with working on larger partitions. HBase doesn't do what you say! On HBase you also have to be careful not to create large rows! But since rows are globally sorted, you can easily sort between them and create small rows.
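To illustrate what I mean by globally sorted keys making small rows easy, here is a rough sketch in plain Python. It is not HBase or Cassandra API; `sorted_keys` and `latest_page` are names made up for this example, and the list is only a stand-in for a store that keeps `(rowkey, page)` keys in sorted order. Under that assumption, finding the last page of a rowkey is a single seek to the end of its key prefix:

```python
import bisect

# Hypothetical in-memory stand-in for a globally sorted key space.
# Keys are (rowkey, page) tuples kept in sorted order.
sorted_keys = [
    ("rowkey1", 1),
    ("rowkey1", 2),
    ("rowkey1", 3),
    ("rowkey2", 1),
]


def latest_page(keys, rowkey):
    """Return the highest page number stored for rowkey, or None if absent."""
    # Find the position just past the last possible key for this rowkey.
    idx = bisect.bisect_right(keys, (rowkey, float("inf")))
    if idx == 0:
        return None
    candidate_rowkey, page = keys[idx - 1]
    # The key just before that position is the last page, if it belongs to rowkey.
    return page if candidate_rowkey == rowkey else None


print(latest_page(sorted_keys, "rowkey1"))  # 3
```

Each page stays a small row, and no side table of page counts is needed, as long as the store really keeps the keys ordered.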
In my opinion, Cassandra people are wrong when they say "globally sorted is the devil!" while FB/Google/etc. actually use globally sorted most of the time! You have to be careful though (just like with random partitioning). Can you tell what rowkey1, page1, col(x) actually are? Maybe there is a way. By the most "recent", do you mean there's a timestamp in there?

On Wed, Oct 12, 2016 at 9:58 AM, Kant Kodali <k...@peernova.com> wrote:
> Hi All,
>
> I understand Cassandra can have a maximum of 2B rows per partition but in
> practice some people seem to suggest the magic number is 100K. why not
> create another partition/rowkey automatically (whenever we reach a safe
> limit that we consider would be efficient) with auto increment bigint as
> a suffix appended to the new rowkey? so that the driver can return the new
> rowkey indicating that there is a new partition and so on...Now I
> understand this would involve allowing partial row key searches which
> currently Cassandra wouldn't do (but I believe HBASE does) and thinking
> about token ranges and potentially many other things..
>
> My current problem is this
>
> I have a row key followed by bunch of columns (this is not time series
> data)
> and these columns can grow to any number so since I have 100K limit (or
> whatever the number is. say some limit) I want to break the partition into
> level/pages
>
> rowkey1, page1->col1, col2, col3......
> rowkey1, page2->col1, col2, col3......
>
> now say my Cassandra db is populated with data and say my application just
> got booted up and I want to most recent value of a certain partition but I
> don't know which page it belongs to since my application just got booted
> up? how do I solve this in the most efficient that is possible in Cassandra
> today?
> I understand I can create MV, other tables that can hold some
> auxiliary data such as number of pages per partition and so on..but that
> involves the maintenance cost of that other table which I cannot afford
> really because I have MV's, secondary indexes for other good reasons. so it
> would be great if someone can explain the best way possible as of today
> with Cassandra? By best way I mean is it possible with one request? If Yes,
> then how? If not, then what is the next best way to solve this?
>
> Thanks,
> kant