If you need to parallelize (and scale), you need to distribute across multiple rows. One Big Row means all 100 of your workers are hammering the same 3 (for instance) replicas at the same time.
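To illustrate the multiple-rows approach, here is a minimal sketch of row bucketing (the function name, digest choice, and bucket count are my assumptions, not from the thread): deriving the bucket row key from a stable hash of the column name lets both the loaders and the workers proceed without any shared counter or lock, and each worker can process one bucket row independently.

```python
import hashlib

NUM_BUCKETS = 100  # one bucket row per worker; an assumed figure


def bucket_row_key(base_key, column_name):
    """Derive a bucket row key from a stable hash of the column name.

    Writers need no synchronized counter: any loader can compute the
    target row for a column independently. Readers scan bucket rows
    ("mykey:000" .. "mykey:099") in parallel.
    """
    h = int(hashlib.md5(column_name.encode("utf-8")).hexdigest(), 16)
    return "%s:%03d" % (base_key, h % NUM_BUCKETS)
```

The same column name always lands in the same bucket, so reads and writes stay consistent without coordination; the trade-off is that range scans across the original column order now require merging results from all buckets.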
On Sun, Jun 5, 2011 at 1:43 PM, Joseph Stein <crypt...@gmail.com> wrote:
> What are the best practices here to page and slice columns from a row?
> So let's say I have 1,000,000 columns in a row. I read the row but want
> to have one thread read columns 0 - 9999, a second thread (actor in my
> case) 10000 - 19999, and so on, so I can have 100 workers processing
> 10,000 columns for each of my rows.
> If there is no API for this, then is it something I should put a
> composite key on, and populate the rows with a counter?
> 0000000:myoriginalcolumnnameX
> 0000001:myoriginalcolumnnameY
> 0000002:myoriginalcolumnnameZ
> Going the composite key route and doing a start/end predicate would
> work, but then it makes the insertion/load have to go through a single
> synchronized point to generate the column names... I am not opposed to
> this, but would prefer that neither the load nor the processing of my
> data be bound by any one lock (even if distributed).
> Thanks!!!!
> /*
> Joe Stein
> http://www.linkedin.com/in/charmalloc
> Twitter: @allthingshadoop
> */

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com
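For completeness, the zero-padded counter scheme from the quoted question can be sliced with simple start/end column-name bounds. A minimal sketch (function name and padding width are my assumptions; the worker and column counts come from the thread):

```python
def slice_range(worker_id, columns_per_worker=10000, width=7):
    """Return (start, end) zero-padded prefixes for one worker's slice.

    Matches column names of the form 0000000:myoriginalcolumnnameX,
    so worker 0 covers 0 - 9999, worker 1 covers 10000 - 19999, etc.
    """
    start = worker_id * columns_per_worker
    end = start + columns_per_worker - 1
    return ("%0*d" % (width, start), "%0*d" % (width, end))
```

Each of the 100 workers would pass its pair as the start/end of a column slice predicate; this parallelizes the read side, but as noted above it does nothing for the write-side counter bottleneck, which is why distributing across multiple rows is the better fix.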