If you need to parallelize (and scale), you need to distribute across
multiple rows. With One Big Row, all 100 of your workers end up hammering
the same 3 (for instance) replicas at the same time.
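
To make the multiple-rows idea concrete, here's a rough, client-library-agnostic
sketch. Everything in it (NUM_BUCKETS, base_key, the client.insert/get_slice
calls) is a placeholder, not a real API -- the point is just the bucketing:
hash each column name into one of N bucket rows, so writes need no shared
counter and each worker slices its own row, spreading load across the cluster
instead of hitting one row's replicas.

import hashlib

NUM_BUCKETS = 100  # one bucket row per worker

def bucket_row_key(base_key: str, column_name: str) -> str:
    """Derive a deterministic bucket row key from the column name."""
    h = int(hashlib.md5(column_name.encode()).hexdigest(), 16)
    return f"{base_key}:{h % NUM_BUCKETS}"

# Writer side: no lock, no sequence counter -- the hash decides the row.
def write(client, base_key, column_name, value):
    client.insert(bucket_row_key(base_key, column_name), column_name, value)

# Worker side: worker i reads only its own bucket row and pages through
# its columns with an ordinary column slice.
def process_bucket(client, base_key, bucket_id, handle):
    row_key = f"{base_key}:{bucket_id}"
    for name, value in client.get_slice(row_key):  # hypothetical client call
        handle(name, value)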

On Sun, Jun 5, 2011 at 1:43 PM, Joseph Stein <crypt...@gmail.com> wrote:
> What are the best practices here to page and slice columns from a row?
> So let's say I have 1,000,000 columns in a row.
> I read the row but want to have one thread read columns 0 - 9999, a second
> thread (an actor in my case) read 10000 - 19999, and so on, so I can have 100
> workers each processing 10,000 columns for each of my rows.
> If there is no API for this, then is it something I should use a composite key for,
> and populate the column names with a counter prefix:
> 0000000:myoriginalcolumnnameX
> 0000001:myoriginalcolumnnameY
> 0000002:myoriginalcolumnnameZ
> Going the composite key route and doing a start/end predicate would work, but
> then it kind of makes the insertion/load of this have to go through a
> single synchronized point to generate the column names... I am not opposed
> to this, but would prefer that both the load of my data and the processing of my data
> not be bound by any single lock (even if distributed).
> Thanks!!!!
> /*
> Joe Stein
> http://www.linkedin.com/in/charmalloc
> Twitter: @allthingshadoop
> */
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com
