Thanks, Gary. I was thinking of using range partitioning to split the input. Say we have different threads handling different key ranges: (A-J) by thread1, (K-P) by thread2. This way there probably won't be any chance of collision, since every record for a given A+B key would always be handled by the same thread. But the thread that actually performs the distribution could turn out to be a bottleneck.
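A minimal sketch of the range-partitioning idea, assuming everything runs in one process with one queue per worker thread, and a plain dict per worker standing in for the data store (no Cassandra client calls are made). The ranges follow the example (A-J, K-P); sending every other first letter to a third worker is an assumption, since the stated ranges don't cover Q-Z. `worker_for` and `aggregate` are hypothetical names. Because all records for one A+B key reach the same worker, that worker's read-modify-write of (count, sum) never interleaves with another thread's update of the same key. The single dispatcher loop that feeds the queues is also visible.

```python
import queue
import threading

NUM_WORKERS = 3
_STOP = object()  # sentinel telling a worker thread to exit


def worker_for(key: str) -> int:
    # Hypothetical partitioning: A-J -> worker 0, K-P -> worker 1,
    # everything else (assumed Q-Z and other characters) -> worker 2.
    first = key[0].upper()
    if "A" <= first <= "J":
        return 0
    if "K" <= first <= "P":
        return 1
    return 2


def worker(q: queue.Queue, store: dict) -> None:
    # Each worker owns a disjoint key range, so this read-modify-write
    # of (count, sum) is never interleaved with another thread's update
    # of the same key.
    while True:
        item = q.get()
        if item is _STOP:
            break
        key, n = item
        count, total = store.get(key, (0, 0))
        store[key] = (count + 1, total + n)


def aggregate(records):
    queues = [queue.Queue() for _ in range(NUM_WORKERS)]
    stores = [{} for _ in range(NUM_WORKERS)]  # one stand-in store per worker
    threads = [
        threading.Thread(target=worker, args=(queues[i], stores[i]))
        for i in range(NUM_WORKERS)
    ]
    for t in threads:
        t.start()
    # The single dispatcher: every record passes through this loop,
    # which is the potential bottleneck.
    for a, b, n in records:
        key = a + "+" + b
        queues[worker_for(key)].put((key, n))
    for q in queues:
        q.put(_STOP)
    for t in threads:
        t.join()
    merged = {}
    for s in stores:
        merged.update(s)  # key ranges are disjoint, so nothing is overwritten
    return merged


if __name__ == "__main__":
    records = [("A", "B", 2), ("P", "Q", 5), ("X", "Y", 3), ("A", "B", 8), ("A", "B", 2)]
    print(aggregate(records))  # {'A+B': (3, 12), 'P+Q': (1, 5), 'X+Y': (1, 3)}
```

The no-collision guarantee here rests entirely on every writer routing keys the same way; if several independent dispatchers used different ranges, two threads could again update the same key concurrently.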
Am I correct in my thinking?

Regards
Arijit

On 27 October 2010 18:49, Gary Dusbabek <gdusba...@gmail.com> wrote:
> On Wed, Oct 27, 2010 at 03:24, Arijit Mukherjee <ariji...@gmail.com> wrote:
>> Hi All
>>
>> I've another related question.
>>
>> I am using a stream of records of the form (A, B, n) where the pair
>> (A,B) can occur multiple times. For example, you could have the
>> following set of records -
>>
>> A, B, 2
>> P, Q, 5
>> X, Y, 3
>> A, B, 8
>> A, B, 2
>> ...
>>
>>
>> The data store has a set of columns - (key, count, sum). Because of
>> the possibility of duplicate A and B, I am using the string A+B as my
>> key. Every time there is a duplicate A+B, I update a count field, and
>> add "n" to the existing value of sum. So, for the above set of
>> records, Cassandra should actually hold the following set -
>>
>> A+B, 3, 12
>> P+Q, 1, 5
>> X+Y, 1, 3
>> ...
>
> You want a distributed counter.
>
>>
>> My question is - is it possible to have multiple threads reading
>> different streams so that I can parallelize the insertion mechanism?
>> What may happen if two threads try to insert two different records
>> with the same A+B key?
>>
>
> No, this isn't going to work. At some point Cassandra will have
> distributed counters, probably with a few caveats. See
> https://issues.apache.org/jira/browse/CASSANDRA-1546 and related
> tickets for more information.
>
> The best approach I can suggest at this point is to continue inserting
> the increments as column names and then manually sum them up when you
> need to. There are several approaches you could take if you're
> interested in consolidating slices of the increments that would be
> reasonably safe against the possibility of concurrent updates.
>
> Gary.
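One possible reading of Gary's suggestion, sketched under stated assumptions: each increment is written as its own uniquely named column (a random UUID here, which is an assumption; the mail doesn't say how names are chosen), so writers never read or overwrite each other, and (count, sum) is computed at read time. `InMemoryColumnFamily`, `record_increment`, `read_totals` and `consolidate` are hypothetical names; the class is an in-memory stand-in, not a Cassandra client API, and the only property relied on is that a single-column write or delete is atomic. `consolidate` shows one way to fold a snapshot of increments into one column: newly arriving increments have different names and are left untouched.

```python
import threading
import uuid


class InMemoryColumnFamily:
    """Hypothetical stand-in: row key -> {column name: (count, sum)}."""

    def __init__(self):
        self._rows = {}
        self._lock = threading.Lock()  # makes each single-column op atomic

    def insert(self, key, column, value):
        with self._lock:
            self._rows.setdefault(key, {})[column] = value

    def get(self, key):
        with self._lock:
            return dict(self._rows.get(key, {}))

    def remove(self, key, columns):
        with self._lock:
            row = self._rows.get(key, {})
            for c in columns:
                row.pop(c, None)


def record_increment(cf, key, n):
    # Write-only: a fresh, unique column name per increment means two
    # concurrent writers for the same A+B key never clobber each other.
    cf.insert(key, uuid.uuid4().hex, (1, n))


def read_totals(cf, key):
    # Sum the (count, sum) pairs at read time.
    cols = cf.get(key).values()
    return (sum(c for c, _ in cols), sum(s for _, s in cols))


def consolidate(cf, key):
    # Fold the columns seen in one snapshot into a single new column,
    # then delete exactly those columns. Increments written after the
    # snapshot have other names and survive.
    snapshot = cf.get(key)
    if len(snapshot) < 2:
        return
    total = (
        sum(c for c, _ in snapshot.values()),
        sum(s for _, s in snapshot.values()),
    )
    cf.insert(key, uuid.uuid4().hex, total)
    cf.remove(key, list(snapshot))


if __name__ == "__main__":
    cf = InMemoryColumnFamily()
    threads = [
        threading.Thread(target=record_increment, args=(cf, "A+B", n))
        for n in (2, 8, 2)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(read_totals(cf, "A+B"))  # (3, 12)
    consolidate(cf, "A+B")
    print(read_totals(cf, "A+B"))  # (3, 12), now held in a single column
```

This is only "reasonably safe": it assumes a single consolidator per key (two running on the same snapshot would both write a total), and a read that lands between the insert and the remove in `consolidate` briefly counts that slice twice.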