@Jan The subject of distributed workers & queues has been discussed on the mailing list many times. One possible implementation is:
1) *p* data providers, *c* data consumers
2) create partitions (physical rows) with an arbitrary number of columns (let's say 10 000, not too big though). Partition key = bucket number (*#b*)
3) assign an integer id (*pId*) to each provider, and likewise to each consumer (*cId*)
4) each provider can only write messages into buckets whose number satisfies *#b mod p = pId mod p*
5) once the provider reaches 10 000 messages in a bucket, it switches to the next one with *new #b = old #b + p*
6) the consumers follow the same rule for bucket switching, with *c* and *cId* in place of *p* and *pId*

Example: p = 5, c = 3

- p1 writes messages into buckets {1,6,11,16...} // 1, 1+5, 1+5+5, ...
- p2 writes messages into buckets {2,7,12,17...} // 2, 2+5, 2+5+5, ...
- p3 writes messages into buckets {3,8,13,18...}
- p4 writes messages into buckets {4,9,14,19...}
- p5 writes messages into buckets {5,10,15,20...}

- c1 consumes messages from buckets {1,4,7,10...} // 1, 1+3, 1+3+3, ...
- c2 consumes messages from buckets {2,5,8,11...}
- c3 consumes messages from buckets {3,6,9,12...}

Of course, consumers cannot re-put messages into a bucket, otherwise the count (10 000 elements/bucket) is thrown off.

Alternatively, you can insert messages with a TTL so that "consumed buckets" expire automatically after a while, saving you the hassle of cleaning up old buckets to reclaim disk space.

There are other implementations based on distributed locks using C* CAS, but the above algorithm does not require any lock.

Regards

Duy Hai DOAN

On Fri, Apr 4, 2014 at 12:47 PM, prem yadav <ipremya...@gmail.com> wrote:

> Oh ok. I thought you did not have a cassandra cluster already. Sorry about
> that.
>
> On Fri, Apr 4, 2014 at 11:42 AM, Jan Algermissen <jan.algermis...@nordsc.com> wrote:
>
>> On 04 Apr 2014, at 11:18, prem yadav <ipremya...@gmail.com> wrote:
>>
>> Though cassandra can work but to me it looks like you could use a
>> persistent queue for example (rabbitMQ) to implement this. All your workers
>> can subscribe to a queue.
>> In fact, why not just MySQL?
>>
>>
>> Hey, I have got a C* cluster that can (potentially) do CAS.
>>
>> Why would I set up a MySQL cluster to solve that problem?
>>
>> And yeah, I could use a queue or redis or whatnot, but I want to avoid
>> yet another moving part :-)
>>
>> Jan
>>
>>
>> On Thu, Apr 3, 2014 at 11:44 PM, Jan Algermissen <jan.algermis...@nordsc.com> wrote:
>>
>>> Hi,
>>>
>>> maybe someone knows a nice solution to the following problem:
>>>
>>> I have N worker processes that are intentionally masterless and do not
>>> know about each other - they are stateless and independent instances of a
>>> given service system.
>>>
>>> These workers need to poll an event feed, say about every 10 seconds and
>>> persist a state after processing the polled events so the next worker knows
>>> where to continue processing events.
>>>
>>> I would like to use C*'s CAS feature to coordinate the workers and
>>> protect the shared state (a row or cell in a C* key space, too).
>>>
>>> Has anybody done something similar and can suggest a 'clever' data model
>>> design and interaction?
>>>
>>>
>>>
>>> Jan
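
For what it's worth, the bucket rule from steps 4) to 6) of the reply at the top of this message can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not code from the thread: the function names are made up, ids are assumed to run from 1 to the number of workers as in the example, and no Cassandra driver calls are shown.

```python
from itertools import islice
from typing import Iterator, List

# Messages per bucket; 10 000 is the figure used in the thread.
BUCKET_SIZE = 10_000


def bucket_sequence(worker_id: int, n_workers: int) -> Iterator[int]:
    """Yield the buckets owned by one worker: id, id + n, id + 2n, ...

    Works for providers (n_workers = p) and consumers (n_workers = c) alike.
    Assumes ids are numbered 1..n_workers (hypothetical convention).
    """
    bucket = worker_id
    while True:
        yield bucket
        bucket += n_workers  # new #b = old #b + p


def first_buckets(worker_id: int, n_workers: int, count: int) -> List[int]:
    """Return the first `count` buckets of a worker, for inspection."""
    return list(islice(bucket_sequence(worker_id, n_workers), count))


def bucket_for_message(worker_id: int, n_workers: int, message_index: int) -> int:
    """Bucket that receives this worker's message number `message_index` (0-based).

    The worker stays in a bucket until BUCKET_SIZE messages are written,
    then jumps ahead by n_workers.
    """
    full_buckets = message_index // BUCKET_SIZE
    return worker_id + full_buckets * n_workers


if __name__ == "__main__":
    # p = 5 providers, c = 3 consumers, as in the example.
    print("p1:", first_buckets(1, 5, 4))
    print("c3:", first_buckets(3, 3, 4))
    print("p2, message 10 000:", bucket_for_message(2, 5, 10_000))
```

Since each worker derives its buckets from its own id alone, no two providers ever write to the same bucket and no two consumers ever read the same one, which is why no lock is needed.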