Hi All,

Can someone please validate and recommend a solution for the given design
problem?

*Problem statement:* We need to de-queue data from Cassandra (from a Standard
ColumnFamily) using a job. Multiple instances of the job can run
simultaneously (much like multiple threads) and may try to access the same
row, but we need to make sure that only one instance of the job can access a
given row at a time: if job A is accessing Row #1, then job B must not
access Row #1.

*Possible solutions:*

*Solution #1:* Use Cages (and ZooKeeper) to make sure that only one job at
a time can access a row in the CF. How do we make sure that Cages (a
transaction coordinator using ZooKeeper) is not a single point of failure?
What is the performance impact on reads and writes on the nodes? There is a
blog post on building a distributed concurrent queue with ZooKeeper at
http://www.cloudera.com/blog/2009/05/building-a-distributed-concurrent-queue-with-apache-zookeeper/
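
To make the per-row locking pattern in Solution #1 concrete, here is a rough
Python sketch. It does not use Cages or ZooKeeper: `RowLockService` is a
hypothetical in-memory stand-in for the lock coordinator, and `process_row`
is a made-up name for the job's work. It only illustrates the shape of
"try to lock the row key, process, always release", not how a distributed
coordinator would implement it.

```python
import threading
from contextlib import contextmanager


class RowLockService:
    """Hypothetical in-memory stand-in for a lock coordinator (not the Cages API)."""

    def __init__(self):
        self._guard = threading.Lock()  # protects the set of held row keys
        self._held = set()

    def try_acquire(self, row_key):
        """Return True if the caller now holds the lock, False if it is taken."""
        with self._guard:
            if row_key in self._held:
                return False
            self._held.add(row_key)
            return True

    def release(self, row_key):
        with self._guard:
            self._held.discard(row_key)

    @contextmanager
    def locked(self, row_key):
        """Yield whether the lock was obtained; release it on exit if so."""
        acquired = self.try_acquire(row_key)
        try:
            yield acquired
        finally:
            if acquired:
                self.release(row_key)


def process_row(locks, row_key, job_name, processed):
    """Process the row only if no other job currently holds its lock."""
    with locks.locked(row_key) as acquired:
        if not acquired:
            return False  # another job is working on this row; skip it
        processed.append((job_name, row_key))  # placeholder for the real de-queue work
        return True
```

With this shape, a job that fails to get the lock simply skips the row and
moves on, instead of blocking behind the job that holds it.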

*Solution #2:* Use a home-grown approach to store and maintain who is
accessing what, i.e. which job is accessing which row.
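
A minimal sketch of the bookkeeping I have in mind for Solution #2, written
in Python for illustration only. `ClaimTable`, its method names, and the
lease timeout are all assumptions, not an existing library. The lease is
there so that a row claimed by a crashed job eventually becomes available
again. This version lives in a single process; in a real deployment the
check-then-write inside `claim` would have to be atomic in whatever shared
store holds the table, which is the hard part.

```python
import time


class ClaimTable:
    """Hypothetical record of which job holds which row, with lease expiry."""

    def __init__(self, lease_seconds, clock=time.monotonic):
        self.lease_seconds = lease_seconds
        self._clock = clock          # injectable so expiry can be tested
        self._claims = {}            # row_key -> (owner, expires_at)

    def claim(self, row_key, owner):
        """Claim the row for `owner`; fail if another job holds a live lease."""
        now = self._clock()
        current = self._claims.get(row_key)
        if current is not None:
            holder, expires_at = current
            if holder != owner and expires_at > now:
                return False
        # Free, expired, or already ours: (re)start the lease.
        self._claims[row_key] = (owner, now + self.lease_seconds)
        return True

    def release(self, row_key, owner):
        """Drop the claim, but only if `owner` is the job that holds it."""
        current = self._claims.get(row_key)
        if current is not None and current[0] == owner:
            del self._claims[row_key]
            return True
        return False

    def owner_of(self, row_key):
        """Return the job holding a live lease on the row, or None."""
        current = self._claims.get(row_key)
        if current is None or current[1] <= self._clock():
            return None
        return current[0]
```

For example, with a 10-second lease, job A claims Row #1, job B's claim on
the same row is refused, and once the lease has expired without a renewal
job B can take the row over.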

Are there any other solutions to the above problem?

Can someone please help me validate the design?

-- 
Thanks,
Mubarak Seyed.
