Distributed work-queues?

Andrew Miklas Sat, 26 Jun 2010 13:57:25 -0700

Hi all,

Has anyone written a work-queue implementation using Cassandra?

There's a section on the UseCases wiki page for "A distributed Priority Job Queue" which looks perfect, but unfortunately it hasn't been filled in yet.

http://wiki.apache.org/cassandra/UseCases#A_distributed_Priority_Job_Queue

I've been thinking about how best to do this, but every solution I've thought of seems to have some serious drawback. The "range ghost" problem in particular creates some issues. I'm assuming each job has a row within some column family, where the row's key is the time at which the job should be run. To find the next job, you'd do a range query with a start a few hours in the past and an end at the current time. Once a job is completed, you delete the row.
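To make the scheme concrete, here is a minimal in-memory sketch of it. The class and method names (TimeKeyedQueue, add_job, due_jobs, complete) are hypothetical, and a sorted Python list stands in for the column family; nothing here is a Cassandra client API. It assumes run times are unique and uses a three-hour lookback as the "few hours in the past".

```python
import bisect


class TimeKeyedQueue:
    """Hypothetical in-memory stand-in for a column family keyed by run time.

    Not a Cassandra API: a sorted list of keys plays the role of the
    ordered row keys that a range query would walk.
    """

    def __init__(self):
        self._keys = []  # sorted run times (the row keys)
        self._rows = {}  # run time -> job payload

    def add_job(self, run_at, payload):
        # Assumes run times are unique; a real schema would need a tiebreaker.
        bisect.insort(self._keys, run_at)
        self._rows[run_at] = payload

    def due_jobs(self, now, lookback=3 * 3600):
        # Range query from "a few hours in the past" up to the current time.
        lo = bisect.bisect_left(self._keys, now - lookback)
        hi = bisect.bisect_right(self._keys, now)
        return [(k, self._rows[k]) for k in self._keys[lo:hi]]

    def complete(self, run_at):
        # Once the job is done, delete its row.
        self._keys.remove(run_at)
        del self._rows[run_at]
```

For example, after adding jobs at times 100 and 200, due_jobs(now=150) returns only the job at 100. In this toy model a delete really removes the row, which is exactly what Cassandra does not do immediately.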

The problem here is that you have to scan through deleted-but-not-yet-GCed rows each time you run the query. Is there a better way?
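A toy model makes the cost visible. In the hypothetical sketch below, a delete only marks a row as a tombstone (an empty "ghost" entry) until a separate gc() pass purges it, and the range query must walk past every ghost in the window. The names are illustrative assumptions, and gc() merely stands in for compaction after the grace period; it does not reflect any particular Cassandra version's internals.

```python
import bisect


class GhostingQueue:
    """Hypothetical model: deleted rows linger as ghosts until gc() runs."""

    def __init__(self):
        self._keys = []   # sorted run times, including deleted ones
        self._rows = {}   # run time -> payload, or None for a ghost
        self.scanned = 0  # rows walked by the most recent range query

    def add_job(self, run_at, payload):
        bisect.insort(self._keys, run_at)
        self._rows[run_at] = payload

    def complete(self, run_at):
        # A delete only marks the row; it still occupies the key range.
        self._rows[run_at] = None

    def gc(self):
        # Purge ghosts (stands in for compaction after the GC grace period).
        self._keys = [k for k in self._keys if self._rows[k] is not None]
        self._rows = {k: self._rows[k] for k in self._keys}

    def due_jobs(self, now, lookback=3 * 3600):
        lo = bisect.bisect_left(self._keys, now - lookback)
        hi = bisect.bisect_right(self._keys, now)
        self.scanned = 0
        live = []
        for k in self._keys[lo:hi]:
            self.scanned += 1  # ghosts cost a read even though they are skipped
            if self._rows[k] is not None:
                live.append((k, self._rows[k]))
        return live
```

With 1000 jobs of which 999 are completed but not yet collected, due_jobs() returns a single job yet walks all 1000 rows; only after gc() does the scan shrink back to one row.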

Preventing more than one worker from starting the same job seems like it would be a problem too. You'd either need an external locking manager, or have to use some other protocol where workers write their ID into the row and then immediately read it back to confirm that they are the owner of the job.
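Here is a sketch of the write-then-read-back protocol, assuming the job row has a single "owner" column with last-write-wins semantics. OwnerCell, try_claim, and interleaved_claims are hypothetical names, and a dict stands in for the row. The last function replays one unlucky interleaving of two workers.

```python
class OwnerCell:
    """Hypothetical stand-in for an 'owner' column on each job row.

    Models last-write-wins: every write simply replaces the stored value.
    """

    def __init__(self):
        self._owner = {}

    def write_owner(self, job_id, worker_id):
        self._owner[job_id] = worker_id

    def read_owner(self, job_id):
        return self._owner.get(job_id)


def try_claim(cell, job_id, worker_id):
    """Write our ID into the job row, then read it back to confirm ownership."""
    cell.write_owner(job_id, worker_id)
    return cell.read_owner(job_id) == worker_id


def interleaved_claims():
    """Replay one interleaving of two workers each running try_claim's steps."""
    cell = OwnerCell()
    cell.write_owner("job-1", "worker-A")
    a_confirmed = cell.read_owner("job-1") == "worker-A"  # A sees its own ID
    cell.write_owner("job-1", "worker-B")
    b_confirmed = cell.read_owner("job-1") == "worker-B"  # B overwrites and sees its own ID
    return a_confirmed, b_confirmed
```

interleaved_claims() returns (True, True): both workers believe they own job-1. Read-back alone narrows the window but does not provide mutual exclusion, which is why something like an external locking manager keeps coming up.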

Any ideas here? Has anyone come up with a nice implementation? Is Cassandra not well suited for queue-like tasks?


Thanks,


Andrew
