Hi all,

Has anyone written a work-queue implementation using Cassandra?

There's a section on the UseCases wiki page titled "A distributed Priority Job Queue" that looks like exactly what I need, but unfortunately it hasn't been filled in yet.
http://wiki.apache.org/cassandra/UseCases#A_distributed_Priority_Job_Queue

I've been thinking about how best to do this, but every solution I've come up with has some serious drawback. The "range ghost" problem in particular causes trouble. I'm assuming each job has a row within some column family, where the row's key is the time at which the job should be run. To find the next job, you'd do a range query starting a few hours in the past and ending at the current time. Once a job is completed, you delete its row.
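
To make the scheme concrete, here's a rough Python sketch against an in-memory stand-in rather than a real Cassandra client. Everything in it is hypothetical: the JobStore class and its method names, the (run_at, job_id) row key (so two jobs scheduled for the same second don't collide), integer epoch-second timestamps, and the three-hour window. It also assumes an order-preserving partitioner, since that's what lets a key range come back in time order. Deletion here is immediate, which is exactly what Cassandra doesn't give you.

```python
import bisect


class JobStore:
    """Hypothetical in-memory stand-in for a job column family.

    Row keys are (run_at, job_id) tuples kept in sorted order, mimicking a
    key range under an order-preserving partitioner. Timestamps are integer
    epoch seconds.
    """

    def __init__(self):
        self._keys = []  # sorted row keys
        self._rows = {}  # row key -> {column name: value}

    def schedule(self, run_at, job_id, payload):
        """Insert a job row keyed by the time it should run."""
        key = (run_at, job_id)
        if key not in self._rows:
            bisect.insort(self._keys, key)
        self._rows[key] = {"payload": payload}

    def due_jobs(self, now, window=3 * 3600):
        """Range query over keys with now - window <= run_at <= now."""
        # A 1-tuple sorts before any (run_at, job_id) with the same run_at,
        # so these bounds select whole seconds without comparing job IDs.
        lo = bisect.bisect_left(self._keys, (now - window,))
        hi = bisect.bisect_left(self._keys, (now + 1,))
        return [(k, self._rows[k]) for k in self._keys[lo:hi]]

    def complete(self, key):
        """Delete a finished job's row (removed immediately in this model)."""
        del self._rows[key]
        self._keys.remove(key)


if __name__ == "__main__":
    store = JobStore()
    now = 1_000_000
    store.schedule(now - 60, "job-a", "send report")
    store.schedule(now + 60, "job-b", "not due yet")
    for key, cols in store.due_jobs(now):
        print("running", key, cols["payload"])
        store.complete(key)
```

A worker would just loop: call due_jobs, run each job, then complete it.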

The problem is that every time you run the query, you have to scan through rows that have been deleted but not yet garbage-collected; they still come back from the range query as "ghosts" with no columns. Is there a better way?
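
Here's a toy model of why the ghosts hurt, again in Python and entirely hypothetical (GhostingStore, gc_grace, and the integer keys are made up for illustration, and "compaction" is simplified to a single call). A delete leaves a tombstone in the key range; the range scan keeps returning that key with no columns until the tombstone is older than gc_grace and a compaction purges it, so the reader has to skip the empty rows itself and pays for every one of them.

```python
import bisect


class GhostingStore:
    """Hypothetical model of range ghosts: deletes leave tombstones in the
    key range until they are older than gc_grace and compaction purges them."""

    def __init__(self, gc_grace):
        self.gc_grace = gc_grace
        self._keys = []        # sorted row keys, including tombstoned ones
        self._rows = {}        # live rows: key -> columns
        self._tombstones = {}  # deleted rows: key -> deletion time

    def insert(self, key, columns):
        if key not in self._rows and key not in self._tombstones:
            bisect.insort(self._keys, key)
        self._tombstones.pop(key, None)  # a newer write shadows the delete
        self._rows[key] = columns

    def delete(self, key, now):
        self._rows.pop(key)
        self._tombstones[key] = now  # the key stays in the range for now

    def range_scan(self, start, end):
        """Return (key, columns) for start <= key <= end; ghosts come back empty."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_right(self._keys, end)
        return [(k, self._rows.get(k, {})) for k in self._keys[lo:hi]]

    def live_rows(self, start, end):
        """What a queue reader actually wants: the scan minus the ghosts."""
        return [(k, cols) for k, cols in self.range_scan(start, end) if cols]

    def compact(self, now):
        """Purge tombstones at least gc_grace old, as a compaction would."""
        expired = [k for k, t in self._tombstones.items() if now - t >= self.gc_grace]
        for k in expired:
            del self._tombstones[k]
            self._keys.remove(k)


if __name__ == "__main__":
    store = GhostingStore(gc_grace=10 * 24 * 3600)  # e.g. ten days
    for k in range(1000):
        store.insert(k, {"payload": k})
    for k in range(999):  # 999 jobs done and deleted
        store.delete(k, now=0)
    scanned = store.range_scan(0, 999)
    print(len(scanned), "rows scanned for", len(store.live_rows(0, 999)), "live job")
```

Until the grace period passes, every poll walks all 999 ghosts to find the one live job.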

Preventing more than one worker from starting the same job seems like it would be a problem too. You'd either need an external lock manager, or some other protocol where workers write their ID into the row and then immediately read it back to confirm that they own the job.
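
Here's a small Python sketch of the write-then-read-back idea, to show where it breaks down. It's a hypothetical single-row model with last-write-wins columns and no atomic compare-and-set; the Row class, the "owner" column, and the claim_write/claim_check helpers are made up, and replication and consistency levels are ignored entirely (real replicas would only add more ways to race). If both workers write before either reads, only the later writer sees its own ID. But if worker A reads back before B writes, both end up believing they own the job.

```python
import itertools


class Row:
    """Hypothetical single row with last-write-wins columns (no compare-and-set)."""

    def __init__(self):
        self._clock = itertools.count(1)  # stands in for client-supplied timestamps
        self._cols = {}  # column name -> (timestamp, value)

    def write(self, column, value):
        ts = next(self._clock)
        current = self._cols.get(column)
        if current is None or ts > current[0]:  # highest timestamp wins
            self._cols[column] = (ts, value)

    def read(self, column):
        entry = self._cols.get(column)
        return None if entry is None else entry[1]


def claim_write(row, worker_id):
    """Step 1: write our ID into the job's owner column."""
    row.write("owner", worker_id)


def claim_check(row, worker_id):
    """Step 2: read it back; a match is taken to mean we own the job."""
    return row.read("owner") == worker_id


if __name__ == "__main__":
    # Interleaving 1: both write, then both check -> exactly one owner.
    row = Row()
    claim_write(row, "A")
    claim_write(row, "B")
    print("A owns:", claim_check(row, "A"), "B owns:", claim_check(row, "B"))

    # Interleaving 2: A finishes its check before B writes -> two "owners".
    row = Row()
    claim_write(row, "A")
    a_owns = claim_check(row, "A")
    claim_write(row, "B")
    b_owns = claim_check(row, "B")
    print("A owns:", a_owns, "B owns:", b_owns)
```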

Any ideas here? Has anyone come up with a nice implementation? Is Cassandra not well suited for queue-like tasks?

Thanks,

Andrew