Hi all.

This may be trivial, but I cannot find good references for it. The problem is 
this:

Suppose we have one table in PgSQL which is a job queue, each row represents 
one job with several status flags, IDs,... Several processes will attempt to 
access the queue and "take" their batch of jobs, the batch will have some 
parameterizable size. So, the simple idea is "select N lowest IDs that do not 
have a flag <in process> set and set the flag", "then proceed with whatever it 
is that should be done".

Trouble is, with MVCC I don't see a way to prevent overlapping and race 
conditions. Oh, sure, if I issue select for update, it will lock rows, but, if 
I understand correctly, the change may not be instantaneous and atomic, so I 
might get transaction to roll back and then there is error handling that will 
lead to the uglies serialization I can think of. Let me clarify this, so 
somebody can tell me if I got it wrong.

Imagine Queue table with 20 rows, ID: 1,...,20, status="new". Imagine 2 
processes/threads (P1, P2) attempting to get 10 jobs each.How to do that?

P1: UPDATE job_queue SET process_id=$1, status="in process" WHERE id IN (
    SELECT id FROM job_queue WHERE status="new" and id IN (   
        SELECT id FROM job_queue WHERE status="new" ORDER BY id LIMIT 10 FOR 
UPDATE)
    )
)
P2: the same
P1: SELECT * FROM job_queue WHERE process_id=$1 ....
P2: SELECT * FROM job_queue WHERE process_id=$1 ....

The reason for the 2 selects is that if 2 or more processes content for the 
same set of jobs, the first one will set the status. The second will, after P1 
has released the rows get those rows, that are already taken. Of course, this 
will most likely return 0 rows for P2, since all 10 will be taken. If I leave 
out the LIMIT 10 in the inner select, I am effectively locking the entire 
table. Is that the way to go?

LOCK TABLE job_queue EXCLUSIVE;
UPDATE ...
UNLOCK TABLE job_queue;

Nix.



      

Reply via email to