Hi all,

I'm seeing some unexpected behaviour with SELECT ... FOR UPDATE SKIP LOCKED, 
and having finding it tricky to boil it down to a simple repro case as there's 
almost certainly a race condition somewhere (more later). So I thought I would 
ask if what I'm doing is unsupported (or just plain wrong!), before expending 
more effort in reproducing it.

I'm running v11.5, RHEL 7.7.

I have two tables jobs and results:
CREATE TABLE job (id integer PRIMARY KEY);
CREATE TABLE result (id integer PRIMARY KEY);
(obviously the real tables have more columns, but that's not too important here)

Something populates the job table with IDs. 
A job is done if its id appears in the result table. 
I would like to have multiple worker processes working on jobs.


I thought I could achieve this with each working doing the following:

BEGIN;

SELECT id 
FROM job
WHERE NOT EXISTS (SELECT 1 FROM result WHERE result.id = job.id)
LIMIT 1
FOR UPDATE SKIP LOCKED;

-- worker process does some work for the selected ID here

INSERT INTO result (id) VALUES (the_id_from_above);

COMMIT;


However, even with just two worker processes, I quickly found that one worker 
process would be assigned a job id that had *very* recently been completed by 
the other worker.

Some more potentially useful information:
* The LockRows node of the plan for the SELECT query above doesn't receive any 
tuples until about a second after the query begins executing
* If worker 2 begins querying for a new job id half a second before worker 1 
commits then worker 2 will pick up the job id that worker 1 has just finished 
with.
* I observe this even if I crank up the transaction isolation level to 
repeatable read and serializable.


I'm wondering if row locks are not obeying the same transactional semantics as 
row data, as a potential explanation for the above behaviour is as follows 
(W1/2 = Worker 1/2):

W1: BEGIN;
W1: SELECT ...
W1: (SELECT returns id=1. W1 now has job(id=1) locked.)
W1: INSERT INTO result (id) VALUES (1)

W2: BEGIN;
W2: SELECT ... 

W1: COMMIT; job(id=1) is now unlocked.

W2: (SELECT returns id=1: W1 had not committed when the SELECT started, so 
result(id=1) is not visible, but LockRows found that job(id=1) was not locked. 
W2 now has job(id=1) locked.)


...i.e. W2's SELECT could not see the row that W1 INSERTed (because W2's BEGIN 
occurs and W2's SELECT begins before W1's commit), but W2's SELECT *could* see 
the removal of W1's row lock. 


Perhaps this is a misuse of the locking system, since I'm locking a row "FOR 
UPDATE" but not actually updating it, but as row locks are released at the end 
of a transaction (according to the docs) then my expectation was for the 
unlocking and the visibility of newly committed rows to be atomic.
I've tried FOR NO KEY UPDATE too, without luck.

If I'm doing something forbidden (and the docs say so) then I'd be grateful if 
someone could point that out!

Best,
Steven.



Reply via email to