The attached patch attempts to fix this.
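
To make the idea concrete, the change amounts to swapping the order of two
steps at the end of SetupLockInTable(): do the "lock already held" check
first, and bump the shared counters only after it passes, so the error path
cannot leave them inflated. Roughly (a hand-written sketch of the intent,
not the attached patch itself):

    /*
     * We shouldn't already hold the desired lock; else the locallock table
     * is broken.  Checking this before touching the shared counters means
     * an error here cannot leave nRequested/requested[] permanently bumped.
     */
    if (proclock->holdMask & LOCKBIT_ON(lockmode))
        elog(ERROR, "lock %s on object %u/%u/%u is already held",
             lockMethodTable->lockModeNames[lockmode],
             lock->tag.locktag_field1, lock->tag.locktag_field2,
             lock->tag.locktag_field3);

    /*
     * lock->nRequested and lock->requested[] count the total number of
     * requests, whether granted or waiting; increment them only after the
     * sanity check above has passed.
     */
    lock->nRequested++;
    lock->requested[lockmode]++;
    Assert((lock->nRequested > 0) && (lock->requested[lockmode] > 0));

The analogous reordering would presumably apply to the similar code in
lock_twophase_recover(), as noted below.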

GaoZengqi <pgf...@gmail.com> wrote on Wed, Sep 11, 2024 at 14:30:

> At the end of SetupLockInTable(), there is a check that raises a "lock
> already held" error.
> Because the lock's nRequested and requested[lockmode] counters are bumped
> before that error is raised, and there is no code path that decrements
> them again in this situation, the lock structure stays inconsistent until
> the cluster is restarted or reset.
>
> The inconsistency is:
> * nRequested never drops back to zero, so the lock is never
>   garbage-collected
> * if a waitMask is set for this lock, the waitMask is never cleared, so
>   new procs are blocked waiting on a lock that has no holder at all
>   (which looks odd in pg_locks)
>
> I think moving the "lock already held" check in SetupLockInTable() to
> before the bump of nRequested and requested[lockmode] will fix it.
> (The same change may also be needed in lock_twophase_recover().)
>
> To reproduce the inconsistency:
> 1. start backend 1, lock table a, and leave the transaction idle
> 2. terminate backend 1, hacked so that it skips LockReleaseAll()
> 3. start backend 2 and lock table a; it waits for the lock to be released
> 4. reuse backend 1 (the same proc) to lock table a again;
>    this triggers the "lock already held" error
> 5. quit both backend 1 and backend 2
> 6. start backend 3 and lock table a; it blocks on the lock's stale waitMask
> 7. check pg_locks
>
> --
> GaoZengqi
> pgf...@gmail.com
> zengqi...@gmail.com
>
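
To make the "never garbage-collected" point above concrete: the LOCK object
is only removed from the shared hash table when its request count drops back
to zero, so an inflated nRequested keeps the entry, including its waitMask,
alive until a restart. The relevant tail of CleanUpLock() in lock.c looks
roughly like this (abridged sketch for illustration):

    if (lock->nRequested == 0)
    {
        /* last request gone: garbage-collect the LOCK object itself */
        Assert(dlist_is_empty(&lock->procLocks));
        if (!hash_search_with_hash_value(LockMethodLockHash,
                                         &(lock->tag),
                                         hashcode,
                                         HASH_REMOVE,
                                         NULL))
            elog(PANIC, "lock table corrupted");
    }
    else if (wakeupNeeded)
    {
        /* still outstanding requests: just wake up any waiters */
        ProcLockWakeup(lockMethodTable, lock);
    }

And since LockAcquireExtended() treats a conflict with lock->waitMask as a
reason to queue up behind existing waiters, later backends keep blocking on
the orphaned entry even though nobody holds it anymore, which is the
zero-holder wait visible in pg_locks.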


-- 
GaoZengqi
pgf...@gmail.com
zengqi...@gmail.com

Attachment: fix-lock-already-held.patch
