The attached patch attempts to fix this.
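Roughly, the idea is to perform the "lock already held" check before
lock->nRequested and lock->requested[lockmode] are incremented, so that the
error path leaves the shared lock state untouched. As a self-contained
illustration of why the ordering matters (a toy model of the two orderings,
not the actual lock.c code and not necessarily what the attached patch does):

    /*
     * Toy model (not PostgreSQL code) of the ordering problem described
     * below: if the "already held" error is raised after nRequested and
     * requested[] are bumped, the counters are never undone.
     */
    #include <stdio.h>
    #include <stdbool.h>

    #define MAX_LOCKMODES 10

    typedef struct ToyLock
    {
        int nRequested;               /* total requests, granted or waiting */
        int requested[MAX_LOCKMODES]; /* per-mode request counts */
    } ToyLock;

    /* Current ordering: bump the counters first, then raise the error. */
    static bool
    setup_lock_current(ToyLock *lock, int lockmode, bool already_held)
    {
        lock->nRequested++;
        lock->requested[lockmode]++;

        if (already_held)
            return false;   /* "already held" error: counters stay bumped */
        return true;
    }

    /* Proposed ordering: raise the error before touching the counters. */
    static bool
    setup_lock_proposed(ToyLock *lock, int lockmode, bool already_held)
    {
        if (already_held)
            return false;   /* error raised, lock state left untouched */

        lock->nRequested++;
        lock->requested[lockmode]++;
        return true;
    }

    int
    main(void)
    {
        ToyLock current = {0};
        ToyLock proposed = {0};

        /* Drive both orderings down the "lock already held" error path. */
        (void) setup_lock_current(&current, 3, true);
        (void) setup_lock_proposed(&proposed, 3, true);

        printf("current ordering:  nRequested = %d\n", current.nRequested);   /* 1 */
        printf("proposed ordering: nRequested = %d\n", proposed.nRequested);  /* 0 */
        return 0;
    }

With the current ordering, the error path leaves nRequested at 1 for a lock
that nobody holds, so it can never be garbage-collected; with the check done
first, the counters are only touched once we know the request is sane.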
On Wed, Sep 11, 2024 at 14:30, 高增琦 <pgf...@gmail.com> wrote:

> At the end of SetupLockInTable(), there is a check for the "lock already
> held" error.
>
> Because a lock's nRequested and requested[lockmode] values are bumped
> before the "lock already held" error is raised, and there is no way to
> reduce them afterwards in this situation, the inconsistency stays in the
> lock structure until the cluster is restarted or reset.
>
> The inconsistency is:
> * nRequested never drops back to zero, so the lock is never
>   garbage-collected
> * if the lock has a waitMask set, the waitMask is never removed, so new
>   procs are blocked waiting for a lock with zero holders
>   (which looks weird in the pg_locks view)
>
> I think moving the "lock already held" error ahead of the bump of
> nRequested and requested[lockmode] in SetupLockInTable() will fix it.
> (The same may also be needed in the lock_twophase_recover() function.)
>
> To recreate the inconsistency:
> 1. create backend 1 to lock table a, and keep it idle in transaction
> 2. terminate backend 1, hacked so that it skips LockReleaseAll()
> 3. create another backend 2 to lock table a; it will wait for the lock
>    to be released
> 4. reuse backend 1 (the same proc) to lock table a again; this triggers
>    the "lock already held" error
> 5. quit both backend 1 and backend 2
> 6. create backend 3 to lock table a; it will wait because of the lock's
>    waitMask
> 7. check the pg_locks view
>
> --
> GaoZengqi
> pgf...@gmail.com
> zengqi...@gmail.com

--
GaoZengqi
pgf...@gmail.com
zengqi...@gmail.com
fix-lock-already-held.patch