On Tue, Jan 28, 2020 at 12:36:41PM +0100, Julien Rouhaud wrote:
> So, AFAICT the LockHashPartitionLockByProc is required when
> iterating/modifying lockGroupMembers or lockGroupLink, but just
> getting the leader pid should be safe.
This still seems unsafe:

git show -U11 -w --patience b025f32e0b src/backend/utils/adt/pgstatfuncs.c

+			/*
+			 * If a PGPROC entry was retrieved, display wait events and lock
+			 * group leader information if any.  To avoid extra overhead, no
+			 * extra lock is being held, so there is no guarantee of
+			 * consistency across multiple rows.
+			 */
...
+				PGPROC	   *leader;
...
+				leader = proc->lockGroupLeader;
+				if (leader)
+				{
# does something guarantee that leader doesn't change ?
+					values[29] = Int32GetDatum(leader->pid);
+					nulls[29] = false;
				}

Michael seems to have raised the issue:

On Thu, Jan 16, 2020 at 04:49:12PM +0900, Michael Paquier wrote:
> And actually, the way you are looking at the leader's PID is visibly
> incorrect and inconsistent because the patch takes no shared LWLock on
> the leader using LockHashPartitionLockByProc() followed by
> LWLockAcquire(), no?  That's incorrect because it could be perfectly
> possible to crash with this code between the moment you check if
> lockGroupLeader is NULL and the moment you look at
> lockGroupLeader->pid if a process is being stopped in-between and
> removes itself from a lock group in ProcKill().

But I don't see how it was addressed.

I read this:

src/backend/storage/lmgr/lock.c:	 * completely valid.  We cannot safely dereference another backend's
src/backend/storage/lmgr/lock.c-	 * lockGroupLeader field without holding all lock partition locks, and
src/backend/storage/lmgr/lock.c-	 * it's not worth that.)

I think if you do:
|LWLockAcquire(&proc->backendLock, LW_SHARED);
..then it's at least *safe* to access leader->pid, but it may be
inconsistent unless you also call LockHashPartitionLockByProc.

I wasn't able to produce a crash, so maybe I missed something.

-- 
Justin
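P.S. To be concrete, here is an untested sketch of the
LockHashPartitionLockByProc() pattern Michael describes, written against the
variables of the pg_stat_get_activity() loop shown in the hunk above (the
column index 29 comes from the patch).  I'm not claiming this is what the
patch should end up doing, just illustrating the locking order being
discussed:

	/*
	 * Untested sketch only: take the lock partition lock covering the lock
	 * group before reading the leader's pid, so the leader cannot detach
	 * itself in ProcKill() in between.  Assumes proc, values[] and nulls[]
	 * from the surrounding pg_stat_get_activity() loop.
	 */
	PGPROC	   *leader = proc->lockGroupLeader;

	if (leader != NULL)
	{
		LWLock	   *leader_lwlock = LockHashPartitionLockByProc(leader);

		/*
		 * Dereferencing leader to pick the partition lock is not a crash
		 * risk even without the lock held: PGPROC entries live in shared
		 * memory for the life of the cluster, so at worst we pick a
		 * partition for a stale leader and find lockGroupLeader NULL on
		 * the re-check below.
		 */
		LWLockAcquire(leader_lwlock, LW_SHARED);

		leader = proc->lockGroupLeader;
		if (leader != NULL)
		{
			values[29] = Int32GetDatum(leader->pid);
			nulls[29] = false;
		}

		LWLockRelease(leader_lwlock);
	}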