On Tue, Nov 24, 2020 at 10:11 PM Ashutosh Bapat < ashutosh.bapat....@gmail.com> wrote:
> This looks useful. LWLockCheckSelfDeadlock() could use LWLockHeldByMe > variant instead of copying that code with possibly a change in that > function to return the required information. > Yes, possibly so. I was reluctant to mess with the rest of the code too much. I am also seeing a pattern > Assert(!LWLockHeldByMe()); > LWLockAcquire() > > at some places. Should we change LWLockAcquire to do > Assert(!LWLockHeldByMe()) always to detect such occurrences? I'm inclined not to, at least not without benchmarking it, because that'd do the check before we attempt the fast-path. cassert builds are still supposed to perform decently and be suitable for day to day development and I'd rather not risk a slowdown. I'd prefer to make the lock self deadlock check run for production builds, not just cassert builds. It'd print a normal LOG (with backtrace if supported) then Assert(). So on an assert build we'd get a crash and core, and on a non-assert build we'd carry on and self-deadlock anyway. That's probably the safest thing to do. We can't expect to safely ERROR out of the middle of an LWLockAcquire(), not without introducing a new and really hard to test code path that'll also be surprising to callers. We probably don't want to PANIC the whole server for non-assert builds since it might enter a panic-loop. So it's probably better to self-deadlock. We could HINT that a -m immediate shutdown will be necessary to recover though. I don't think it makes sense to introduce much complexity for a feature that's mainly there to help developers and to catch corner-case bugs. (FWIW a variant of this patch has been in 2ndQPostgres for some time. It helped catch an extension bug just the other day.)