Re: [PATCH] LWLock self-deadlock detection

Craig Ringer Tue, 24 Nov 2020 22:17:27 -0800

On Tue, Nov 24, 2020 at 10:11 PM Ashutosh Bapat <
ashutosh.bapat....@gmail.com> wrote:


> This looks useful. LWLockCheckSelfDeadlock() could use LWLockHeldByMe
> variant instead of copying that code with possibly a change in that
> function to return the required information.
>

Yes, possibly so. I was reluctant to mess with the rest of the code too
much.

I am also seeing a pattern
> Assert(!LWLockHeldByMe());
> LWLockAcquire()
>
> at some places. Should we change LWLockAcquire to do
> Assert(!LWLockHeldByMe()) always to detect such occurrences?


I'm inclined not to, at least not without benchmarking it, because that'd
do the check before we attempt the fast-path. cassert builds are still
supposed to perform decently and be suitable for day to day development and
I'd rather not risk a slowdown.

I'd prefer to make the lock self deadlock check run for production builds,
not just cassert builds. It'd print a normal LOG (with backtrace if
supported) then Assert(). So on an assert build we'd get a crash and core,
and on a non-assert build we'd carry on and self-deadlock anyway.

That's probably the safest thing to do. We can't expect to safely ERROR out
of the middle of an LWLockAcquire(), not without introducing a new and
really hard to test code path that'll also be surprising to callers. We
probably don't want to PANIC the whole server for non-assert builds since
it might enter a panic-loop. So it's probably better to self-deadlock. We
could HINT that a -m immediate shutdown will be necessary to recover though.

I don't think it makes sense to introduce much complexity for a feature
that's mainly there to help developers and to catch corner-case bugs.

(FWIW a variant of this patch has been in 2ndQPostgres for some time. It
helped catch an extension bug just the other day.)

Re: [PATCH] LWLock self-deadlock detection

Reply via email to