condition variable cleanup and subtransactions

Andres Freund Thu, 21 Dec 2017 03:01:50 -0800

Hi,

On 2017-12-21 02:42:25 -0800, Andres Freund wrote:
> Trying to debug this I found another issue. I'd placed a sleep(10) in
> ExecParallelHashCloseBatchAccessors() and then ctrl-c'ed the server for
> some reason. Segfault time:
> 
> #0  0x000055bfbac42539 in tas (lock=0x7fcd82ae14ac <error: Cannot access memory at address 0x7fcd82ae14ac>) at /home/andres/src/postgresql/src/include/storage/s_lock.h:228
> #1  0x000055bfbac42b4d in ConditionVariableCancelSleep () at /home/andres/src/postgresql/src/backend/storage/lmgr/condition_variable.c:173
> #2  0x000055bfba8e24ae in AbortTransaction () at /home/andres/src/postgresql/src/backend/access/transam/xact.c:2478
> #3  0x000055bfba8e4a2a in AbortOutOfAnyTransaction () at /home/andres/src/postgresql/src/backend/access/transam/xact.c:4387


> So, afaics no workers had yet attached, the leader accepted the cancel
> interrupt, the dsm segments were destroyed, and as part of cleanup
> cv_sleep_target was supposed to be reset, which fails, because its
> memory has since been freed. Looking at how that can happen.

Oh. This seems to be a condition variable bug independent of PHJ. The
problem is that the DSM segment etc all get cleaned up in
*subtransaction* abort.

Afaict it's a bug that AbortTransaction() does
ConditionVariableCancelSleep() but AbortSubTransaction() does not,
despite the latter releasing dsm segments via
ResourceOwnerRelease(RESOURCE_RELEASE_BEFORE_LOCKS).

Adding that seems to fix the crash.
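
To illustrate the ordering problem, here is a toy model in plain C. It is
not PostgreSQL code: every toy_* name is made up for this sketch, malloc/free
stands in for the DSM segment, and a plain counter stands in for the
spinlock-protected wait list. It only shows why the sleep has to be
cancelled before the memory holding the condition variable goes away.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a condition variable stored in a DSM segment. */
typedef struct ToyConditionVariable
{
    int nwaiters;
} ToyConditionVariable;

/* Plays the role of cv_sleep_target: the CV we are prepared to sleep on. */
static ToyConditionVariable *toy_sleep_target = NULL;

static void
toy_prepare_to_sleep(ToyConditionVariable *cv)
{
    cv->nwaiters++;
    toy_sleep_target = cv;
}

/* Returns true if there was a pending sleep to cancel. */
static bool
toy_cancel_sleep(void)
{
    ToyConditionVariable *cv = toy_sleep_target;

    if (cv == NULL)
        return false;

    /* Touches the CV's memory, so the segment must still be alive here. */
    cv->nwaiters--;
    toy_sleep_target = NULL;
    return true;
}

/* Models subtransaction abort, which releases the segment. */
static void
toy_abort_subtransaction(ToyConditionVariable *segment)
{
    /* The addition under discussion: cancel the sleep first ... */
    toy_cancel_sleep();
    assert(toy_sleep_target == NULL);

    /* ... and only then release the memory the CV lives in. */
    free(segment);
}

/* Models top-level abort, which already cancels any pending sleep. */
static void
toy_abort_transaction(void)
{
    /* Harmless now: the target was cleared during subtransaction abort. */
    toy_cancel_sleep();
}
```

Without the toy_cancel_sleep() call in toy_abort_subtransaction(), the later
call in toy_abort_transaction() would dereference freed memory, which is the
shape of the crash in the backtrace above.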

This seems like something we need to backpatch.

Greetings,

Andres Freund
