Hi,

On 2017-12-21 02:42:25 -0800, Andres Freund wrote:
> Trying to debug this I found another issue. I'd placed a sleep(10) in
> ExecParallelHashCloseBatchAccessors() and then ctrl-c'ed the server for
> some reason. Segfault time:
>
> #0  0x000055bfbac42539 in tas (lock=0x7fcd82ae14ac <error: Cannot access memory at address 0x7fcd82ae14ac>) at /home/andres/src/postgresql/src/include/storage/s_lock.h:228
> #1  0x000055bfbac42b4d in ConditionVariableCancelSleep () at /home/andres/src/postgresql/src/backend/storage/lmgr/condition_variable.c:173
> #2  0x000055bfba8e24ae in AbortTransaction () at /home/andres/src/postgresql/src/backend/access/transam/xact.c:2478
> #3  0x000055bfba8e4a2a in AbortOutOfAnyTransaction () at /home/andres/src/postgresql/src/backend/access/transam/xact.c:4387
>
> So, afaics no workers had yet attached, the leader accepted the cancel
> interrupt, the dsm segments were destroyed, and as part of cleanup
> cv_sleep_target was supposed to be reset, which fails, because its
> memory has since been freed. Looking at how that can happen.

Oh. This seems to be a condition variable bug independent of PHJ. The
problem is that the DSM segments etc. all get cleaned up in
*subtransaction* abort. Afaict it's a bug that AbortTransaction() does
ConditionVariableCancelSleep() but AbortSubTransaction() does not,
despite the latter releasing dsm segments via
ResourceOwnerRelease(RESOURCE_RELEASE_BEFORE_LOCKS). Adding that seems
to fix the crash.

This seems like something we need to backpatch.

Greetings,

Andres Freund
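
To illustrate the ordering problem described above, here is a minimal toy
model in C. It is not PostgreSQL code: every toy_* name is hypothetical, the
"segment" is a static struct with a mapped flag rather than real DSM, and a
false return value stands in for the segfault. It only sketches why the
sleep target has to be cleared before subtransaction abort releases the
segment it points into.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins (hypothetical; not the real PostgreSQL definitions). */
typedef struct ToyConditionVariable { int mutex; } ToyConditionVariable;
typedef struct ToySegment { bool mapped; ToyConditionVariable cv; } ToySegment;

static ToySegment toy_segment;                          /* models one DSM segment */
static ToyConditionVariable *toy_sleep_target = NULL;   /* models cv_sleep_target */

/* Models preparing to sleep on a CV that lives inside the segment. */
static void toy_prepare_to_sleep(void)
{
    assert(toy_sleep_target == NULL);   /* not already sleeping */
    toy_segment.mapped = true;
    toy_sleep_target = &toy_segment.cv;
}

/* Models the resource-owner release that detaches the segment. */
static void toy_release_segments(void)
{
    toy_segment.mapped = false;
}

/*
 * Models cancelling the sleep. Returns false where the real code would
 * have touched unmapped memory, true otherwise.
 */
static bool toy_cancel_sleep(void)
{
    bool safe;

    if (toy_sleep_target == NULL)
        return true;                    /* nothing to clean up */
    safe = toy_segment.mapped;
    if (safe)
        toy_sleep_target->mutex = 0;    /* would take and release the CV's spinlock */
    toy_sleep_target = NULL;
    return safe;
}

/* Subtransaction abort; with_fix adds the cancel-sleep call before release. */
static bool toy_abort_subtransaction(bool with_fix)
{
    bool ok = true;

    if (with_fix)
        ok = toy_cancel_sleep();
    toy_release_segments();
    return ok;
}

/* Top-level abort always cancels the sleep, as in the backtrace. */
static bool toy_abort_transaction(void)
{
    return toy_cancel_sleep();
}

/* Sleep, get cancelled, abort the subtransaction, then the transaction. */
static bool toy_scenario(bool with_fix)
{
    bool sub_ok;
    bool top_ok;

    toy_prepare_to_sleep();
    sub_ok = toy_abort_subtransaction(with_fix);
    top_ok = toy_abort_transaction();
    return sub_ok && top_ok;
}
```

In this model toy_scenario(false) returns false, because the top-level abort
finds a sleep target whose segment is already gone, while toy_scenario(true)
returns true, because the target was cleared while the segment was still
mapped.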