On Sat, Nov 6, 2021 at 11:04 PM Justin Pryzby <pry...@telsasoft.com> wrote:
>
> > Rebased patches attached. I will change status back to "Ready for Committer"
>
> The CI showed a crash on freebsd, which I reproduced.
> https://cirrus-ci.com/task/5203060415791104
>
> The crash is evidenced in 0001 - but only ~15% of the time.
>
> I think it's the same thing which was committed and then reverted here, so
> maybe I'm not saying anything new.
>
> https://commitfest.postgresql.org/33/3031/
> https://www.postgresql.org/message-id/flat/20200929061142.ga29...@paquier.xyz
>
> (gdb) p pstate->build_barrier->phase
> Cannot access memory at address 0x7f82e0fa42f4
>
> #1 0x00007f13de34f801 in __GI_abort () at abort.c:79
> #2 0x00005638e6a16d28 in ExceptionalCondition
> (conditionName=conditionName@entry=0x5638e6b62850 "!pstate ||
> BarrierPhase(&pstate->build_barrier) >= PHJ_BUILD_RUN",
> errorType=errorType@entry=0x5638e6a6f00b "FailedAssertion",
> fileName=fileName@entry=0x5638e6b625be "nodeHash.c",
> lineNumber=lineNumber@entry=3305) at assert.c:69
> #3 0x00005638e678085b in ExecHashTableDetach (hashtable=0x5638e8e6ca88) at
> nodeHash.c:3305
> #4 0x00005638e6784656 in ExecShutdownHashJoin
> (node=node@entry=0x5638e8e57cb8) at nodeHashjoin.c:1400
> #5 0x00005638e67666d8 in ExecShutdownNode (node=0x5638e8e57cb8) at
> execProcnode.c:812
> #6 ExecShutdownNode (node=0x5638e8e57cb8) at execProcnode.c:772
> #7 0x00005638e67cd5b1 in planstate_tree_walker
> (planstate=planstate@entry=0x5638e8e58580, walker=walker@entry=0x5638e6766680
> <ExecShutdownNode>, context=context@entry=0x0) at nodeFuncs.c:4009
> #8 0x00005638e67666b2 in ExecShutdownNode (node=0x5638e8e58580) at
> execProcnode.c:792
> #9 ExecShutdownNode (node=0x5638e8e58580) at execProcnode.c:772
> #10 0x00005638e67cd5b1 in planstate_tree_walker
> (planstate=planstate@entry=0x5638e8e58418, walker=walker@entry=0x5638e6766680
> <ExecShutdownNode>, context=context@entry=0x0) at nodeFuncs.c:4009
> #11 0x00005638e67666b2 in ExecShutdownNode (node=0x5638e8e58418) at
> execProcnode.c:792
> #12 ExecShutdownNode (node=node@entry=0x5638e8e58418) at execProcnode.c:772
> #13 0x00005638e675f518 in ExecutePlan (execute_once=<optimized out>,
> dest=0x5638e8df0058, direction=<optimized out>, numberTuples=0,
> sendTuples=<optimized out>, operation=CMD_SELECT,
> use_parallel_mode=<optimized out>, planstate=0x5638e8e58418,
> estate=0x5638e8e57a10) at execMain.c:1658
> #14 standard_ExecutorRun () at execMain.c:410
> #15 0x00005638e6763e0a in ParallelQueryMain (seg=0x5638e8d823d8,
> toc=0x7f13df4e9000) at execParallel.c:1493
> #16 0x00005638e663f6c7 in ParallelWorkerMain () at parallel.c:1495
> #17 0x00005638e68542e4 in StartBackgroundWorker () at bgworker.c:858
> #18 0x00005638e6860f53 in do_start_bgworker (rw=<optimized out>) at
> postmaster.c:5883
> #19 maybe_start_bgworkers () at postmaster.c:6108
> #20 0x00005638e68619e5 in sigusr1_handler (postgres_signal_arg=<optimized
> out>) at postmaster.c:5272
> #21 <signal handler called>
> #22 0x00007f13de425ff7 in __GI___select (nfds=nfds@entry=7,
> readfds=readfds@entry=0x7ffef03b8400, writefds=writefds@entry=0x0,
> exceptfds=exceptfds@entry=0x0, timeout=timeout@entry=0x7ffef03b8360)
> at ../sysdeps/unix/sysv/linux/select.c:41
> #23 0x00005638e68620ce in ServerLoop () at postmaster.c:1765
> #24 0x00005638e6863bcc in PostmasterMain () at postmaster.c:1473
> #25 0x00005638e658fd00 in main (argc=8, argv=0x5638e8d54730) at main.c:198

Yes, this looks like that issue.

I've attached a v8 set with the fix I suggested in [1] included.
(I added it to 0001).

- Melanie

[1] https://www.postgresql.org/message-id/flat/20200929061142.GA29096%40paquier.xyz

Attachments:
v8-0003-Parallel-Hash-Full-Right-Outer-Join.patch
v8-0002-Improve-the-naming-of-Parallel-Hash-Join-phases.patch
v8-0001-Fix-race-condition-in-parallel-hash-join-batch-cl.patch