On Fri, Jan 27, 2023 at 9:57 AM Thomas Munro <thomas.mu...@gmail.com> wrote:
> On Fri, Jan 27, 2023 at 9:49 AM Tom Lane <t...@sss.pgh.pa.us> wrote:
> > Tomas Vondra <tomas.von...@enterprisedb.com> writes:
> > > I received an alert dikkop (my rpi4 buildfarm animal running freebsd 14)
> > > did not report any results for a couple days, and it seems it got into
> > > an infinite loop in REL_11_STABLE when building hash table in a parallel
> > > hashjoin, or something like that.
> >
> > > It seems to be progressing now, probably because I attached gdb to the
> > > workers to get backtraces, which does signals etc.
> >
> > That reminds me of cases that I saw several times on my now-deceased
> > animal florican:
> >
> > https://www.postgresql.org/message-id/flat/2245838.1645902425%40sss.pgh.pa.us
> >
> > There's clearly something rotten somewhere in there, but whether
> > it's our bug or FreeBSD's isn't clear.
>
> And if it's ours, it's possibly in latch code and not anything higher
> (I mean, not in condition variables, barriers, or parallel hash join)
> because I saw a similar hang in the shm_mq stuff which uses the latch
> API directly. Note that 13 switched to kqueue but still used the
> self-pipe, and 14 switched to a signal event, and this hasn't been
> reported in those releases or later, which makes the poll() code path
> a key suspect.

Also, 14 changed the flag/memory barrier dance (maybe_sleeping), but 13 did it the same way as 11 + 12. So between 12 and 13 we have just the poll -> kqueue change.
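
For anyone following along who hasn't looked at this area, here is a toy sketch of the general self-pipe pattern that a poll()-based wait depends on. It is not PostgreSQL code: ToyLatch, demo() and the method names are made up for illustration, a thread stands in for the other process, and set() writes the wakeup byte directly where a real implementation would do that from a signal handler. It only shows why a wakeup that lands between the flag check and the sleep should not be lost, which is the property that seems to be failing in the hang.

```python
import os
import select
import threading


class ToyLatch:
    """Hypothetical, simplified model of a self-pipe latch (not PostgreSQL code)."""

    def __init__(self):
        self.is_set = False
        self._r, self._w = os.pipe()
        os.set_blocking(self._r, False)
        os.set_blocking(self._w, False)

    def set(self):
        # Set the flag first, then wake any sleeper by writing one byte.
        self.is_set = True
        try:
            os.write(self._w, b"\0")
        except BlockingIOError:
            pass  # pipe is full, so a wakeup is already pending

    def reset(self):
        self.is_set = False

    def wait(self, timeout_ms):
        """Return True if the latch is set, False if poll() timed out."""
        poller = select.poll()
        poller.register(self._r, select.POLLIN)
        while not self.is_set:
            # A set() landing between the check above and poll() below is not
            # lost: its byte is already in the pipe, so poll() returns at once.
            if not poller.poll(timeout_ms):
                return False
            self._drain()
        return True

    def _drain(self):
        # Empty the pipe so stale bytes do not cause endless wakeups.
        try:
            while os.read(self._r, 512):
                pass
        except BlockingIOError:
            pass  # pipe is empty


def demo():
    latch = ToyLatch()
    # A timer thread stands in for the process that would set the latch.
    waker = threading.Timer(0.05, latch.set)
    waker.start()
    woke = latch.wait(timeout_ms=5000)
    waker.join()
    return woke
```

If the byte never reaches the pipe, or poll() never reports it, the waiter sleeps until something else interrupts it, which would at least be consistent with attaching gdb getting things moving again.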