Andres Freund <and...@anarazel.de> writes:
> On 2022-02-26 14:07:05 -0500, Tom Lane wrote:
>> I have observed this three times in the REL_11 branch, once
>> in REL_12, and a couple of times last summer before it occurred
>> to me to start keeping notes. Over that time the machine has
>> been running various patchlevels of FreeBSD 13.0.
> It's certainly interesting that it appears to happen only in the branches
> using poll rather than kqueue to implement latches. That changed between 12
> and 13.

Yeah, and there was no PHJ (parallel hash join) in v10, so that's a pretty
good theory as to why I've only seen it in those two branches.

> Have you tried running the core regression tests with force_parallel_mode =
> on, or with the parallel costs lowered, to see if that makes the problem
> appear more often?

> The next time this happens / if you still have this open, perhaps it could be
> worth checking if there's a byte in the self pipe?

> Besides trying to make the issue more likely as suggested above, it might be
> worth checking if signalling the stuck processes with SIGUSR1 gets them
> unstuck.

I've now wasted a bunch of kilowatt-hours fruitlessly trying to reproduce
this outside the confines of the buildfarm script. I'm at a loss to figure
out what the buildfarm is doing differently, but apparently there's
something. I'm going to re-enable the machine's buildfarm job and just
wait for it to hang up again. More info eventually...

			regards, tom lane