Hi,

On 2024-04-11 16:46:10 -0400, Tom Lane wrote:
> Andres Freund <and...@anarazel.de> writes:
> > On 2024-04-11 16:11:40 -0400, Tom Lane wrote:
> >> We wouldn't need to fix it, if we simply removed the NUM_DELAYS
> >> limit.  Whatever kicked us off the sleep doesn't matter, we might
> >> as well go check the spinlock.
>
> > I suspect we should fix it regardless of whether we keep NUM_DELAYS. We
> > shouldn't increase cur_delay faster just because a lot of signals are coming
> > in.
>
> I'm unconvinced there's a problem there.
Obviously that's a different aspect than efficiency, but in local, admittedly
extreme, testing I've seen stuck spinlocks being detected in a fraction of the
normally expected time. A spinlock that ends up sleeping for close to a second
after a relatively short amount of time surely isn't good for predictable
performance.

IIRC the bad case was on a hot standby, with some recovery conflict causing
the startup process to send a lot of signals.

> Also, what would you do about this that wouldn't involve adding kernel calls
> for gettimeofday?  Admittedly, if we only do that when we're about to sleep,
> maybe it's not so awful; but it's still adding complexity that I'm
> unconvinced is warranted.

At least on !windows, pg_usleep() uses nanosleep(), which, when interrupted by
a signal, can return the remaining time until the expiration of the timer.

I suspect that on Windows, computing the time when a signal arrived wouldn't
be expensive, compared to all the other overhead implied by our signal
handling emulation.

Greetings,

Andres Freund