Re: Issue with the PRNG used by Postgres

Andres Freund Thu, 11 Apr 2024 14:17:27 -0700

Hi,

On 2024-04-11 16:46:10 -0400, Tom Lane wrote:
> Andres Freund <and...@anarazel.de> writes:
> > On 2024-04-11 16:11:40 -0400, Tom Lane wrote:
> >> We wouldn't need to fix it, if we simply removed the NUM_DELAYS
> >> limit.  Whatever kicked us off the sleep doesn't matter, we might
> >> as well go check the spinlock.
> 
> > I suspect we should fix it regardless of whether we keep NUM_DELAYS. We
> > shouldn't increase cur_delay faster just because a lot of signals are coming
> > in.
> 
> I'm unconvinced there's a problem there.


Obviously that's a different aspect than efficiency, but in local, admittedly
extreme, testing I've seen stuck spinlocks being detected in a fraction of the
normally expected time. A spinlock that ends up sleeping for close to a second
after a relatively short amount of time surely isn't good for predictable
performance.

IIRC the bad case was on a hot standby, with some recovery conflict causing
the startup process to send a lot of signals.


> Also, what would you do about this that wouldn't involve adding kernel calls
> for gettimeofday?  Admittedly, if we only do that when we're about to sleep,
> maybe it's not so awful; but it's still adding complexity that I'm
> unconvinced is warranted.

At least on !windows, pg_usleep() uses nanosleep(), which, when interrupted by
a signal, can return the remaining time until the expiration of the timer.
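
A minimal sketch of that idea, to show that no extra clock read is needed.
This is not PostgreSQL code: sleep_and_report_usec() is a hypothetical helper
name, and the only real API used is POSIX nanosleep(), which fails with EINTR
when a signal handler interrupts it and stores the unslept time in its second
argument.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <errno.h>
#include <time.h>

/*
 * Hypothetical helper (not part of PostgreSQL): sleep for up to "usec"
 * microseconds and return how many microseconds were actually slept.
 *
 * If a signal interrupts nanosleep(), it returns -1 with errno == EINTR
 * and writes the remaining time into "rem", so the elapsed time can be
 * derived without calling gettimeofday() or clock_gettime().
 */
static long
sleep_and_report_usec(long usec)
{
    struct timespec req;
    struct timespec rem = {0, 0};
    long        remaining_usec;

    req.tv_sec = usec / 1000000L;
    req.tv_nsec = (usec % 1000000L) * 1000L;

    if (nanosleep(&req, &rem) == 0)
        return usec;            /* slept the full interval */

    if (errno != EINTR)
        return 0;               /* unexpected failure: assume no sleep */

    /* Interrupted by a signal: subtract what was left on the timer. */
    remaining_usec = rem.tv_sec * 1000000L + rem.tv_nsec / 1000L;
    return usec - remaining_usec;
}
```

A caller could then grow its delay only when the returned value is close to
the requested one, or go back to sleep for the remainder. Which of those is
preferable is a separate question from whether the information is cheap to
get.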

I suspect that on windows computing the time when a signal arrived wouldn't be
expensive, compared to all the other overhead implied by our signal handling
emulation.

Greetings,

Andres Freund
