On Fri, Mar 10, 2023 at 11:37 AM Nathan Bossart <nathandboss...@gmail.com> wrote:
>
> On Thu, Mar 09, 2023 at 05:27:08PM -0500, Tom Lane wrote:
> > Is it reasonable to assume that all modern platforms can time
> > millisecond delays accurately? Ten years ago I'd have suggested
> > truncating the delay to a multiple of 10msec and using this logic
> > to track the remainder, but maybe now that's unnecessary.
>
> If so, it might also be worth updating or removing this comment in
> pgsleep.c:
>
>  * NOTE: although the delay is specified in microseconds, the effective
>  * resolution is only 1/HZ, or 10 milliseconds, on most Unixen. Expect
>  * the requested delay to be rounded up to the next resolution boundary.
>
> I've had doubts for some time about whether this is still accurate...

What I see with the old select(), or a more modern clock_nanosleep()
call, is that Linux, FreeBSD and macOS are happy sleeping for .1ms, .5ms,
1ms, 2ms, 3ms, and through inaccuracies, scheduling overheads etc. it
works out to about 5-25% extra sleep time (I expect that can be affected
by the choice of time source/available hardware, and perhaps various
system calls use different tricks). I definitely recall the behaviour
described, back in the old days when more stuff was scheduler-tick based.

I have no clue for Windows; quick googling tells me that it might still
be pretty chunky, unless you do certain other stuff that I didn't follow
up; we could probably get more accurate sleep times by rummaging through
ntdll.dll. It would be good to find out how well WaitEventSet does on
Windows; perhaps we should have a little timing accuracy test in the tree
to collect build farm data?

FWIW epoll has a newer epoll_pwait2() call that takes a higher-resolution
timeout argument, and Windows WaitEventSet could also do high-resolution
timers if you add timer events rather than using the timeout argument,
and I guess conceptually even the old poll() thing could do the
equivalent with a signal alarm timer. But doing very short sleeps sounds
a lot like a bad idea to me, burning so much CPU on scheduling. I kinda
wonder if the 10ms + residual thing might even turn out to be a better
idea... but I dunno.

The 1ms residual thing looks pretty good to me as a fix to the immediate
problem report, but we might also want to adjust the wording in
config.sgml?
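
To make the "little timing accuracy test" idea concrete, here is a rough standalone sketch, not anything from the tree. It assumes a POSIX system and uses plain nanosleep() rather than clock_nanosleep() or select(), simply because nanosleep() is available on all three Unix platforms mentioned above. The names now_ns(), average_sleep_us() and the SLEEP_ACCURACY_MAIN guard are made up for illustration, and the list of delays only mirrors the values quoted in this message.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

/* Current CLOCK_MONOTONIC time in nanoseconds. */
static int64_t
now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (int64_t) ts.tv_sec * 1000000000 + ts.tv_nsec;
}

/*
 * Sleep for requested_us microseconds, "iterations" times, and return the
 * average measured sleep time in microseconds.  (Hypothetical helper.)
 */
static double
average_sleep_us(long requested_us, int iterations)
{
	struct timespec req;
	int64_t		total_ns = 0;

	assert(requested_us > 0 && iterations > 0);
	req.tv_sec = requested_us / 1000000;
	req.tv_nsec = (requested_us % 1000000) * 1000;

	for (int i = 0; i < iterations; i++)
	{
		int64_t		start = now_ns();

		/* An early return due to a signal is ignored in this sketch. */
		nanosleep(&req, NULL);
		total_ns += now_ns() - start;
	}
	return (double) total_ns / iterations / 1000.0;
}

#ifdef SLEEP_ACCURACY_MAIN
int
main(void)
{
	/* Delays in microseconds: .1ms, .5ms, 1ms, 2ms, 3ms, 10ms. */
	static const long delays_us[] = {100, 500, 1000, 2000, 3000, 10000};

	for (size_t i = 0; i < sizeof(delays_us) / sizeof(delays_us[0]); i++)
	{
		double		avg = average_sleep_us(delays_us[i], 100);

		printf("requested %ld us, measured %.1f us (%+.1f%%)\n",
			   delays_us[i], avg,
			   100.0 * (avg - delays_us[i]) / delays_us[i]);
	}
	return 0;
}
#endif
```

Compiled with -DSLEEP_ACCURACY_MAIN this prints one line per requested delay with the average overshoot. A real in-tree test would presumably go through pg_usleep() and WaitEventSet rather than calling nanosleep() directly, and would need a Windows code path.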