Hi,

Over in [1], I thought for a moment that a new function WaitLatchUs(..., timeout_us, ...) was going to be useful to fix that bug report, at least in master, until I realised the required Linux syscall is a little too new (for example RHEL 9 shipped May '22, Debian 12 is expected to be declared "stable" in a few months). So I'm kicking this proof-of-concept over into a new thread to talk about in the next cycle, in case it turns out to be useful later.
There probably isn't too much call for very high resolution sleeping. Most time-based sleeping is probably bad, but when it's legitimately used to spread CPU or I/O out (instead of illegitimate use for polling-based algorithms), it seems nice to be able to use all the accuracy your hardware can provide, and yet it is still important to be able to process other kinds of events, so WaitLatchUs() seems like a better building block than pg_usleep().

One question is whether it'd be better to use nanoseconds instead, since the relevant high-resolution primitives use those under the covers (struct timespec). On the other hand, microseconds are a good match for our TimestampTz which is the ultimate source of many of our timeout decisions. I suppose we could also consider an interface with an absolute timeout instead, and then stop thinking about the units so much.

As mentioned in that other thread, the only systems that currently seem to be able to sleep less than 1ms through these multiplexing APIs are: Linux 5.11+ (epoll_pwait2()), FreeBSD (kevent()), macOS (ditto). Everything else will round up to milliseconds at the kernel interface (because poll(), epoll_wait() and WaitForMultipleObjects() take those) or later inside the kernel due to kernel tick rounding. There might be ways to do better on Windows with separate timer events, but I don't know.

[1] https://www.postgresql.org/message-id/flat/CAAKRu_b-q0hXCBUCAATh0Z4Zi6UkiC0k2DFgoD3nC-r3SkR3tg%40mail.gmail.com
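To make the unit handling concrete, here is a minimal standalone sketch (not code from the patches) of the two conversions a microsecond-based wait needs: splitting the value into a struct timespec for interfaces such as epoll_pwait2() and kevent(), and rounding up to whole milliseconds for interfaces such as poll(). The helper names us_to_timespec() and us_to_ms_round_up() are hypothetical, and plain int64_t stands in for our int64.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>
#include <time.h>

/*
 * Hypothetical helper: split a non-negative microsecond timeout into a
 * struct timespec, as nanosecond-based wait interfaces expect.
 */
static struct timespec
us_to_timespec(int64_t timeout_us)
{
	struct timespec ts;

	assert(timeout_us >= 0);
	ts.tv_sec = (time_t) (timeout_us / 1000000);
	ts.tv_nsec = (long) ((timeout_us % 1000000) * 1000);
	return ts;
}

/*
 * Hypothetical helper: convert a microsecond timeout to milliseconds for
 * millisecond-based interfaces.  Rounds up so that we never sleep for less
 * than requested, caps at INT_MAX, and maps any negative value to -1
 * ("wait forever").  Dividing before adding avoids int64 overflow.
 */
static int
us_to_ms_round_up(int64_t timeout_us)
{
	int64_t		ms;

	if (timeout_us < 0)
		return -1;
	ms = timeout_us / 1000 + (timeout_us % 1000 != 0);
	return ms > INT_MAX ? INT_MAX : (int) ms;
}
```

With these, a request for 200us becomes {0, 200000} on the nanosecond path but a full 1ms on the millisecond path, which is the loss of precision being discussed.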
From e99b7d31831f31888a9433a83d3e64ccbe2cc5c7 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.mu...@gmail.com>
Date: Fri, 10 Mar 2023 15:16:47 +1300
Subject: [PATCH 1/3] Support microsecond based timeouts in WaitEventSet API.

WaitLatch() can only wait for whole numbers of milliseconds, a
limitation inherited ultimately from poll() and similar interfaces. In
the past it didn't matter much as sleep times were very inaccurate in
practice on common systems, but Linux and others can now be accurate
down to small fractions of a millisecond.

In order to be able to replace pg_usleep() calls, provide WaitLatchUs().
Just like pg_usleep(), the actual resolution of the sleeping depends on
the OS and hardware.

For Linux, this requires epoll_pwait2() (Linux 5.11), otherwise we have
to round to milliseconds for epoll_wait().

For macOS and *BSD, kevent() has always supported nanosecond-based
timeouts, but only macOS and FreeBSD are known to support high
resolution timers (other BSDs tested currently round up to kernel ticks
so WaitLatch() already couldn't sleep for only 1ms).

For Solaris and AIX, we currently use poll() and that requires rounding
up to milliseconds, so no improvement over WaitLatch() there. Likewise
for Windows (which already couldn't sleep for only 1ms due to internal
rounding to tick size).

Discussion: https://postgr.es/m/CAAKRu_b-q0hXCBUCAATh0Z4Zi6UkiC0k2DFgoD3nC-r3SkR3tg%40mail.gmail.com
---
 configure                       |   2 +-
 configure.ac                    |   1 +
 meson.build                     |   1 +
 src/backend/storage/ipc/latch.c | 146 ++++++++++++++++++++++++--------
 src/include/pg_config.h.in      |   3 +
 src/include/storage/latch.h     |  13 ++-
 src/tools/msvc/Solution.pm      |   1 +
 7 files changed, 128 insertions(+), 39 deletions(-)

diff --git a/configure b/configure
index e35769ea73..914361f91b 100755
--- a/configure
+++ b/configure
@@ -15699,7 +15699,7 @@ fi
 LIBS_including_readline="$LIBS"
 LIBS=`echo "$LIBS" | sed -e 's/-ledit//g' -e 's/-lreadline//g'`
 
-for ac_func in backtrace_symbols copyfile getifaddrs getpeerucred inet_pton kqueue mbstowcs_l memset_s posix_fallocate ppoll pthread_is_threaded_np setproctitle setproctitle_fast strchrnul strsignal syncfs sync_file_range uselocale wcstombs_l
+for ac_func in backtrace_symbols copyfile epoll_pwait2 getifaddrs getpeerucred inet_pton kqueue mbstowcs_l memset_s posix_fallocate ppoll pthread_is_threaded_np setproctitle setproctitle_fast strchrnul strsignal syncfs sync_file_range uselocale wcstombs_l
 do :
   as_ac_var=`$as_echo "ac_cv_func_$ac_func" | $as_tr_sh`
 ac_fn_c_check_func "$LINENO" "$ac_func" "$as_ac_var"
diff --git a/configure.ac b/configure.ac
index af23c15cb2..4249f8002c 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1794,6 +1794,7 @@ LIBS=`echo "$LIBS" | sed -e 's/-ledit//g' -e 's/-lreadline//g'`
 AC_CHECK_FUNCS(m4_normalize([
 	backtrace_symbols
 	copyfile
+	epoll_pwait2
 	getifaddrs
 	getpeerucred
 	inet_pton
diff --git a/meson.build b/meson.build
index d4384f1bf6..fe9b0470aa 100644
--- a/meson.build
+++ b/meson.build
@@ -2344,6 +2344,7 @@ func_checks = [
   # when enabling asan the dlopen check doesn't notice that -ldl is actually
   # required. Just checking for dlsym() ought to suffice.
   ['dlsym', {'dependencies': [dl_dep], 'define': false}],
+  ['epoll_pwait2'],
   ['explicit_bzero'],
   ['fdatasync', {'dependencies': [rt_dep, posix4_dep], 'define': false}], # Solaris
   ['getifaddrs'],
diff --git a/src/backend/storage/ipc/latch.c b/src/backend/storage/ipc/latch.c
index f4123e7de7..ba9ccb19ac 100644
--- a/src/backend/storage/ipc/latch.c
+++ b/src/backend/storage/ipc/latch.c
@@ -194,7 +194,7 @@ static void WaitEventAdjustPoll(WaitEventSet *set, WaitEvent *event);
 static void WaitEventAdjustWin32(WaitEventSet *set, WaitEvent *event);
 #endif
 
-static inline int WaitEventSetWaitBlock(WaitEventSet *set, int cur_timeout,
+static inline int WaitEventSetWaitBlock(WaitEventSet *set, int64 cur_timeout_us,
 										WaitEvent *occurred_events, int nevents);
 
 /*
@@ -475,10 +475,9 @@ DisownLatch(Latch *latch)
  * to wait for. If the latch is already set (and WL_LATCH_SET is given), the
  * function returns immediately.
  *
- * The "timeout" is given in milliseconds. It must be >= 0 if WL_TIMEOUT flag
- * is given. Although it is declared as "long", we don't actually support
- * timeouts longer than INT_MAX milliseconds. Note that some extra overhead
- * is incurred when WL_TIMEOUT is given, so avoid using a timeout if possible.
+ * The "timeout" is given in microseconds. It must be >= 0 if WL_TIMEOUT flag
+ * is given. Note that some extra overhead is incurred when WL_TIMEOUT is
+ * given, so avoid using a timeout if possible.
  *
  * The latch must be owned by the current process, ie. it must be a
  * process-local latch initialized with InitLatch, or a shared latch
@@ -489,8 +488,8 @@ DisownLatch(Latch *latch)
  * we return all of them in one call, but we will return at least one.
  */
 int
-WaitLatch(Latch *latch, int wakeEvents, long timeout,
-		  uint32 wait_event_info)
+WaitLatchUs(Latch *latch, int wakeEvents, int64 timeout_us,
+			uint32 wait_event_info)
 {
 	WaitEvent	event;
 
@@ -510,15 +509,32 @@ WaitLatch(Latch *latch, int wakeEvents, long timeout,
 	LatchWaitSet->exit_on_postmaster_death =
 		((wakeEvents & WL_EXIT_ON_PM_DEATH) != 0);
 
-	if (WaitEventSetWait(LatchWaitSet,
-						 (wakeEvents & WL_TIMEOUT) ? timeout : -1,
-						 &event, 1,
-						 wait_event_info) == 0)
+	if (WaitEventSetWaitUs(LatchWaitSet,
+						   (wakeEvents & WL_TIMEOUT) ? timeout_us : -1,
+						   &event, 1,
+						   wait_event_info) == 0)
 		return WL_TIMEOUT;
 	else
 		return event.events;
 }
 
+/*
+ * Like WaitLatchUs(), but with the timeout in milliseconds.
+ *
+ * The "timeout" is given in milliseconds. It must be >= 0 if WL_TIMEOUT flag
+ * is given. Although it is declared as "long", we don't actually support
+ * timeouts longer than INT_MAX milliseconds. Note that some extra overhead
+ * is incurred when WL_TIMEOUT is given, so avoid using a timeout if possible.
+ */
+int
+WaitLatch(Latch *latch, int wakeEvents, long timeout_ms,
+		  uint32 wait_event_info)
+{
+	return WaitLatchUs(latch, wakeEvents,
+					   timeout_ms <= 0 ? timeout_ms : timeout_ms * 1000,
+					   wait_event_info);
+}
+
 /*
  * Like WaitLatch, but with an extra socket argument for WL_SOCKET_*
  * conditions.
@@ -537,8 +553,8 @@ WaitLatch(Latch *latch, int wakeEvents, long timeout,
  * WaitEventSet instead; that's more efficient.
  */
 int
-WaitLatchOrSocket(Latch *latch, int wakeEvents, pgsocket sock,
-				  long timeout, uint32 wait_event_info)
+WaitLatchOrSocketUs(Latch *latch, int wakeEvents, pgsocket sock,
+					int64 timeout_us, uint32 wait_event_info)
 {
 	int			ret = 0;
 	int			rc;
@@ -546,9 +562,9 @@ WaitLatchOrSocket(Latch *latch, int wakeEvents, pgsocket sock,
 	WaitEventSet *set = CreateWaitEventSet(CurrentMemoryContext, 3);
 
 	if (wakeEvents & WL_TIMEOUT)
-		Assert(timeout >= 0);
+		Assert(timeout_us >= 0);
 	else
-		timeout = -1;
+		timeout_us = -1;
 
 	if (wakeEvents & WL_LATCH_SET)
 		AddWaitEventToSet(set, WL_LATCH_SET, PGINVALID_SOCKET,
@@ -575,7 +591,7 @@ WaitLatchOrSocket(Latch *latch, int wakeEvents, pgsocket sock,
 		AddWaitEventToSet(set, ev, sock, NULL, NULL);
 	}
 
-	rc = WaitEventSetWait(set, timeout, &event, 1, wait_event_info);
+	rc = WaitEventSetWaitUs(set, timeout_us, &event, 1, wait_event_info);
 
 	if (rc == 0)
 		ret |= WL_TIMEOUT;
@@ -591,6 +607,20 @@ WaitLatchOrSocket(Latch *latch, int wakeEvents, pgsocket sock,
 	return ret;
 }
 
+/*
+ * Like WaitLatchOrSocket, but with timeout in milliseconds.
+ */
+int
+WaitLatchOrSocket(Latch *latch, int wakeEvents, pgsocket sock,
+				  long timeout_ms, uint32 wait_event_info)
+{
+	return WaitLatchOrSocketUs(latch,
+							   wakeEvents,
+							   sock,
+							   timeout_ms > 0 ? timeout_ms * 1000 : timeout_ms,
+							   wait_event_info);
+}
+
 /*
  * Sets a latch and wakes up anyone waiting on it.
  *
@@ -1380,14 +1410,14 @@ WaitEventAdjustWin32(WaitEventSet *set, WaitEvent *event)
  * values associated with the registered event.
  */
 int
-WaitEventSetWait(WaitEventSet *set, long timeout,
-				 WaitEvent *occurred_events, int nevents,
-				 uint32 wait_event_info)
+WaitEventSetWaitUs(WaitEventSet *set, int64 timeout_us,
+				   WaitEvent *occurred_events, int nevents,
+				   uint32 wait_event_info)
 {
 	int			returned_events = 0;
 	instr_time	start_time;
 	instr_time	cur_time;
-	long		cur_timeout = -1;
+	int64		cur_timeout = -1;
 
 	Assert(nevents > 0);
 
@@ -1395,11 +1425,11 @@ WaitEventSetWait(WaitEventSet *set, long timeout,
 	 * Initialize timeout if requested.  We must record the current time so
 	 * that we can determine the remaining timeout if interrupted.
 	 */
-	if (timeout >= 0)
+	if (timeout_us >= 0)
 	{
 		INSTR_TIME_SET_CURRENT(start_time);
-		Assert(timeout >= 0 && timeout <= INT_MAX);
-		cur_timeout = timeout;
+		Assert(timeout_us >= 0);
+		cur_timeout = timeout_us;
 	}
 	else
 		INSTR_TIME_SET_ZERO(start_time);
@@ -1487,11 +1517,11 @@ WaitEventSetWait(WaitEventSet *set, long timeout,
 		returned_events = rc;
 
 		/* If we're not done, update cur_timeout for next iteration */
-		if (returned_events == 0 && timeout >= 0)
+		if (returned_events == 0 && timeout_us >= 0)
 		{
 			INSTR_TIME_SET_CURRENT(cur_time);
 			INSTR_TIME_SUBTRACT(cur_time, start_time);
-			cur_timeout = timeout - (long) INSTR_TIME_GET_MILLISEC(cur_time);
+			cur_timeout = timeout_us - INSTR_TIME_GET_MICROSEC(cur_time);
 			if (cur_timeout <= 0)
 				break;
 		}
@@ -1505,6 +1535,20 @@ WaitEventSetWait(WaitEventSet *set, long timeout,
 	return returned_events;
 }
 
+/*
+ * Like WaitEventSetWaitUs(), but the timeout specified in milliseconds.
+ */
+int
+WaitEventSetWait(WaitEventSet *set, long timeout_ms,
+				 WaitEvent *occurred_events, int nevents,
+				 uint32 wait_event_info)
+{
+	return WaitEventSetWaitUs(set,
+							  timeout_ms <= 0 ? timeout_ms : timeout_ms * 1000,
+							  occurred_events,
+							  nevents,
+							  wait_event_info);
+}
 
 #if defined(WAIT_USE_EPOLL)
 
@@ -1517,17 +1561,31 @@ WaitEventSetWait(WaitEventSet *set, long timeout,
  * easy.
  */
 static inline int
-WaitEventSetWaitBlock(WaitEventSet *set, int cur_timeout,
+WaitEventSetWaitBlock(WaitEventSet *set, int64 cur_timeout_us,
 					  WaitEvent *occurred_events, int nevents)
 {
 	int			returned_events = 0;
 	int			rc;
 	WaitEvent  *cur_event;
 	struct epoll_event *cur_epoll_event;
+#ifdef HAVE_EPOLL_PWAIT2
+	struct timespec nap;
+#endif
 
 	/* Sleep */
+#ifdef HAVE_EPOLL_PWAIT2
+	nap.tv_sec = cur_timeout_us / 1000000;
+	nap.tv_nsec = (cur_timeout_us % 1000000) * 1000;
+	rc = epoll_pwait2(set->epoll_fd, set->epoll_ret_events,
+					  Min(nevents, set->nevents_space),
+					  cur_timeout_us >= 0 ? &nap : NULL,
+					  NULL);
+#else
 	rc = epoll_wait(set->epoll_fd, set->epoll_ret_events,
-					Min(nevents, set->nevents_space), cur_timeout);
+					Min(nevents, set->nevents_space),
+					cur_timeout_us >= 0 ? (cur_timeout_us + 999) / 1000
+					: cur_timeout_us);
+#endif
 
 	/* Check return code */
 	if (rc < 0)
@@ -1653,7 +1711,7 @@ WaitEventSetWaitBlock(WaitEventSet *set, int cur_timeout,
  * with separate system calls.
  */
 static int
-WaitEventSetWaitBlock(WaitEventSet *set, int cur_timeout,
+WaitEventSetWaitBlock(WaitEventSet *set, int64 cur_timeout_us,
 					  WaitEvent *occurred_events, int nevents)
 {
 	int			returned_events = 0;
@@ -1663,12 +1721,12 @@ WaitEventSetWaitBlock(WaitEventSet *set, int cur_timeout,
 	struct timespec timeout;
 	struct timespec *timeout_p;
 
-	if (cur_timeout < 0)
+	if (cur_timeout_us < 0)
 		timeout_p = NULL;
 	else
 	{
-		timeout.tv_sec = cur_timeout / 1000;
-		timeout.tv_nsec = (cur_timeout % 1000) * 1000000;
+		timeout.tv_sec = cur_timeout_us / 1000000;
+		timeout.tv_nsec = (cur_timeout_us % 1000000) * 1000;
 		timeout_p = &timeout;
 	}
 
@@ -1806,16 +1864,25 @@ WaitEventSetWaitBlock(WaitEventSet *set, int cur_timeout,
  * but requires iterating through all of set->pollfds.
  */
 static inline int
-WaitEventSetWaitBlock(WaitEventSet *set, int cur_timeout,
+WaitEventSetWaitBlock(WaitEventSet *set, int64 cur_timeout_us,
 					  WaitEvent *occurred_events, int nevents)
 {
 	int			returned_events = 0;
 	int			rc;
 	WaitEvent  *cur_event;
 	struct pollfd *cur_pollfd;
+	int			cur_timeout_ms;
+
+	/* Round up to the nearest millisecond, and cap at INT_MAX. */
+	if (cur_timeout_us >= PG_INT64_MAX - 999)
+		cur_timeout_ms = INT_MAX;
+	else if (cur_timeout_us > 0)
+		cur_timeout_ms = Min((int64) INT_MAX, (cur_timeout_us + 999) / 1000);
+	else
+		cur_timeout_ms = cur_timeout_us;
 
 	/* Sleep */
-	rc = poll(set->pollfds, set->nevents, (int) cur_timeout);
+	rc = poll(set->pollfds, set->nevents, cur_timeout_ms);
 
 	/* Check return code */
 	if (rc < 0)
@@ -1943,12 +2010,21 @@ WaitEventSetWaitBlock(WaitEventSet *set, int cur_timeout,
  * that only one event is "consumed".
  */
 static inline int
-WaitEventSetWaitBlock(WaitEventSet *set, int cur_timeout,
+WaitEventSetWaitBlock(WaitEventSet *set, int64 cur_timeout_us,
 					  WaitEvent *occurred_events, int nevents)
 {
 	int			returned_events = 0;
 	DWORD		rc;
 	WaitEvent  *cur_event;
+	int			cur_timeout_ms;
+
+	/* Round up to the nearest millisecond, and cap at INT_MAX. */
+	if (cur_timeout_us >= PG_INT64_MAX - 999)
+		cur_timeout_ms = INT_MAX;
+	else if (cur_timeout_us > 0)
+		cur_timeout_ms = Min((int64) INT_MAX, (cur_timeout_us + 999) / 1000);
+	else
+		cur_timeout_ms = cur_timeout_us;
 
 	/* Reset any wait events that need it */
 	for (cur_event = set->events;
@@ -2000,7 +2076,7 @@ WaitEventSetWaitBlock(WaitEventSet *set, int cur_timeout,
 	 * Need to wait for ->nevents + 1, because signal handle is in [0].
 	 */
 	rc = WaitForMultipleObjects(set->nevents + 1, set->handles, FALSE,
-								cur_timeout);
+								cur_timeout_ms);
 
 	/* Check return code */
 	if (rc == WAIT_FAILED)
diff --git a/src/include/pg_config.h.in b/src/include/pg_config.h.in
index 20c82f5979..c1f1fc6e70 100644
--- a/src/include/pg_config.h.in
+++ b/src/include/pg_config.h.in
@@ -149,6 +149,9 @@
 /* Define to 1 if you have the <editline/readline.h> header file. */
 #undef HAVE_EDITLINE_READLINE_H
 
+/* Define to 1 if you have the `epoll_pwait2' function. */
+#undef HAVE_EPOLL_PWAIT2
+
 /* Define to 1 if you have the <execinfo.h> header file. */
 #undef HAVE_EXECINFO_H
 
diff --git a/src/include/storage/latch.h b/src/include/storage/latch.h
index 99cc47874a..756c3114ed 100644
--- a/src/include/storage/latch.h
+++ b/src/include/storage/latch.h
@@ -180,13 +180,20 @@ extern int AddWaitEventToSet(WaitEventSet *set, uint32 events, pgsocket fd,
 							  Latch *latch, void *user_data);
 extern void ModifyWaitEvent(WaitEventSet *set, int pos, uint32 events, Latch *latch);
 
-extern int WaitEventSetWait(WaitEventSet *set, long timeout,
+extern int WaitEventSetWait(WaitEventSet *set, long timeout_ms,
 							 WaitEvent *occurred_events, int nevents,
 							 uint32 wait_event_info);
-extern int WaitLatch(Latch *latch, int wakeEvents, long timeout,
+extern int WaitEventSetWaitUs(WaitEventSet *set, int64 timeout_us,
+							   WaitEvent *occurred_events, int nevents,
+							   uint32 wait_event_info);
+extern int WaitLatch(Latch *latch, int wakeEvents, long timeout_ms,
 					  uint32 wait_event_info);
+extern int WaitLatchUs(Latch *latch, int wakeEvents, int64 timeout_us,
+						uint32 wait_event_info);
 extern int WaitLatchOrSocket(Latch *latch, int wakeEvents,
-							  pgsocket sock, long timeout, uint32 wait_event_info);
+							  pgsocket sock, long timeout_ms, uint32 wait_event_info);
+extern int WaitLatchOrSocketUs(Latch *latch, int wakeEvents,
+								pgsocket sock, int64 timeout_us, uint32 wait_event_info);
 extern void InitializeLatchWaitSet(void);
 extern int GetNumRegisteredWaitEvents(WaitEventSet *set);
 extern bool WaitEventSetCanReportClosed(void);
diff --git a/src/tools/msvc/Solution.pm b/src/tools/msvc/Solution.pm
index 5eaea6355e..f88fffa5e2 100644
--- a/src/tools/msvc/Solution.pm
+++ b/src/tools/msvc/Solution.pm
@@ -247,6 +247,7 @@ sub GenerateFiles
 		HAVE_DECL_STRNLEN => 1,
 		HAVE_EDITLINE_HISTORY_H => undef,
 		HAVE_EDITLINE_READLINE_H => undef,
+		HAVE_EPOLL_PWAIT2 => undef,
 		HAVE_EXECINFO_H => undef,
 		HAVE_EXPLICIT_BZERO => undef,
 		HAVE_FSEEKO => 1,
-- 
2.39.2

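For readers skimming the patch above, the retry logic in WaitEventSetWaitUs() amounts to: remember when the wait began and, after each early wakeup that produced no events, subtract the elapsed time from the requested timeout. The following is only an illustrative standalone sketch of that arithmetic, using plain struct timespec values instead of instr_time; elapsed_us() and remaining_timeout_us() are hypothetical names that do not appear in the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/*
 * Hypothetical helper: whole microseconds elapsed between two readings of
 * the same monotonic clock.  "now" is assumed not to precede "start".
 */
static int64_t
elapsed_us(const struct timespec *start, const struct timespec *now)
{
	int64_t		sec = (int64_t) now->tv_sec - (int64_t) start->tv_sec;
	int64_t		nsec = (int64_t) now->tv_nsec - (int64_t) start->tv_nsec;

	/* Combine in nanoseconds first so a negative nsec part is handled. */
	return (sec * 1000000000 + nsec) / 1000;
}

/*
 * Hypothetical helper: how much of a microsecond timeout is left after an
 * early wakeup.  A result of zero means the caller should stop waiting and
 * report a timeout.
 */
static int64_t
remaining_timeout_us(const struct timespec *start,
					 const struct timespec *now,
					 int64_t timeout_us)
{
	int64_t		left;

	assert(timeout_us >= 0);
	left = timeout_us - elapsed_us(start, now);
	return left > 0 ? left : 0;
}
```

In a real loop the two timestamps would typically come from clock_gettime(CLOCK_MONOTONIC, ...), taken once before the first sleep and again after each wakeup.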
From 9b5e8922b7c603f902fecfb30e24a962b6c08176 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.mu...@gmail.com>
Date: Fri, 10 Mar 2023 16:22:59 +1300
Subject: [PATCH 2/3] Use microsecond-based naps for vacuum_cost_delay sleep.

Now that we have microsecond support in the WaitEventSet API, we can use
the standard programming pattern to implement the high resolution sleep
in vacuum_delay_point().

XXX We wouldn't be able to do this until Linux 5.11 is in common stable
distributions, otherwise the sleep would lose precision when changing
from the pg_usleep() coding.

Reported-by: Melanie Plageman <melanieplage...@gmail.com>
Discussion: https://postgr.es/m/CAAKRu_b-q0hXCBUCAATh0Z4Zi6UkiC0k2DFgoD3nC-r3SkR3tg%40mail.gmail.com
---
 src/backend/commands/vacuum.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 2e12baf8eb..f379e60dca 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -2232,10 +2232,10 @@ vacuum_delay_point(void)
 		if (msec > VacuumCostDelay * 4)
 			msec = VacuumCostDelay * 4;
 
-		(void) WaitLatch(MyLatch,
-						 WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
-						 msec,
-						 WAIT_EVENT_VACUUM_DELAY);
+		(void) WaitLatchUs(MyLatch,
+						   WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
+						   msec * 1000,
+						   WAIT_EVENT_VACUUM_DELAY);
 		ResetLatch(MyLatch);
 
 		VacuumCostBalance = 0;
-- 
2.39.2

From dd40a3e3c28c69466bc2e8c2a223608ac51e05b7 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.mu...@gmail.com>
Date: Fri, 10 Mar 2023 16:19:42 +1300
Subject: [PATCH 3/3] Use microsecond-based naps in walreceiver.

Since anything based on timestamp differences is really in microseconds
under the covers, we might as well use the new higher resolution API for
waiting.

XXX For illustration; there would be many other places that could change
like this
---
 src/backend/replication/walreceiver.c | 16 ++++++++--------
 src/backend/utils/adt/timestamp.c     | 20 ++++++++++++++++++++
 src/include/utils/timestamp.h         |  2 ++
 3 files changed, 30 insertions(+), 8 deletions(-)

diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c
index f6446da2d6..18c66c0c63 100644
--- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -445,7 +445,7 @@ WalReceiverMain(void)
 				pgsocket	wait_fd = PGINVALID_SOCKET;
 				int			rc;
 				TimestampTz nextWakeup;
-				long		nap;
+				int64		nap;
 
 				/*
 				 * Exit walreceiver if we're not in recovery. This should not
@@ -530,7 +530,7 @@ WalReceiverMain(void)
 
 				/* Calculate the nap time, clamping as necessary. */
 				now = GetCurrentTimestamp();
-				nap = TimestampDifferenceMilliseconds(now, nextWakeup);
+				nap = TimestampDifferenceMicroseconds(now, nextWakeup);
 
 				/*
 				 * Ideally we would reuse a WaitEventSet object repeatedly
@@ -544,12 +544,12 @@ WalReceiverMain(void)
 				 * avoiding some system calls.
 				 */
 				Assert(wait_fd != PGINVALID_SOCKET);
-				rc = WaitLatchOrSocket(MyLatch,
-									   WL_EXIT_ON_PM_DEATH | WL_SOCKET_READABLE |
-									   WL_TIMEOUT | WL_LATCH_SET,
-									   wait_fd,
-									   nap,
-									   WAIT_EVENT_WAL_RECEIVER_MAIN);
+				rc = WaitLatchOrSocketUs(MyLatch,
+										 WL_EXIT_ON_PM_DEATH | WL_SOCKET_READABLE |
+										 WL_TIMEOUT | WL_LATCH_SET,
+										 wait_fd,
+										 nap,
+										 WAIT_EVENT_WAL_RECEIVER_MAIN);
 				if (rc & WL_LATCH_SET)
 				{
 					ResetLatch(MyLatch);
diff --git a/src/backend/utils/adt/timestamp.c b/src/backend/utils/adt/timestamp.c
index de93db89d4..52f6568397 100644
--- a/src/backend/utils/adt/timestamp.c
+++ b/src/backend/utils/adt/timestamp.c
@@ -1719,6 +1719,26 @@ TimestampDifferenceMilliseconds(TimestampTz start_time, TimestampTz stop_time)
 		return (long) ((diff + 999) / 1000);
 }
 
+/*
+ * TimestampDifferenceMicroseconds -- convert the difference between two
+ * timestamps into microseconds
+ *
+ * Compute a wait time for WaitLatchUs().
+ */
+int64
+TimestampDifferenceMicroseconds(TimestampTz start_time, TimestampTz stop_time)
+{
+	TimestampTz diff;
+
+	/* Deal with zero or negative elapsed time quickly. */
+	if (start_time >= stop_time)
+		return 0;
+	/* To not fail with timestamp infinities, we must detect overflow. */
+	if (pg_sub_s64_overflow(stop_time, start_time, &diff))
+		return PG_INT64_MAX;
+	return diff;
+}
+
 /*
  * TimestampDifferenceExceeds -- report whether the difference between two
  * timestamps is >= a threshold (expressed in milliseconds)
diff --git a/src/include/utils/timestamp.h b/src/include/utils/timestamp.h
index edd59dc432..1caa15221d 100644
--- a/src/include/utils/timestamp.h
+++ b/src/include/utils/timestamp.h
@@ -100,6 +100,8 @@ extern void TimestampDifference(TimestampTz start_time, TimestampTz stop_time,
 								long *secs, int *microsecs);
 extern long TimestampDifferenceMilliseconds(TimestampTz start_time,
 											TimestampTz stop_time);
+extern int64 TimestampDifferenceMicroseconds(TimestampTz start_time,
+											 TimestampTz stop_time);
 extern bool TimestampDifferenceExceeds(TimestampTz start_time,
 									   TimestampTz stop_time,
 									   int msec);
-- 
2.39.2
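
As a side note on the overflow handling in TimestampDifferenceMicroseconds(), here is a standalone sketch of the same clamping behaviour that can be compiled outside the tree. It assumes timestamps are signed 64-bit microsecond counts whose infinities are the extreme int64 values, and it replaces pg_sub_s64_overflow() with an explicit range check; timestamp_diff_us() is a hypothetical name, not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for TimestampDifferenceMicroseconds(): returns 0 if
 * stop_time is not after start_time, INT64_MAX if the true difference does
 * not fit in an int64, and the plain difference otherwise.
 */
static int64_t
timestamp_diff_us(int64_t start_time, int64_t stop_time)
{
	int64_t		diff;

	/* Deal with zero or negative elapsed time quickly. */
	if (start_time >= stop_time)
		return 0;

	/*
	 * Here stop_time > start_time, so the subtraction can only overflow
	 * when start_time is negative.  INT64_MAX + start_time cannot itself
	 * overflow in that case, so the check is safe to evaluate.
	 */
	if (start_time < 0 && stop_time > INT64_MAX + start_time)
		return INT64_MAX;

	diff = stop_time - start_time;
	assert(diff > 0);
	return diff;
}
```

The clamped result pairs with the wait code's own capping, so a wakeup time of "infinity" simply becomes the longest sleep the platform interface can express.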