On 2025/06/08 23:33, Xuneng Zhou wrote:
> Hi hackers,
>
> This patch implements progressive backoff in XactLockTableWait() and
> ConditionalXactLockTableWait().
>
> As Kevin reported in this thread [1], XactLockTableWait() can enter a
> tight polling loop during logical replication slot creation on standby
> servers, sleeping for fixed 1ms intervals that can continue for a long
> time. This creates significant CPU overhead.
>
> The patch implements a time-based threshold approach based on Fujii’s
> idea [1]: keep sleeping for 1ms until the total sleep time reaches 10
> seconds, then start exponential backoff (doubling the sleep duration
> each cycle) up to a maximum of 10 seconds per sleep. This balances
> responsiveness for normal operations (which typically complete within
> seconds) against CPU efficiency for the long waits in some logical
> replication scenarios.
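For reference, the schedule described in the quoted text could be sketched roughly as below. This is only an illustration of the description, not code from the patch: the function name, the constants, and the idea of passing in the accumulated and previous sleep times are all assumptions made for the example.

```c
/* Hypothetical constants, chosen to match the quoted description. */
#define BASE_SLEEP_US   1000L       /* 1ms polling interval */
#define THRESHOLD_US    10000000L   /* 10s of accumulated sleep */
#define MAX_SLEEP_US    10000000L   /* 10s cap on a single sleep */

/*
 * Return the next sleep duration in microseconds, given the total time
 * slept so far and the duration of the previous sleep.
 */
static long
next_sleep_us(long total_slept_us, long prev_sleep_us)
{
    long        next;

    /* Below the threshold, keep polling at the base interval. */
    if (total_slept_us < THRESHOLD_US)
        return BASE_SLEEP_US;

    /* Past the threshold, double the previous sleep up to the cap. */
    next = prev_sleep_us * 2;
    if (next > MAX_SLEEP_US)
        next = MAX_SLEEP_US;
    return next;
}
```

A caller would add each returned value to its running total and pass it back as prev_sleep_us on the next iteration.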

Thanks for the patch!

When I first suggested this idea, I used 10s as an example for
the maximum sleep time. But thinking more about it now, 10s might
be too long. Even if the target transaction has already finished,
XactLockTableWait() could still wait up to 10 seconds,
which seems excessive.

What about using 1s instead? That value is already used as a max
sleep time in other places, like WaitExceedsMaxStandbyDelay().

If we agree on 1s as the max, then using exponential backoff from
1ms to 1s after the threshold might not be necessary. It might
be simpler and sufficient to just sleep for 1s once we hit
the threshold.

Based on that, I think a change like the following could work well.
Thoughts?

----------------------------------------
     XactLockTableWaitInfo info;
     ErrorContextCallback callback;
     bool        first = true;
+    int         left_till_hibernate = 5000;

<snip>

         if (!first)
         {
             CHECK_FOR_INTERRUPTS();
-            pg_usleep(1000L);
+
+            if (left_till_hibernate > 0)
+            {
+                pg_usleep(1000L);
+                left_till_hibernate--;
+            }
+            else
+                pg_usleep(1000000L);
----------------------------------------

Regards,

-- 
Fujii Masao
NTT DATA Japan Corporation
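
To get a rough feel for the effect of the two-phase sleep above, the loop can be modelled on its own. This is a simplified sketch: count_wakeups() and its constants are hypothetical, and it only adds up sleep time, ignoring the cost of each lock check and any sleep overshoot.

```c
/* Hypothetical constants mirroring the two pg_usleep() calls above. */
#define FAST_SLEEP_US   1000L       /* 1ms while still in the fast phase */
#define SLOW_SLEEP_US   1000000L    /* 1s once "hibernating" */

/*
 * Count how many times the loop wakes up while waiting for a transaction
 * that takes wait_us microseconds to finish, starting with
 * left_till_hibernate fast iterations.
 */
static long
count_wakeups(long wait_us, int left_till_hibernate)
{
    long        elapsed_us = 0;
    long        wakeups = 0;

    while (elapsed_us < wait_us)
    {
        if (left_till_hibernate > 0)
        {
            elapsed_us += FAST_SLEEP_US;
            left_till_hibernate--;
        }
        else
            elapsed_us += SLOW_SLEEP_US;

        wakeups++;
    }

    return wakeups;
}
```

Under these assumptions, a 60s wait with left_till_hibernate = 5000 gives 5055 wakeups (5000 in the first 5s, then 55 at 1s each), compared with 60000 if every sleep stays at 1ms. The trade-off is that, once hibernating, the waiter may notice the end of the transaction up to 1s late.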