Hi Andres,

Thanks for looking into this!

On Tue, Jun 17, 2025 at 10:17 PM Andres Freund <and...@anarazel.de> wrote:

> Hi,
>
> On 2025-06-08 22:33:39 +0800, Xuneng Zhou wrote:
> > This patch implements progressive backoff in XactLockTableWait() and
> > ConditionalXactLockTableWait().
> >
> > As Kevin reported in this thread [1], XactLockTableWait() can enter a
> > tight polling loop during logical replication slot creation on standby
> > servers, sleeping for fixed 1ms intervals that can continue for a long
> > time. This creates significant CPU overhead.
> >
> > The patch implements a time-based threshold approach based on Fujii’s
> > idea [1]: keep sleeping for 1ms until the total sleep time reaches 10
> > seconds, then start exponential backoff (doubling the sleep duration
> > each cycle) up to a maximum of 10 seconds per sleep. This balances
> > responsiveness for normal operations (which typically complete within
> > seconds) against CPU efficiency for the long waits in some logical
> > replication scenarios.
>
> ISTM that this is going to wrong way - the real problem is that we seem to
> have extended periods where XactLockTableWait() doesn't actually work, not
> that the sleep time is too short.


Yeah, XactLockTableWait() works well when the targeted XID lock exists and
can be waited for. However, this assumption breaks down on hot standby
because transactions from the primary aren't running locally—no exclusive
locks are registered in the standby's lock table, leading to potentially
unbounded polling instead of proper blocking waits.

The purpose of XactLockTableWait() is to wait for a specified transaction
to commit or abort, which remains semantically correct for
SnapBuildWaitSnapshot() during logical decoding. How about adding special
handling for the hot standby case within XactLockTableWait() by detecting
RecoveryInProgres(). Although after studying the various calls of
XactLockTableWait(), I'm uncertain whether this condition is proper and
sufficient to avoid affecting other use cases.

The sleep in XactLockTableWait() was

intended to address a very short race, not something that's essentially
> unbound.
>

The sleep in function is used to address the possible race between txn in
ProcArray and locktable that can occur in
building snapshots for logical decoding. But in the hot standby case, the
race does not exist and it becomes a polling.

Best regards,
Xuneng

Reply via email to