Hi Fujii,

Thanks for the review.

> Just an idea: how about calling pgstat_report_wait_start() and _end()
around pg_usleep(), similar to what WaitExceedsMaxStandbyDelay() does?

I was thinking about this as well, but the question is what do we report
the wait event here as? The one that makes sense to me is PG_WAIT_LOCK, but
that wouldn't align with what pg_locks would display since there is no
actual lock grabbed on the standby.

> Also, would waiting only 1ms per loop cycle be too aggressive?

I agree that it's aggressive for this case. But
XactLockTableWait/ConditionalXactLockTableWait seem to be used in other
places including in heapam so I'm hesitant on changing this behaviour for
all of them. Should we have a different "wait logic" for this case?

Kevin

On Mon, May 26, 2025 at 9:02 AM Fujii Masao <masao.fu...@oss.nttdata.com>
wrote:

>
>
> On 2025/05/24 5:41, Kevin K Biju wrote:
> > Hi,
> >
> > While creating a logical replication slot, we wait for older
> transactions to complete to reach a "consistent point", which can take a
> while on busy databases. If we're creating a slot on a primary instance,
> it's pretty clear that we're waiting on a transaction:
> >
> >
> > postgres=# SELECT pid,wait_event_type,wait_event,state,query FROM
> pg_stat_activity WHERE pid=804;
> >   pid | wait_event_type |  wait_event   | state  |
>       query
> >
> -----+-----------------+---------------+--------+----------------------------------------------------------------
> >   804 | Lock            | transactionid | active | SELECT
> pg_create_logical_replication_slot('wow1', 'pgoutput');
> > (1 row)
> > postgres=# SELECT locktype,transactionid,pid,mode FROM pg_locks WHERE
> pid=804 AND granted='f';
> >     locktype    | transactionid | pid |   mode
> > ---------------+---------------+-----+-----------
> >   transactionid |          6077 | 804 | ShareLock
> >
> > However, creating a slot on a hot standby behaves very differently when
> blocked on a transaction. Querying pg_stat_activity doesn't give us any
> indication on what the issue is:
> >
> >
> > postgres=# SELECT pg_is_in_recovery();
> >   pg_is_in_recovery
> > -------------------
> >   t
> > (1 row)
> > postgres=# SELECT pid,wait_event_type,wait_event,state,query FROM
> pg_stat_activity WHERE pid=5074;
> >   pid  | wait_event_type | wait_event | state  |
>     query
> >
> ------+-----------------+------------+--------+----------------------------------------------------------------
> >   5074 |                 |            | active | SELECT
> pg_create_logical_replication_slot('wow1', 'pgoutput');
> >
> > And more importantly, a backend in this state cannot be terminated via
> the usual methods and sometimes requires a server restart.
> >
> >
> > postgres=# SELECT pg_cancel_backend(5074), pg_terminate_backend(5074);
> >   pg_cancel_backend | pg_terminate_backend
> > -------------------+----------------------
> >   t                 | t
> > (1 row)
> > postgres=# SELECT pid,wait_event_type,wait_event,state,query FROM
> pg_stat_activity WHERE pid=5074;
> >   pid  | wait_event_type | wait_event | state  |
>     query
> >
> ------+-----------------+------------+--------+----------------------------------------------------------------
> >   5074 |                 |            | active | SELECT
> pg_create_logical_replication_slot('wow1', 'pgoutput');
> >
> > I've taken a look at the code that "awaits" on transactions and the
> function of interest is lmgr.c/XactLockTableWait. On a primary, it is able
> to acquire a ShareLock on the xid and the lock manager does a good job on
> making this wait efficient as well as visible externally. However, on a
> standby, it cannot wait on the xid since it is not running the transaction.
> However, it knows the transaction is still running from KnownAssignedXids,
> and then ends up on this codepath:
> >
> > *
> > * Some uses of this function don't involve tuple visibility -- such
> > * as when building snapshots for logical decoding.  It is possible to
> > * see a transaction in ProcArray before it registers itself in the
> > * locktable.  The topmost transaction in that case is the same xid,
> > * so we try again after a short sleep.  (Don't sleep the first time
> > * through, to avoid slowing down the normal case.)
> > */
> > if (!first)
> > pg_usleep(1000L);
> >
> > The attached comment suggests that this sleep was only meant to be hit a
> few times while we wait for the lock to appear so we can wait on it.
> However, in the hot standby case, this ends up becoming a polling loop
> since the lock will never appear. There is no interrupt processing in this
> loop either and so the only way out is to wait for the transaction on the
> primary to complete.
>
> I agree with your analysis. It makes sense to me.
>
>
> > Attached is a patch that adds CHECK_FOR_INTERRUPTS before the sleep in
> XactLockTableWait and ConditionalXactLockTableWait. This allows backends
> waiting on a transaction during slot creation on a standby to be
> interrupted.
>
> +1
>
> > It would also be nice if there was a way for users to tell what we're
> waiting on (maybe a different wait event?) but I'd like input on that.
>
> Just an idea: how about calling pgstat_report_wait_start() and _end()
> around
> pg_usleep(), similar to what WaitExceedsMaxStandbyDelay() does?
> Since this is more of an improvement than a bug fix, I think
> it would be better to make it as a separate patch from the CFI addition.
>
>
> Also, would waiting only 1ms per loop cycle be too aggressive? If so, maybe
> we could increase the sleep time gradually during recovery (but not beyond
> 1 second), again similar to WaitExceedsMaxStandbyDelay().
>
> Regards,
>
> --
> Fujii Masao
> NTT DATA Japan Corporation
>
>

Reply via email to