Hi Fujii, Thanks for the review.
> Just an idea: how about calling pgstat_report_wait_start() and _end() around pg_usleep(), similar to what WaitExceedsMaxStandbyDelay() does? I was thinking about this as well, but the question is what do we report the wait event here as? The one that makes sense to me is PG_WAIT_LOCK, but that wouldn't align with what pg_locks would display since there is no actual lock grabbed on the standby. > Also, would waiting only 1ms per loop cycle be too aggressive? I agree that it's aggressive for this case. But XactLockTableWait/ConditionalXactLockTableWait seem to be used in other places including in heapam so I'm hesitant on changing this behaviour for all of them. Should we have a different "wait logic" for this case? Kevin On Mon, May 26, 2025 at 9:02 AM Fujii Masao <masao.fu...@oss.nttdata.com> wrote: > > > On 2025/05/24 5:41, Kevin K Biju wrote: > > Hi, > > > > While creating a logical replication slot, we wait for older > transactions to complete to reach a "consistent point", which can take a > while on busy databases. If we're creating a slot on a primary instance, > it's pretty clear that we're waiting on a transaction: > > > > > > postgres=# SELECT pid,wait_event_type,wait_event,state,query FROM > pg_stat_activity WHERE pid=804; > > pid | wait_event_type | wait_event | state | > query > > > -----+-----------------+---------------+--------+---------------------------------------------------------------- > > 804 | Lock | transactionid | active | SELECT > pg_create_logical_replication_slot('wow1', 'pgoutput'); > > (1 row) > > postgres=# SELECT locktype,transactionid,pid,mode FROM pg_locks WHERE > pid=804 AND granted='f'; > > locktype | transactionid | pid | mode > > ---------------+---------------+-----+----------- > > transactionid | 6077 | 804 | ShareLock > > > > However, creating a slot on a hot standby behaves very differently when > blocked on a transaction. Querying pg_stat_activity doesn't give us any > indication on what the issue is: > > > > > > postgres=# SELECT pg_is_in_recovery(); > > pg_is_in_recovery > > ------------------- > > t > > (1 row) > > postgres=# SELECT pid,wait_event_type,wait_event,state,query FROM > pg_stat_activity WHERE pid=5074; > > pid | wait_event_type | wait_event | state | > query > > > ------+-----------------+------------+--------+---------------------------------------------------------------- > > 5074 | | | active | SELECT > pg_create_logical_replication_slot('wow1', 'pgoutput'); > > > > And more importantly, a backend in this state cannot be terminated via > the usual methods and sometimes requires a server restart. > > > > > > postgres=# SELECT pg_cancel_backend(5074), pg_terminate_backend(5074); > > pg_cancel_backend | pg_terminate_backend > > -------------------+---------------------- > > t | t > > (1 row) > > postgres=# SELECT pid,wait_event_type,wait_event,state,query FROM > pg_stat_activity WHERE pid=5074; > > pid | wait_event_type | wait_event | state | > query > > > ------+-----------------+------------+--------+---------------------------------------------------------------- > > 5074 | | | active | SELECT > pg_create_logical_replication_slot('wow1', 'pgoutput'); > > > > I've taken a look at the code that "awaits" on transactions and the > function of interest is lmgr.c/XactLockTableWait. On a primary, it is able > to acquire a ShareLock on the xid and the lock manager does a good job on > making this wait efficient as well as visible externally. However, on a > standby, it cannot wait on the xid since it is not running the transaction. > However, it knows the transaction is still running from KnownAssignedXids, > and then ends up on this codepath: > > > > * > > * Some uses of this function don't involve tuple visibility -- such > > * as when building snapshots for logical decoding. It is possible to > > * see a transaction in ProcArray before it registers itself in the > > * locktable. The topmost transaction in that case is the same xid, > > * so we try again after a short sleep. (Don't sleep the first time > > * through, to avoid slowing down the normal case.) > > */ > > if (!first) > > pg_usleep(1000L); > > > > The attached comment suggests that this sleep was only meant to be hit a > few times while we wait for the lock to appear so we can wait on it. > However, in the hot standby case, this ends up becoming a polling loop > since the lock will never appear. There is no interrupt processing in this > loop either and so the only way out is to wait for the transaction on the > primary to complete. > > I agree with your analysis. It makes sense to me. > > > > Attached is a patch that adds CHECK_FOR_INTERRUPTS before the sleep in > XactLockTableWait and ConditionalXactLockTableWait. This allows backends > waiting on a transaction during slot creation on a standby to be > interrupted. > > +1 > > > It would also be nice if there was a way for users to tell what we're > waiting on (maybe a different wait event?) but I'd like input on that. > > Just an idea: how about calling pgstat_report_wait_start() and _end() > around > pg_usleep(), similar to what WaitExceedsMaxStandbyDelay() does? > Since this is more of an improvement than a bug fix, I think > it would be better to make it as a separate patch from the CFI addition. > > > Also, would waiting only 1ms per loop cycle be too aggressive? If so, maybe > we could increase the sleep time gradually during recovery (but not beyond > 1 second), again similar to WaitExceedsMaxStandbyDelay(). > > Regards, > > -- > Fujii Masao > NTT DATA Japan Corporation > >