At Tue, 9 Feb 2021 09:47:58 +0530, Dilip Kumar <dilipbal...@gmail.com> wrote in > On Tue, Feb 9, 2021 at 8:54 AM Yugo NAGATA <nag...@sraoss.co.jp> wrote: > > > > On Tue, 09 Feb 2021 10:58:04 +0900 (JST) > > Kyotaro Horiguchi <horikyota....@gmail.com> wrote: > > > > > At Mon, 8 Feb 2021 17:05:52 +0530, Dilip Kumar <dilipbal...@gmail.com> > > > wrote in > > > > On Mon, Feb 8, 2021 at 2:19 PM Yugo NAGATA <nag...@sraoss.co.jp> wrote: > > > > > > > > > > On Mon, 08 Feb 2021 17:32:46 +0900 (JST) > > > > > Kyotaro Horiguchi <horikyota....@gmail.com> wrote: > > > > > > > > > > > At Mon, 8 Feb 2021 14:12:35 +0900, Yugo NAGATA > > > > > > <nag...@sraoss.co.jp> wrote in > > > > > > > > > > I think the right fix should be that the state should never > > > > > > > > > > go from > > > > > > > > > > ‘paused’ to ‘pause requested’ so I think > > > > > > > > > > pg_wal_replay_pause should take > > > > > > > > > > care of that. > > > > > > > > > > > > > > > > > > It makes sense to take care of this in pg_wal_replay_pause, > > > > > > > > > but I wonder > > > > > > > > > it can not handle the case that a user resume and pause again > > > > > > > > > while a sleep. > > > > > > > > > > > > > > > > Right, we will have to check and set in the loop. But we > > > > > > > > should not > > > > > > > > allow the state to go from paused to pause requested > > > > > > > > irrespective of > > > > > > > > this. > > > > > > > > > > > > > > I agree with you. > > > > > > > > > > > > Is there any actual harm if PAUSED returns to REQUESETED, assuming > > > > > > we > > > > > > immediately change the state to PAUSE always we see REQUESTED in the > > > > > > waiting loop, despite that we allow change the state from PAUSE to > > > > > > REQUESTED via NOT_PAUSED between two successive loop condition > > > > > > checks? > > > > > > > > > > If a user call pg_wal_replay_pause while recovery is paused, users can > > > > > observe 'pause requested' during a sleep alghough the time window is > > > > > short. > > > > > It seems a bit odd that pg_wal_replay_pause changes the state like > > > > > this > > > > > because This state meeans that recovery may not be 'paused'. > > > > > > > > Yeah, this appears wrong that after 'paused' we go back to 'pause > > > > requested'. the logical state transition should always be as below > > > > > > > > NOT PAUSED -> PAUSE REQUESTED or PAUSED (maybe we should always go to > > > > request and then paused but there is nothing wrong with going to > > > > paused) > > > > PAUSE REQUESTED -> NOT PAUSE or PAUSED (either cancel the request or > > > > get paused) > > > > PAUSED -> NOT PAUSED (from PAUSED we should not go to the > > > > PAUSE_REQUESTED without going to NOT PAUSED) > > > > > > I didn't asked about the internal logical correctness, but asked about > > > *actual harm* revealed to users. I don't see any actual harm in the > > > "wrong" transition because: > > > > Actually, the incorrect state transition is not so harmful except that > > users can observe unnecessary state changes. However, I don't think any > > actual harm in prohibit the incorrect state transition. So, I think we > > can do it. > > > > > If we are going to introduce that complexity, I'd like to re-propose > > > to introduce interlocking between the recovery side and the > > > pause-requestor side instead of introducing the intermediate state, > > > which is the cause of the complexity. > > > > > > The attached PoC patch adds: > > > > > > - A solid checkpoint just before calling rm_redo. It doesn't add a > > > info_lck since the check is done in the existing lock section. > > > > > > - Interlocking between the above and SetRecoveryPause without adding a > > > shared variable. > > > (This is what I called "synchronous" before.) > > > > I think waiting in pg_wal_replay_pasue is a possible option, but this will > > also introduce other complexity to codes such as possibility of waiting for > > long or for ever. For example, waiting in SetRecoveryPause as in your POC > > patch appears to make recovery stuck in RecoveryRequiresIntParameter.
Ah. Yes, startup process does not need to wait. That is a bug of the patch. No other callers don't cause the self dead lock. > I agree with this, I think we previously discussed these approaches > where we can wait in pg_wal_replay_pasue() or > pg_is_wal_replay_pasued(). In fact, we had an older version where we > put the wait in pg_is_wal_replay_pasued(). But it appeared that doing Note that the expected waiting period is while calling rmgr_redo(). If it is stuck for a long time, that suggests something's going wrong. > so will add extra complexity as well as instead of waiting in these > APIs the wait logic can be implemented in the application code which > is actually using these APIs and IMHO that will give better control to > the users. Year, with the PoC pg_wal_replay_pause() can make a short wait as a side-effect but the tri-state patch also can add a function to wait for the state suffices. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index 8e3b5df7dc..194a2f9998 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -6076,6 +6076,23 @@ void SetRecoveryPause(bool recoveryPause) { SpinLockAcquire(&XLogCtl->info_lck); + + /* + * Wait for the application of the record being applied to finish, so that + * no records will be applied after this function returns. We don't need to + * wait when ending a pause. Anyway we are requesting a recovery pause, we + * don't mind a possible slow down of recovery by the info_lck here. + * We don't need to wait in the startup process. + */ + while(InRecovery && + recoveryPause && !XLogCtl->recoveryPause && + XLogCtl->replayEndRecPtr != XLogCtl->lastReplayedEndRecPtr) + { + SpinLockRelease(&XLogCtl->info_lck); + CHECK_FOR_INTERRUPTS(); + pg_usleep(10000L); /* 10 ms */ + SpinLockAcquire(&XLogCtl->info_lck); + } XLogCtl->recoveryPause = recoveryPause; SpinLockRelease(&XLogCtl->info_lck); } @@ -7262,6 +7279,7 @@ StartupXLOG(void) do { bool switchedTLI = false; + bool pause_requested = false; #ifdef WAL_DEBUG if (XLOG_DEBUG || @@ -7292,11 +7310,9 @@ StartupXLOG(void) * Note that we intentionally don't take the info_lck spinlock * here. We might therefore read a slightly stale value of * the recoveryPause flag, but it can't be very stale (no - * worse than the last spinlock we did acquire). Since a - * pause request is a pretty asynchronous thing anyway, - * possibly responding to it one WAL record later than we - * otherwise would is a minor issue, so it doesn't seem worth - * adding another spinlock cycle to prevent that. + * worse than the last spinlock we did acquire). We eventually + * make sure catching the pause request if any just before + * applying this record. */ if (((volatile XLogCtlData *) XLogCtl)->recoveryPause) recoveryPausesHere(false); @@ -7385,12 +7401,19 @@ StartupXLOG(void) /* * Update shared replayEndRecPtr before replaying this record, * so that XLogFlush will update minRecoveryPoint correctly. + * Also we check for the correct value of the recoveryPause + * flag here not to have redo overrun during a pause. See + * SetRecoveryPuase() for details. */ SpinLockAcquire(&XLogCtl->info_lck); XLogCtl->replayEndRecPtr = EndRecPtr; XLogCtl->replayEndTLI = ThisTimeLineID; + pause_requested = XLogCtl->recoveryPause; SpinLockRelease(&XLogCtl->info_lck); + if (pause_requested) + recoveryPausesHere(false); + /* * If we are attempting to enter Hot Standby mode, process * XIDs we see