Hello. At Thu, 12 Apr 2018 14:07:53 +0900, Michael Paquier <mich...@paquier.xyz> wrote in <20180412050753.ga19...@paquier.xyz> > On Thu, Apr 12, 2018 at 10:34:30AM +0900, Kyotaro HORIGUCHI wrote: > > Checkpointer never calls CreateCheckPoint while > > RecoveryInProgress() == true. In other words, checkpointer is not > > an updator of shared FPW at the time StartupXLOG calls > > CreateCheckPoint for fallback_promote. > > I have been able to spend a couple of hours on your patch, wrapping my > mind on your stuff. So what I had in mind was something like this type > of scenario:
Thank for the precise explanation. The scenario that CreateCheckPoint is called simultaneously in different prcoesses seems broken by itself in the first place but I put that aside for now. > 1) The startup process requires a restart point. > 2) The checkpointer receives the request, and blocks before reading > RecoveryInProgress(). RecoveryInProgress doesn't take lock. But I assume here that checkpointer is taking a long time after entering RecoveryInProgress and haven't actually read SharedRecoveryInProgress. > 3) A fallback_promote is triggered, making the startup process call > CreateCheckpoint(). I'm confused here. It seems to me that StartupXLOG calls CreateCheckPoint only in bootstrap or standalone cases. No concurrent processe is running in the cases. Even if CreateCheckPoint is called there in the IsUnderPostmater case, checkpointer never calls CreateCheckPoint during the checkpoint. This is safe in regard to shared FPW. (but checkpoint is being blocked in this scenario.) Assuming that RequestCheckpoint() is called here, the request is merged with the previous request above and the function returns after the checkpoint ends. (That is, checkpointer must continue to run in this case.) > 4) Startup process finishes checkpoint, updates Insert->fullPageWrites. According to this scenario, checkpionter is still stalling now. So it is not a concurrent update. > 5) Checkpoint reads RecoveryInProgress to false, moves on with its > checkpoint. If checkpointer sees SharedRecoveryInProgress being false, Create(or Request)CheckPoint called in (3) must have finished and StartupXLOG() no longer updates shared FPW. There's no concurrent update. > > The comment may be somewhat confusing that it is written > > there. The point is that checkpointer and StartupXLOG are > > mutually excluded on updating shared FPW by > > SharedRecoveryInProgress flag. > > Indeed. I can see that it is the main key point of the patch. > > > | * If full_page_writes has been turned off, issue XLOG_FPW_CHANGE before > > | * the flag actually takes effect. Checkpointer never calls this function > > | * before StartupXLOG() turns off SharedRecoveryInProgress so there's no > > | * window where checkpointer and startup processes - the only updators of > > | * the flag - can update shared FPW simultaneously. Thus no lock is > > | * required here. Both shared and local fullPageWrites do not change > > | * before the next reading below. > > Yeah, this reduces the confusion. Thanks^^; > (The latest patch is a mix of two patches.) Sorry I counld get this. > + The default is <literal>on</literal>. The change of the parmeter > takes > + effect at the next checkpoint time. > s/parmeter/parameter/ > > By the way, I would vote for keeping track in WAL of full_page_writes > switched from off to on. This is not used in the backend, but that's > still useful for debugging end-user issues. Agreed and I tried that. The problem on that is that some records can be written after REDO point before XLOG_FPW_CHANGE(true) is written. However this is no problem for the FPW-related stuff to work properly (since no one looks it), the FPW record suggests that the current checkpoint loses FPI in the first several records. This has a far larger impact with this patch because shared FPW is always turned on just at REDO point. So I choosed not to write XLOG_FPW_CHANGE(false) rather than writing bogus records. > Actually, I was wondering why a spin lock is not taken in > RecoveryInProgress when reading SharedRecoveryInProgress, but that's > from 1a3d1044 which added a memory barrier instead as guarantee... Maybe it doesn't need barrier, since the flag is initialized as true and becomes false just once and delay in reading by other processes doesn't no harm. I think that bool doesn't suffer atomicity. Even all these are true, some description is needed. regards. -- Kyotaro Horiguchi NTT Open Source Software Center