On Tue, Mar 2, 2021 at 5:52 PM Dilip Kumar <dilipbal...@gmail.com> wrote: > > On Fri, Feb 19, 2021 at 5:43 PM Amul Sul <sula...@gmail.com> wrote: > > > > In the attached version I have made the changes accordingly what Robert has > > summarised in his previous mail[1]. > > > > In addition to that, I also move the code that updates the control file to > > XLogAcceptWrites() which will also get skipped when the system is read-only > > (wal > > prohibited). The system will be in the crash recovery, and that will > > change once we do the end-of-recovery checkpoint and the WAL writes > > operation > > which we were skipping from startup. The benefit of keeping the system in > > recovery mode is that it fixes my concern[2] where other backends could > > connect > > and write wal records while we were changing the system to read-write. Now, > > no > > other backends allow a wal write; UpdateFullPageWrites(), end-of-recovery > > checkpoint, and XLogReportParameters() operations will be performed in the > > same > > sequence as it is in the startup while changing the system to read-write. > > I was looking into the changes espcially recovery related problem, I > have a few questions > > 1. > +static bool > +XLogAcceptWrites(bool needChkpt, bool bgwriterLaunched, > + bool localPromoteIsTriggered, XLogReaderState *xlogreader, > + bool archiveRecoveryRequested, TimeLineID endOfLogTLI, > + XLogRecPtr endOfLog, TimeLineID thisTimeLineID) > +{ > + bool promoted = false; > + > + /* > ..... > + if (localPromoteIsTriggered) > { > - checkPointLoc = ControlFile->checkPoint; > + XLogRecord *record; > > ... > + record = ReadCheckpointRecord(xlogreader, > + ControlFile->checkPoint, > + 1, false); > if (record != NULL) > { > promoted = true; > ... > CreateEndOfRecoveryRecord(); > } > > Why do we need to move promote related code in XLogAcceptWrites? > IMHO, this promote related handling should be in StartupXLOG only.
XLogAcceptWrites() tried to club all the WAL write operations that happen at the end of StartupXLOG(). The only exception is that promotion checkpoint. > That will look cleaner. I think it would be better to move the promotion checkpoint call inside XLogAcceptWrites() as well. So that we can say XLogAcceptWrites() is a part of StartupXLOG() does the required WAL writes. Thoughts? > > > > > 1] > > http://postgr.es/m/CA+TgmoZ=cctbaxxmtyzogxegqzoz9smkbwrdpsacpjvfcgc...@mail.gmail.com > > 2] > > http://postgr.es/m/caaj_b97xx-nqrym_uxzecph9asgomrordnhrg1n51fdcdwo...@mail.gmail.com > > 2. > I did not clearly understand your concern in point [2], because of > which you have to postpone RECOVERY_STATE_DONE untill system is set > back to read-write. Can you explain this? > Sure, for that let me explain how this transition to read-write occurs. When a backend executes a function (ie. pg_prohibit_wal(false)) to make the system read-write then that system state changes will be conveyed by the Checkpointer process to all existing backends using global barrier and while Checkpointer in the progress of conveying this barrier, any existing backends who might have absorbed barriers can write new records. We don't want that to happen in cases where previous recovery-end-checkpoint is skipped in startup. We want Checkpointer first to convey the barrier to all backends but, the backend shouldn't write wal until the Checkpointer writes recovery-end-checkpoint record. To refrain these backends from writing WAL I think we should keep the server in crash recovery mode until UpdateFullPageWrites(), end-of-recovery-checkpoint, and XLogReportParameters() are performed. Regards, Amul