On Tue, Mar 2, 2021 at 5:52 PM Dilip Kumar <dilipbal...@gmail.com> wrote:
>
> On Fri, Feb 19, 2021 at 5:43 PM Amul Sul <sula...@gmail.com> wrote:
> >
> > In the attached version I have made the changes accordingly what Robert has
> > summarised in his previous mail[1].
> >
> > In addition to that, I also move the code that updates the control file to
> > XLogAcceptWrites() which will also get skipped when the system is read-only 
> > (wal
> > prohibited).  The system will be in the crash recovery, and that will
> > change once we do the end-of-recovery checkpoint and the WAL writes 
> > operation
> > which we were skipping from startup.  The benefit of keeping the system in
> > recovery mode is that it fixes my concern[2] where other backends could 
> > connect
> > and write wal records while we were changing the system to read-write. Now, 
> > no
> > other backends allow a wal write; UpdateFullPageWrites(), end-of-recovery
> > checkpoint, and XLogReportParameters() operations will be performed in the 
> > same
> > sequence as it is in the startup while changing the system to read-write.
>
> I was looking into the changes espcially recovery related problem,  I
> have a few questions
>
> 1.
> +static bool
> +XLogAcceptWrites(bool needChkpt, bool bgwriterLaunched,
> +                 bool localPromoteIsTriggered, XLogReaderState *xlogreader,
> +                 bool archiveRecoveryRequested, TimeLineID endOfLogTLI,
> +                 XLogRecPtr endOfLog, TimeLineID thisTimeLineID)
> +{
> +    bool        promoted = false;
> +
> +    /*
> .....
> +            if (localPromoteIsTriggered)
>              {
> -                checkPointLoc = ControlFile->checkPoint;
> +                XLogRecord *record;
>
> ...
> +                record = ReadCheckpointRecord(xlogreader,
> +                                              ControlFile->checkPoint,
> +                                              1, false);
>                  if (record != NULL)
>                  {
>                      promoted = true;
>                     ...
>                     CreateEndOfRecoveryRecord();
>                 }
>
> Why do we need to move promote related code in XLogAcceptWrites?
> IMHO, this promote related handling should be in StartupXLOG only.

XLogAcceptWrites() tried to club all the WAL write operations that happen at the
end of StartupXLOG(). The only exception is that promotion checkpoint.

> That will look cleaner.

I think it would be better to move the promotion checkpoint call inside
XLogAcceptWrites() as well. So that we can say XLogAcceptWrites() is a part of
StartupXLOG() does the required WAL writes. Thoughts?

>
> >
> > 1] 
> > http://postgr.es/m/CA+TgmoZ=cctbaxxmtyzogxegqzoz9smkbwrdpsacpjvfcgc...@mail.gmail.com
> > 2] 
> > http://postgr.es/m/caaj_b97xx-nqrym_uxzecph9asgomrordnhrg1n51fdcdwo...@mail.gmail.com
>
> 2.
> I did not clearly understand your concern in point [2], because of
> which you have to postpone RECOVERY_STATE_DONE untill system is set
> back to read-write.  Can you explain this?
>

Sure, for that let me explain how this transition to read-write occurs.  When a
backend executes a function (ie. pg_prohibit_wal(false)) to make the system
read-write then that system state changes will be conveyed by the Checkpointer
process to all existing backends using global barrier and while Checkpointer in
the progress of conveying this barrier, any existing backends who might have
absorbed barriers can write new records.

We don't want that to happen in cases where previous recovery-end-checkpoint is
skipped in startup. We want Checkpointer first to convey the barrier to all
backends but, the backend shouldn't write wal until the Checkpointer writes
recovery-end-checkpoint record.

To refrain these backends from writing WAL I think we should keep the server in
crash recovery mode until UpdateFullPageWrites(),
end-of-recovery-checkpoint, and XLogReportParameters() are performed.

Regards,
Amul


Reply via email to