On Fri, Sep 14, 2018 at 12:57 PM Michael Paquier <mich...@paquier.xyz> wrote: > > On Thu, Sep 06, 2018 at 04:37:28PM -0700, Michael Paquier wrote: > > /* > > * Properly accept or ignore signals the postmaster might send us. > > */ > > - pqsignal(SIGHUP, StartupProcSigHupHandler); /* reload config file */ > > + pqsignal(SIGHUP, SIG_IGN); /* ignore reload config */ > > > > I am finally coming back to this patch set, and that's one of the first > > things I am going to help moving on for this CF. And this bit from the > > last patch series is not acceptable as there are some parameters which > > are used by the startup process which can be reloaded. One of them is > > wal_retrieve_retry_interval for tuning when fetching WAL at recovery. > > So, I have been working on this problem again and I have reviewed the > thread, and there have been many things discussed in the last couple of > months: > 1) We do not want to initialize XLogInsert stuff unconditionally for all > processes at the moment recovery begins, but we just want to initialize > it once WAL write is open for business. > 2) Both the checkpointer and the startup process can call > UpdateFullPageWrites() which can cause Insert->fullPageWrites to get > incorrect values.
Can you share the steps to reproduce this problem? > 3) We do not want a palloc() in a critical section because of > RecoveryinProgress being called. > > And the root issue here is 2), because the checkpointer tries to update > Insert->fullPageWrites but it does not need to do so until recovery has > been finished. So in order to fix the original issue I am proposing a > simple fix: let's make sure that the checkpointer does not update > Insert->fullPageWrites until recovery finishes, and let's have the > startup process do the first update once it finishes recovery and > inserts by itself the XLOG_PARAMETER_CHANGE. This way the order of > events is purely sequential and we don't need to worry about having the > checkpointer and the startup process eat on each other's plate because > the checkpointer would only try to work on updating the shared memory > value of full_page_writes once SharedRecoveryInProgress is switched to > true, and that happens after the startup process does its initial call > to UpdateFullPageWrites(). I have improved as well all the comments > around to make clear the behavior wanted. > > Thoughts? > UpdateFullPageWrites(void) { XLogCtlInsert *Insert = &XLogCtl->Insert; + /* + * Check if recovery is still in progress before entering this critical + * section, as some memory allocation could happen at the end of + * recovery. There is nothing to do for a system still in recovery. + * Note that we need to process things here at the end of recovery for + * the startup process, which is why this checks after InRecovery. + */ + if (RecoveryInProgress() && !InRecovery) + return; + On a regular startup when there is no recovery, it won't allow us to log the WAL record (XLOG_FPW_CHANGE) which can happen without above change. You can check that by setting full_page_writes=off and start the system. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com