On Thu, Sep 06, 2018 at 04:37:28PM -0700, Michael Paquier wrote:
> /*
>  * Properly accept or ignore signals the postmaster might send us.
>  */
> -       pqsignal(SIGHUP, StartupProcSigHupHandler); /* reload config file */
> +       pqsignal(SIGHUP, SIG_IGN);  /* ignore reload config */
> 
> I am finally coming back to this patch set, and that's one of the first
> things I am going to help moving on for this CF.  And this bit from the
> last patch series is not acceptable as there are some parameters which
> are used by the startup process which can be reloaded.  One of them is
> wal_retrieve_retry_interval for tuning when fetching WAL at recovery.

So, I have been working on this problem again and I have reviewed the
thread, and there have been many things discussed in the last couple of
months:
1) We do not want to initialize XLogInsert stuff unconditionally for all
processes at the moment recovery begins, but we just want to initialize
it once WAL write is open for business.
2) Both the checkpointer and the startup process can call
UpdateFullPageWrites() which can cause Insert->fullPageWrites to get
incorrect values.
3) We do not want a palloc() in a critical section because of
RecoveryinProgress being called.

And the root issue here is 2), because the checkpointer tries to update
Insert->fullPageWrites but it does not need to do so until recovery has
been finished.  So in order to fix the original issue I am proposing a
simple fix: let's make sure that the checkpointer does not update
Insert->fullPageWrites until recovery finishes, and let's have the
startup process do the first update once it finishes recovery and
inserts by itself the XLOG_PARAMETER_CHANGE.  This way the order of
events is purely sequential and we don't need to worry about having the
checkpointer and the startup process eat on each other's plate because
the checkpointer would only try to work on updating the shared memory
value of full_page_writes once SharedRecoveryInProgress is switched to
true, and that happens after the startup process does its initial call
to UpdateFullPageWrites().  I have improved as well all the comments
around to make clear the behavior wanted.

Thoughts?
--
Michael
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 3025d0badb..69912e6a22 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -7719,6 +7719,11 @@ StartupXLOG(void)
 	 * Update full_page_writes in shared memory and write an XLOG_FPW_CHANGE
 	 * record before resource manager writes cleanup WAL records or checkpoint
 	 * record is written.
+	 *
+	 * It is safe to check the shared full_page_writes without the lock,
+	 * because there is no concurrently running process able to update it.
+	 * The only other process able to update full_page_writes is the
+	 * checkpointer, still it is unable to do so until recovery finishes.
 	 */
 	Insert->fullPageWrites = lastFullPageWrites;
 	LocalSetXLogInsertAllowed();
@@ -9693,14 +9698,27 @@ XLogReportParameters(void)
  * Update full_page_writes in shared memory, and write an
  * XLOG_FPW_CHANGE record if necessary.
  *
- * Note: this function assumes there is no other process running
- * concurrently that could update it.
+ * Note: this can be called from the checkpointer, or the startup process
+ * at the end of recovery.  One could think that this routine should be
+ * careful with its lock handling, however this is a no-op for the
+ * checkpointer until the startup process marks the end of recovery,
+ * so only one of them can do the work of this routine at once.
  */
 void
 UpdateFullPageWrites(void)
 {
 	XLogCtlInsert *Insert = &XLogCtl->Insert;
 
+	/*
+	 * Check if recovery is still in progress before entering this critical
+	 * section, as some memory allocation could happen at the end of
+	 * recovery.  There is nothing to do for a system still in recovery.
+	 * Note that we need to process things here at the end of recovery for
+	 * the startup process, which is why this checks after InRecovery.
+	 */
+	if (RecoveryInProgress() && !InRecovery)
+		return;
+
 	/*
 	 * Do nothing if full_page_writes has not been changed.
 	 *
@@ -9731,7 +9749,7 @@ UpdateFullPageWrites(void)
 	 * Write an XLOG_FPW_CHANGE record. This allows us to keep track of
 	 * full_page_writes during archive recovery, if required.
 	 */
-	if (XLogStandbyInfoActive() && !RecoveryInProgress())
+	if (XLogStandbyInfoActive())
 	{
 		XLogBeginInsert();
 		XLogRegisterData((char *) (&fullPageWrites), sizeof(bool));

Attachment: signature.asc
Description: PGP signature

Reply via email to