On Mon, Mar 26, 2018 at 02:32:22PM +0530, Dilip Kumar wrote: > On Fri, Mar 23, 2018 at 1:19 PM, Michael Paquier <mich...@paquier.xyz> > wrote: > In StartupXLOG, just before the CreateCheckPoint() call, we are calling > LocalSetXLogInsertAllowed(). So, I am thinking can we just remove > InitXLogInsert from CreateCheckpoint. And, we don't need to move it to > bootstrap.c. Or am I missing something?
I have finally been able to spend more time on this issue, and checked for a couple of hours all the calls to RecoveryInProgress() that could be triggered within a critical section to see if the move I suggested previously is worth it ot not as this would cost some memory for standbys all the time, which would suck for many read-only sessions. There are a couple of calls potentially happening in critical sections, however except for the one in UpdateFullPageWrites() those are used for sanity checks in code paths that should never trigger it, take XLogInsertBegin() for example. So with assertions this would trigger a failure before the real elog(ERROR) message shows up. Hence, I am changing opinion still I think that instead of the patch you proposed first we could just do a call to InitXLogInsert() before entering the critical section. This is also more consistent with what CreateCheckpoint() does. That's also less risky for a backpatch as your patch increases the window between the beginning of the critical section and the real moment where the check for RecoveryInProgress is needed. A comment explaining why the initialization needs to happen is also essential. All in all, this would give the simple patch attached. Thoughts? -- Michael
From 0fc3878375890aef5c0a6dc57d73c30aa97c88df Mon Sep 17 00:00:00 2001 From: Michael Paquier <mich...@paquier.xyz> Date: Tue, 27 Mar 2018 16:38:37 +0900 Subject: [PATCH] Fix WAL insert when updating full_page_writes for checkpointer At startup or when receiving a SIGHUP signal to update its configuration, the checkpointer may need to update the shared memory for full_page_writes and needs to write a WAL record about it. However, just after a promotion, there is a small window where it is possible that the memory context in charge of building WAL records has not been correctly initialized, leading to a crash of the checkpointer. With assertions enabled, this causes a failure when allocating memory in a critical section. And this fails when trying to build the FPW record without assertions. Report and patch by Dilip Kumar, reviewed by Michael Paquier. Discussion: https://postgr.es/m/CAFiTN-u4BA8KXcQUWDPNgaKAjDXC%3DC2whnzBM8TAcv%3DstckYUw%40mail.gmail.com --- src/backend/access/transam/xlog.c | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index cb9c2a29cb..aae1131977 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -9543,6 +9543,16 @@ UpdateFullPageWrites(void) if (fullPageWrites == Insert->fullPageWrites) return; + /* + * Initialize InitXLogInsert working areas before entering the critical + * section. Normally, this is done by the first call to + * RecoveryInProgress() or LocalSetXLogInsertAllowed(), but it can + * be possible that this has not been initialized yet for a checkpointer + * which updates the value of full_page_writes for a received SIGHUP or + * at process startup. + */ + InitXLogInsert(); + START_CRIT_SECTION(); /* -- 2.16.3
signature.asc
Description: PGP signature