On Tue, May 5, 2020 at 9:51 AM Thomas Munro <thomas.mu...@gmail.com> wrote: > On Tue, May 5, 2020 at 5:53 AM Bossart, Nathan <bossa...@amazon.com> wrote: > > I believe I've discovered a race condition between the startup and > > checkpointer processes that can cause a CRC mismatch in the pg_control > > file. If a cluster crashes at the right time, the following error > > appears when you attempt to restart it: > > > > FATAL: incorrect checksum in control file > > > > This appears to be caused by some code paths in xlog_redo() that > > update ControlFile without taking the ControlFileLock. The attached > > patch seems to be sufficient to prevent the CRC mismatch in the > > control file, but perhaps this is a symptom of a bigger problem with > > concurrent modifications of ControlFile->checkPointCopy.nextFullXid. > > This does indeed look pretty dodgy. CreateRestartPoint() running in > the checkpointer does UpdateControlFile() to compute a checksum and > write it out, but xlog_redo() processing > XLOG_CHECKPOINT_{ONLINE,SHUTDOWN} modifies that data without > interlocking. It looks like the ancestors of that line were there > since 35af5422f64 (2006), but back then RecoveryRestartPoint() ran > UpdateControLFile() directly in the startup process (immediately after > that update), so no interlocking problem. Then in cdd46c76548 (2009), > RecoveryRestartPoint() was split up so that CreateRestartPoint() ran > in another process.
Here's a version with a commit message added. I'll push this to all releases in a day or two if there are no objections.
From 28ee04b537085148c75f575d2bc755a1fffc19c7 Mon Sep 17 00:00:00 2001 From: Thomas Munro <thomas.mu...@gmail.com> Date: Fri, 22 May 2020 16:23:59 +1200 Subject: [PATCH] Fix race condition that could corrupt pg_control. The redo routines for XLOG_CHECKPOINT_{ONLINE,SHUTDOWN} must acquire ControlFileLock before modifying ControlFile->checkPointCopy, or the checkpointer could write out a control file with a bad checksum. Back-patch to all supported release. Author: Nathan Bossart <bossa...@amazon.com> Discussion: https://postgr.es/m/70BF24D6-DC51-443F-B55A-95735803842A%40amazon.com --- src/backend/access/transam/xlog.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index ca09d81b08..274b808476 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -9976,7 +9976,9 @@ xlog_redo(XLogReaderState *record) } /* ControlFile->checkPointCopy always tracks the latest ckpt XID */ + LWLockAcquire(ControlFileLock, LW_EXCLUSIVE); ControlFile->checkPointCopy.nextFullXid = checkPoint.nextFullXid; + LWLockRelease(ControlFileLock); /* Update shared-memory copy of checkpoint XID/epoch */ SpinLockAcquire(&XLogCtl->info_lck); @@ -10033,7 +10035,9 @@ xlog_redo(XLogReaderState *record) SetTransactionIdLimit(checkPoint.oldestXid, checkPoint.oldestXidDB); /* ControlFile->checkPointCopy always tracks the latest ckpt XID */ + LWLockAcquire(ControlFileLock, LW_EXCLUSIVE); ControlFile->checkPointCopy.nextFullXid = checkPoint.nextFullXid; + LWLockRelease(ControlFileLock); /* Update shared-memory copy of checkpoint XID/epoch */ SpinLockAcquire(&XLogCtl->info_lck); -- 2.20.1