On 25/11/2020 15:20, Daniel Gustafsson wrote:
On 23 Nov 2020, at 18:36, Heikki Linnakangas <hlinn...@iki.fi> wrote:
What happens is if you crash between UpdateControlFile() and XlogChecksum()?
Good point, that would not get the cluster to a consistent state. The
XlogChecksum should be performed before controlfile is udpated.
+void
+SetDataChecksumsOff(void)
+{
+ LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+
+ if (ControlFile->data_checksum_version == 0)
+ {
+ LWLockRelease(ControlFileLock);
+ return;
+ }
+
+ if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_VERSION)
+ {
+ ControlFile->data_checksum_version =
PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION;
+ UpdateControlFile();
+ LWLockRelease(ControlFileLock);
+
+ /*
+ * Update local state in all backends to ensure that any
backend in
+ * "on" state is changed to "inprogress-off".
+ */
+ XlogChecksums(PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION);
+
WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_INPROGRESS_OFF));
+
+ /*
+ * At this point we know that no backends are verifying data
checksums
+ * during reading. Next, we can safely move to state "off" to
also
+ * stop writing checksums.
+ */
+ }
+
+ XlogChecksums(0);
+
+ LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+ ControlFile->data_checksum_version = 0;
+ UpdateControlFile();
+ LWLockRelease(ControlFileLock);
+
+
WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_OFF));
+}
The lwlocking doesn't look right here. If
ControlFile->data_checksum_version != PG_DATA_CHECKSUM_VERSION,
LWLockAcquire is called twice without a LWLockRelease in between.
What if a checkpoint, and a crash, happens just after the WAL record has
been written, but before the control file is updated? That's a
ridiculously tight window for a whole checkpoint cycle to happen, but in
principle I think that would spell trouble. I think you could set
delayChkpt to prevent the checkpoint from happening in that window,
similar to how we avoid this problem with the clog updates at commit.
Also, I think this should be in a critical section; we don't want the
process to error out in between for any reason, and if it does happen,
it's panic time.
- Heikki