Hi,

On Tue, Oct 30, 2018 at 06:22:52PM +0100, Fabien COELHO wrote:
> >I am not convinced we need to differentiate further between online and
> >offline operation, can you explain in more detail which other
> >differences are ok in online mode and why?
> 
> For instance, the "file/directory was removed" case does not look okay at all
> when offline, even if unlikely. Moreover, the check hides the error message
> and is fully silent in this case, whereas it was not before on the same error
> when offline.

OK, I kinda see the point here and added that.
 
> The "check if page was modified since checkpoint" does not look useful when
> offline. Maybe it lacks a comment to say that this cannot (should not ?)
> happen when offline, but even then I would not like it to be true: ISTM that
> no page should be allowed to be skipped on the checkpoint condition when
> offline, but it is probably ok to skip with the new page test, which makes me
> still think that they should be counted and reported separately, or at least
> the checkpoint skip test should not be run when offline.

What is the rationale for not skipping on the checkpoint condition when the
instance is offline?  If it was shut down cleanly, this should not happen;
if the instance crashed, those would be spurious errors that would get
repaired on recovery.

I have not changed that for now.
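
For reference, restricting the checkpoint skip to online mode and counting
those skips separately, as suggested, would only be a small change in the
mismatch branch of scan_file(); a rough sketch (the lsnskippedblocks counter
is hypothetical, it is not part of the attached patch):

	/*
	 * Rough sketch, not what the attached patch does: only skip on the
	 * checkpoint LSN when the cluster is online, and count those skips
	 * separately from the new-page skips.  lsnskippedblocks would be an
	 * additional counter next to skippedblocks.
	 */
	if (online && PageGetLSN(buf.data) > checkpointLSN)
	{
		/* Block was rewritten after the last checkpoint, ignore it */
		block_retry = false;
		blocks--;
		lsnskippedblocks++;
		continue;
	}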

> When offline, the retry logic does not make much sense; shouldn't it complain
> directly on the first error? Also, I'm unsure of the read & checksum retry
> logic *without any delay*.

I think the small overhead of retrying in offline mode, even if it is useless
there, is worth it compared to making the code more complicated in order to
cater for both modes separately.

Initially there was a delay, but this was removed after analysis and
requests by several other reviewers.
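
Just to illustrate what a delayed retry would have looked like (a sketch only,
not the earlier patch text), it essentially adds a pg_usleep() before seeking
back to the failed block:

	/*
	 * Sketch of a delayed retry before re-reading a torn-looking block;
	 * the 100ms value is arbitrary.  The attached patch retries immediately.
	 */
	if (!block_retry)
	{
		pg_usleep(100000L);		/* wait 100ms before the second attempt */

		if (lseek(f, -BLCKSZ, SEEK_CUR) == -1)
		{
			fprintf(stderr, _("%s: could not lseek in file \"%s\": %s\n"),
					progname, fn, strerror(errno));
			exit(1);
		}

		block_retry = true;
		blockno--;
		continue;
	}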

> >>This might suggest some option to tell the command that it should work in
> >>online or offline mode, so that it may be stricter in some cases. The
> >>default may be one of the options, eg the stricter offline mode, or maybe
> >>guessed at startup.
> >
> >If we believe the operation should be different, the patch removes the
> >"is cluster online?" check (as it is no longer necessary), so we could
> >just replace the current error message with a global variable holding the
> >result of that check and use it where needed (if any).
> 
> That could leave open the issue of someone starting the check offline, and
> then starting the database while it is not finished. Maybe it is not worth
> sweating about such a narrow use case.

I don't think we need to cater for that, yeah.
 
> If operations are to be different, and it seems to me they should be, I'd
> suggest (1) auto-detecting the default based on the existing "is cluster online"
> code, (2) force options, eg --online vs --offline, which would complain and
> exit if the cluster is not in the right state on startup.

The current code bails out if it thinks the cluster is online. What is
wrong with just setting a flag now in case it is?
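
If we do want forcing switches on top of that flag, a minimal sketch could
look like the following (the --online/--offline options and the force_*
variables are hypothetical, nothing in the attached patch implements them):

	/* Hypothetical forcing switches layered on top of the auto-detection */
	static bool force_online = false;	/* set by a hypothetical --online */
	static bool force_offline = false;	/* set by a hypothetical --offline */

	/* ... after reading the control file in main() ... */
	online = (ControlFile->state != DB_SHUTDOWNED &&
			  ControlFile->state != DB_SHUTDOWNED_IN_RECOVERY);

	if (force_offline && online)
	{
		fprintf(stderr, _("%s: --offline was given, but the cluster is not cleanly shut down\n"),
				progname);
		exit(1);
	}
	if (force_online && !online)
	{
		fprintf(stderr, _("%s: --online was given, but the cluster is shut down\n"),
				progname);
		exit(1);
	}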
 
> I'd suggest adding a failing checksum online test, if possible. At least a
> "foo" file? 

OK, done.

> It would also be nice if the test could apply on an active database,
> eg with a low-rate pgbench running in parallel to the verification,
> but I'm not sure how easy it is to add such a thing.

That sounds much more complicated, so I have not tackled it yet.


Michael

-- 
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: michael.ba...@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

Unser Umgang mit personenbezogenen Daten unterliegt
folgenden Bestimmungen: https://www.credativ.de/datenschutz
diff --git a/doc/src/sgml/ref/pg_verify_checksums.sgml b/doc/src/sgml/ref/pg_verify_checksums.sgml
index 905b8f1222..4ad6edcde6 100644
--- a/doc/src/sgml/ref/pg_verify_checksums.sgml
+++ b/doc/src/sgml/ref/pg_verify_checksums.sgml
@@ -37,9 +37,8 @@ PostgreSQL documentation
   <title>Description</title>
   <para>
    <command>pg_verify_checksums</command> verifies data checksums in a
-   <productname>PostgreSQL</productname> cluster.  The server must be shut
-   down cleanly before running <application>pg_verify_checksums</application>.
-   The exit status is zero if there are no checksum errors, otherwise nonzero.
+   <productname>PostgreSQL</productname> cluster.  The exit status is zero if
+   there are no checksum errors, otherwise nonzero.
   </para>
  </refsect1>
 
diff --git a/src/bin/pg_verify_checksums/pg_verify_checksums.c b/src/bin/pg_verify_checksums/pg_verify_checksums.c
index f0e09bea20..9519133e5e 100644
--- a/src/bin/pg_verify_checksums/pg_verify_checksums.c
+++ b/src/bin/pg_verify_checksums/pg_verify_checksums.c
@@ -1,7 +1,7 @@
 /*
  * pg_verify_checksums
  *
- * Verifies page level checksums in an offline cluster
+ * Verifies page level checksums in a cluster
  *
  *	Copyright (c) 2010-2018, PostgreSQL Global Development Group
  *
@@ -26,10 +26,13 @@
 static int64 files = 0;
 static int64 blocks = 0;
 static int64 badblocks = 0;
+static int64 skippedblocks = 0;
 static ControlFileData *ControlFile;
+static XLogRecPtr checkpointLSN;
 
 static char *only_relfilenode = NULL;
 static bool verbose = false;
+static bool online = false;
 
 static const char *progname;
 
@@ -122,10 +125,17 @@ scan_file(const char *fn, BlockNumber segmentno)
 	PageHeader	header = (PageHeader) buf.data;
 	int			f;
 	BlockNumber blockno;
+	bool		block_retry = false;
 
 	f = open(fn, O_RDONLY | PG_BINARY, 0);
 	if (f < 0)
 	{
+		if (online && errno == ENOENT)
+		{
+			/* File was removed in the meantime */
+			return;
+		}
+
 		fprintf(stderr, _("%s: could not open file \"%s\": %s\n"),
 				progname, fn, strerror(errno));
 		exit(1);
@@ -140,26 +150,106 @@ scan_file(const char *fn, BlockNumber segmentno)
 
 		if (r == 0)
 			break;
+		if (r < 0)
+		{
+			fprintf(stderr, _("%s: could not read block %u in file \"%s\": %s\n"),
+					progname, blockno, fn, strerror(errno));
+			return;
+		}
 		if (r != BLCKSZ)
 		{
-			fprintf(stderr, _("%s: could not read block %u in file \"%s\": read %d of %d\n"),
-					progname, blockno, fn, r, BLCKSZ);
-			exit(1);
+			if (block_retry)
+			{
+				/* We already tried once to reread the block, bail out */
+				fprintf(stderr, _("%s: could not read block %u in file \"%s\": read %d of %d\n"),
+						progname, blockno, fn, r, BLCKSZ);
+				exit(1);
+			}
+
+			/*
+			 * Retry the block. It's possible that we read the block while it
+			 * was extended or shrunk, so it ends up looking torn to us.
+			 */
+
+			/*
+			 * Seek back by the amount of bytes we read to the beginning of
+			 * the failed block.
+			 */
+			if (lseek(f, -r, SEEK_CUR) == -1)
+			{
+				fprintf(stderr, _("%s: could not lseek in file \"%s\": %s\n"),
+						progname, fn, strerror(errno));
+				exit(1);
+			}
+
+			/* Set flag so we know a retry was attempted */
+			block_retry = true;
+
+			/* Reset loop to validate the block again */
+			blockno--;
+
+			continue;
 		}
-		blocks++;
 
 		/* New pages have no checksum yet */
 		if (PageIsNew(header))
+		{
+			skippedblocks++;
 			continue;
+		}
+
+		blocks++;
 
 		csum = pg_checksum_page(buf.data, blockno + segmentno * RELSEG_SIZE);
 		if (csum != header->pd_checksum)
 		{
+			/*
+			 * Retry the block on the first failure.  If the verification is
+			 * done while the instance is online, it is possible that we read
+			 * the first 4kB of the block just before postgres updated the
+			 * entire block, so it ends up looking torn to us.  We only need to
+			 * retry once because the LSN should be updated to something we can
+			 * ignore on the next pass.  If the error happens again then it is
+			 * a true validation failure.
+			 */
+			if (!block_retry)
+			{
+				/* Seek to the beginning of the failed block */
+				if (lseek(f, -BLCKSZ, SEEK_CUR) == -1)
+				{
+					fprintf(stderr, _("%s: could not lseek in file \"%s\": %s\n"),
+							progname, fn, strerror(errno));
+					exit(1);
+				}
+
+				/* Set flag so we know a retry was attempted */
+				block_retry = true;
+
+				/* Reset loop to validate the block again */
+				blockno--;
+
+				continue;
+			}
+
+			/*
+			 * The checksum verification failed on retry as well.  Check if the
+			 * page has been modified since the checkpoint and skip it in this
+			 * case.
+			 */
+			if (PageGetLSN(buf.data) > checkpointLSN)
+			{
+				block_retry = false;
+				blocks--;
+				skippedblocks++;
+				continue;
+			}
+
 			if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_VERSION)
 				fprintf(stderr, _("%s: checksum verification failed in file \"%s\", block %u: calculated checksum %X but block contains %X\n"),
 						progname, fn, blockno, csum, header->pd_checksum);
 			badblocks++;
 		}
+		block_retry = false;
 	}
 
 	if (verbose)
@@ -195,6 +285,12 @@ scan_directory(const char *basedir, const char *subdir)
 		snprintf(fn, sizeof(fn), "%s/%s", path, de->d_name);
 		if (lstat(fn, &st) < 0)
 		{
+			if (online && errno == ENOENT)
+			{
+				/* File was removed in the meantime */
+				continue;
+			}
+
 			fprintf(stderr, _("%s: could not stat file \"%s\": %s\n"),
 					progname, fn, strerror(errno));
 			exit(1);
@@ -328,7 +424,7 @@ main(int argc, char *argv[])
 		exit(1);
 	}
 
-	/* Check if cluster is running */
+	/* Read the control file */
 	ControlFile = get_controlfile(DataDir, progname, &crc_ok);
 	if (!crc_ok)
 	{
@@ -336,12 +432,10 @@ main(int argc, char *argv[])
 		exit(1);
 	}
 
+	/* Check if cluster is running */
 	if (ControlFile->state != DB_SHUTDOWNED &&
 		ControlFile->state != DB_SHUTDOWNED_IN_RECOVERY)
-	{
-		fprintf(stderr, _("%s: cluster must be shut down to verify checksums\n"), progname);
-		exit(1);
-	}
+		online = true;
 
 	if (ControlFile->data_checksum_version == 0)
 	{
@@ -349,6 +443,9 @@ main(int argc, char *argv[])
 		exit(1);
 	}
 
+	/* Get checkpoint LSN */
+	checkpointLSN = ControlFile->checkPoint;
+
 	/* Scan all files */
 	scan_directory(DataDir, "global");
 	scan_directory(DataDir, "base");
@@ -358,6 +455,8 @@ main(int argc, char *argv[])
 	printf(_("Data checksum version: %d\n"), ControlFile->data_checksum_version);
 	printf(_("Files scanned:  %s\n"), psprintf(INT64_FORMAT, files));
 	printf(_("Blocks scanned: %s\n"), psprintf(INT64_FORMAT, blocks));
+	if (skippedblocks > 0)
+		printf(_("Blocks skipped: %s\n"), psprintf(INT64_FORMAT, skippedblocks));
 	printf(_("Bad checksums:  %s\n"), psprintf(INT64_FORMAT, badblocks));
 
 	if (badblocks > 0)
diff --git a/src/bin/pg_verify_checksums/t/002_actions.pl b/src/bin/pg_verify_checksums/t/002_actions.pl
index 0e1725d9f2..bd22415f4a 100644
--- a/src/bin/pg_verify_checksums/t/002_actions.pl
+++ b/src/bin/pg_verify_checksums/t/002_actions.pl
@@ -5,7 +5,7 @@ use strict;
 use warnings;
 use PostgresNode;
 use TestLib;
-use Test::More tests => 36;
+use Test::More tests => 45;
 
 # Initialize node with checksums enabled.
 my $node = get_new_node('node_checksum');
@@ -48,10 +48,10 @@ append_to_file "$pgdata/global/99999_vm.123", "";
 command_ok(['pg_verify_checksums',  '-D', $pgdata],
 		   "succeeds with offline cluster");
 
-# Checks cannot happen with an online cluster
+# Checksums pass on an online cluster
 $node->start;
-command_fails(['pg_verify_checksums',  '-D', $pgdata],
-			  "fails with online cluster");
+command_ok(['pg_verify_checksums',  '-D', $pgdata],
+		   "succeeds with online cluster");
 
 # Create table to corrupt and get its relfilenode
 $node->safe_psql('postgres',
@@ -74,27 +74,6 @@ command_ok(['pg_verify_checksums',  '-D', $pgdata,
 	'-r', $relfilenode_corrupted],
 	"succeeds for single relfilenode with offline cluster");
 
-# Time to create some corruption
-open my $file, '+<', "$pgdata/$file_corrupted";
-seek($file, $pageheader_size, 0);
-syswrite($file, '\0\0\0\0\0\0\0\0\0');
-close $file;
-
-# Global checksum checks fail
-$node->command_checks_all([ 'pg_verify_checksums', '-D', $pgdata],
-						  1,
-						  [qr/Bad checksums:.*1/],
-						  [qr/checksum verification failed/],
-						  'fails with corrupted data');
-
-# Checksum checks on single relfilenode fail
-$node->command_checks_all([ 'pg_verify_checksums', '-D', $pgdata, '-r',
-							$relfilenode_corrupted],
-						  1,
-						  [qr/Bad checksums:.*1/],
-						  [qr/checksum verification failed/],
-						  'fails for corrupted data on single relfilenode');
-
 # Utility routine to check that pg_verify_checksums is able to detect
 # correctly-named relation files filled with some corrupted data.
 sub fail_corrupt
@@ -129,3 +108,46 @@ fail_corrupt($node, "99990_vm");
 fail_corrupt($node, "99990_init.123");
 fail_corrupt($node, "99990_fsm.123");
 fail_corrupt($node, "99990_vm.123");
+
+# Time to create some corruption
+open my $file, '+<', "$pgdata/$file_corrupted";
+seek($file, $pageheader_size, 0);
+syswrite($file, '\0\0\0\0\0\0\0\0\0');
+close $file;
+
+# Global checksum checks fail
+$node->command_checks_all([ 'pg_verify_checksums', '-D', $pgdata],
+						  1,
+						  [qr/Bad checksums:.*1/],
+						  [qr/checksum verification failed/],
+						  'fails with corrupted data');
+
+# Checksum checks on single relfilenode fail
+$node->command_checks_all([ 'pg_verify_checksums', '-D', $pgdata, '-r',
+							$relfilenode_corrupted],
+						  1,
+						  [qr/Bad checksums:.*1/],
+						  [qr/checksum verification failed/],
+						  'fails for corrupted data on single relfilenode');
+
+# Start node again to test online verification correctly finds failures
+$node->stop;
+
+# Global online checksum checks fail
+$node->command_checks_all([ 'pg_verify_checksums', '-D', $pgdata],
+						  1,
+						  [qr/Bad checksums:.*1/],
+						  [qr/checksum verification failed/],
+						  'fails with corrupted data in online cluster');
+
+# Online checksum checks on single relfilenode fail
+$node->command_checks_all([ 'pg_verify_checksums', '-D', $pgdata, '-r',
+							$relfilenode_corrupted],
+						  1,
+						  [qr/Bad checksums:.*1/],
+						  [qr/checksum verification failed/],
+						  'fails for corrupted data on single relfilenode in online cluster');
+
+# Authorized relation files filled with corrupted data cause the
+# online checksum checks to fail.
+fail_corrupt($node, "99991");
