On Thu, Oct 22, 2020 at 10:41:53AM +0900, Michael Paquier wrote: > We cannot trust the fields of the page header because these may > have been messed up by some random corruption, so what really > matters is that the checksums don't match, and we can just rely > on that. The zero-only case of a page is different because such pages > don't have a checksum set, so I would finish with something like the > attached to make the detection more robust. This does not make the > detection perfect as there is no locking guarantee (we really need to > do that, but v13 has been released already), but with a sufficient > number of retries this can make things much more reliable than what's > present. > > Are there any comments? Anybody?
So, hearing nothing, attached is a set of patches that I would like to apply to 11~ to address the set of issues of this thread. This comes in two parts: - Some refactoring of PageIsVerified(), similar to d401c57 on HEAD except that this keeps ABI compatibility. - The actual fix, with tweaks for each stable branch. Playing with dd and generating random pages, this detects random corruption, making use of a wait/retry loop if a failure is detected. As mentioned upthread, this is a double-edged sword: increasing the number of retries reduces the chances of false positives, at the cost of making the regression tests longer. This stuff uses up to 5 retries with 100ms of sleep for each page. (I am aware that the commit message of the main patch is not written yet.) -- Michael
From 5413343de74a77f0257619d8599523dc7f0a01be Mon Sep 17 00:00:00 2001 From: Michael Paquier <mich...@paquier.xyz> Date: Fri, 30 Oct 2020 10:18:34 +0900 Subject: [PATCH v8] Fix page verifications in base backups --- src/backend/replication/basebackup.c | 148 ++++++++++--------- src/bin/pg_basebackup/t/010_pg_basebackup.pl | 20 ++- 2 files changed, 96 insertions(+), 72 deletions(-) diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c index b89df01fa7..f8d0fc041d 100644 --- a/src/backend/replication/basebackup.c +++ b/src/backend/replication/basebackup.c @@ -1569,21 +1569,24 @@ sendFile(const char *readfilename, const char *tarfilename, { int fd; BlockNumber blkno = 0; - bool block_retry = false; + int block_attempts = 0; char buf[TAR_SEND_SIZE]; - uint16 checksum; int checksum_failures = 0; off_t cnt; int i; pgoff_t len = 0; char *page; size_t pad; - PageHeader phdr; int segmentno = 0; char *segmentpath; bool verify_checksum = false; pg_checksum_context checksum_ctx; + /* Maximum number of checksum failures reported for this file */ +#define CHECKSUM_REPORT_THRESHOLD 5 + /* maximum number of retries done for a single page */ +#define PAGE_RETRY_THRESHOLD 5 + pg_checksum_init(&checksum_ctx, manifest->checksum_type); fd = OpenTransientFile(readfilename, O_RDONLY | PG_BINARY); @@ -1661,9 +1664,7 @@ sendFile(const char *readfilename, const char *tarfilename, if (verify_checksum && (cnt % BLCKSZ != 0)) { ereport(WARNING, - (errmsg("could not verify checksum in file \"%s\", block " - "%d: read buffer size %d and page size %d " - "differ", + (errmsg("could not verify checksum in file \"%s\", block %u: read buffer size %d and page size %d differ", readfilename, blkno, (int) cnt, BLCKSZ))); verify_checksum = false; } @@ -1674,79 +1675,84 @@ sendFile(const char *readfilename, const char *tarfilename, { page = buf + BLCKSZ * i; + elog(DEBUG1, "checking block %u of file %s, attempt %d", + blkno, readfilename, block_attempts); + /* - * Only 
check pages which have not been modified since the - * start of the base backup. Otherwise, they might have been - * written only halfway and the checksum would not be valid. - * However, replaying WAL would reinstate the correct page in - * this case. We also skip completely new pages, since they - * don't have a checksum yet. + * Check on-disk pages with the same set of verification + * conditions used before loading pages into shared buffers. + * Note that PageIsVerifiedExtended() considers pages filled + * only with zeros as valid, since they don't have a checksum + * yet. Failures are not reported to pgstat yet, as these are + * reported in a single batch once a file is completed. It + * is possible that a page is written halfway while + * doing this check, causing a false positive. If that + * happens, the page is retried multiple times, with an error + * reported if all the retries fail. */ - if (!PageIsNew(page) && PageGetLSN(page) < startptr) + if (PageIsVerifiedExtended(page, blkno, 0)) { - checksum = pg_checksum_page((char *) page, blkno + segmentno * RELSEG_SIZE); - phdr = (PageHeader) page; - if (phdr->pd_checksum != checksum) + /* success, move on to the next block */ + elog(DEBUG1, "check success for block %u of file %s, attempt %d", + blkno, readfilename, block_attempts); + blkno++; + block_attempts = 0; + continue; + } + + /* The verification of the page has failed, so retry it */ + if (block_attempts < PAGE_RETRY_THRESHOLD) + { + int reread_cnt; + + /* Reread the failed block */ + reread_cnt = + basebackup_read_file(fd, buf + BLCKSZ * i, + BLCKSZ, len + BLCKSZ * i, + readfilename, + false); + if (reread_cnt == 0) { /* - * Retry the block on the first failure. It's - * possible that we read the first 4K page of the - * block just before postgres updated the entire block - * so it ends up looking torn to us. We only need to - * retry once because the LSN should be updated to - * something we can ignore on the next pass. 
If the - * error happens again then it is a true validation - * failure. + * If we hit end-of-file, a concurrent truncation must + * have occurred, so break out of this loop just as if + * the initial fread() returned 0. We'll drop through + * to the same code that handles that case. (We must + * fix up cnt first, though.) */ - if (block_retry == false) - { - int reread_cnt; - - /* Reread the failed block */ - reread_cnt = - basebackup_read_file(fd, buf + BLCKSZ * i, - BLCKSZ, len + BLCKSZ * i, - readfilename, - false); - if (reread_cnt == 0) - { - /* - * If we hit end-of-file, a concurrent - * truncation must have occurred, so break out - * of this loop just as if the initial fread() - * returned 0. We'll drop through to the same - * code that handles that case. (We must fix - * up cnt first, though.) - */ - cnt = BLCKSZ * i; - break; - } - - /* Set flag so we know a retry was attempted */ - block_retry = true; - - /* Reset loop to validate the block again */ - i--; - continue; - } - - checksum_failures++; - - if (checksum_failures <= 5) - ereport(WARNING, - (errmsg("checksum verification failed in " - "file \"%s\", block %d: calculated " - "%X but expected %X", - readfilename, blkno, checksum, - phdr->pd_checksum))); - if (checksum_failures == 5) - ereport(WARNING, - (errmsg("further checksum verification " - "failures in file \"%s\" will not " - "be reported", readfilename))); + cnt = BLCKSZ * i; + break; } + + /* Set the counter so we know a retry was attempted */ + block_attempts++; + + elog(DEBUG1, "check failure for block %u of file %s, attempt %d", + blkno, readfilename, block_attempts); + + /* + * Wait for 100 ms to give some room to any parallel page + * write that may have caused this retry to finish. 
+ */ + pg_usleep(100000L); + + /* Reset loop to validate the block again */ + i--; + continue; } - block_retry = false; + + checksum_failures++; + if (checksum_failures <= CHECKSUM_REPORT_THRESHOLD) + ereport(WARNING, + (errmsg("checksum verification failed in block %u of file \"%s\"", + blkno, readfilename))); + if (checksum_failures == CHECKSUM_REPORT_THRESHOLD) + ereport(WARNING, + (errmsg("further checksum verification failures in file \"%s\" will not be reported", + readfilename))); + + /* move to next block */ + block_attempts = 0; blkno++; } } diff --git a/src/bin/pg_basebackup/t/010_pg_basebackup.pl b/src/bin/pg_basebackup/t/010_pg_basebackup.pl index f674a7c94e..a46a9ece42 100644 --- a/src/bin/pg_basebackup/t/010_pg_basebackup.pl +++ b/src/bin/pg_basebackup/t/010_pg_basebackup.pl @@ -6,7 +6,7 @@ use File::Basename qw(basename dirname); use File::Path qw(rmtree); use PostgresNode; use TestLib; -use Test::More tests => 109; +use Test::More tests => 112; program_help_ok('pg_basebackup'); program_version_ok('pg_basebackup'); @@ -528,6 +528,24 @@ $node->command_checks_all( 'pg_basebackup reports checksum mismatch'); rmtree("$tempdir/backup_corrupt"); +# Create a new type of corruption and zero out one page header +# completely. +open $file, '+<', "$pgdata/$file_corrupt1"; +seek($file, $pageheader_size, 0); +system_or_bail 'pg_ctl', '-D', $pgdata, 'stop'; +my $zero_data = "\0" x $pageheader_size; +syswrite($file, $zero_data); +close $file; +system_or_bail 'pg_ctl', '-D', $pgdata, 'start'; + +$node->command_checks_all( + [ 'pg_basebackup', '-D', "$tempdir/backup_corrupt1a" ], + 1, + [qr{^$}], + [qr/^WARNING.*checksum verification failed/s], + "pg_basebackup reports checksum mismatch for zeroed page headers"); +rmtree("$tempdir/backup_corrupt1a"); + # induce further corruption in 5 more blocks system_or_bail 'pg_ctl', '-D', $pgdata, 'stop'; open $file, '+<', "$pgdata/$file_corrupt1"; -- 2.29.1
From 0bd03bfd59e9116158b07e660d70fff5cf1fb65e Mon Sep 17 00:00:00 2001 From: Michael Paquier <mich...@paquier.xyz> Date: Fri, 30 Oct 2020 10:56:02 +0900 Subject: [PATCH v8 1/2] Extend PageIsVerified() to handle more custom options This is useful for checks of relation pages without having to load the pages into the shared buffers, and two cases can make use of that: page verification in base backups and the online, lock-safe, flavor. Compatibility is kept with past versions using a routine that calls the new extended routine with the set of options compatible with the original version. Contrary to d401c576, a macro cannot be used as there may be external code relying on the presence of the original routine. This is applied down to 11, where this will be used by a follow-up commit addressing a set of issues with page verification in base backups. Extracted from a larger patch by the same author. Author: Anastasia Lubennikova Reviewed-by: Michael Paquier, Julien Rouhaud Discussion: https://postgr.es/m/608f3476-0598-2514-2c03-e05c7d2b0...@postgrespro.ru Backpatch-through: 11 --- src/include/storage/bufpage.h | 17 ++++++++++----- src/backend/catalog/storage.c | 3 ++- src/backend/storage/buffer/bufmgr.c | 6 ++++-- src/backend/storage/page/bufpage.c | 32 +++++++++++++++++++++++------ 4 files changed, 44 insertions(+), 14 deletions(-) diff --git a/src/include/storage/bufpage.h b/src/include/storage/bufpage.h index 3f88683a05..4d0ae4b36b 100644 --- a/src/include/storage/bufpage.h +++ b/src/include/storage/bufpage.h @@ -410,26 +410,33 @@ do { \ * extern declarations * ---------------------------------------------------------------- */ + +/* flags for PageAddItemExtended() */ #define PAI_OVERWRITE (1 << 0) #define PAI_IS_HEAP (1 << 1) +/* flags for PageIsVerifiedExtended() */ +#define PIV_LOG_WARNING (1 << 0) +#define PIV_REPORT_STAT (1 << 1) + #define PageAddItem(page, item, size, offsetNumber, overwrite, is_heap) \ PageAddItemExtended(page, item, size, offsetNumber, \ 
((overwrite) ? PAI_OVERWRITE : 0) | \ ((is_heap) ? PAI_IS_HEAP : 0)) /* - * Check that BLCKSZ is a multiple of sizeof(size_t). In PageIsVerified(), - * it is much faster to check if a page is full of zeroes using the native - * word size. Note that this assertion is kept within a header to make - * sure that StaticAssertDecl() works across various combinations of - * platforms and compilers. + * Check that BLCKSZ is a multiple of sizeof(size_t). In + * PageIsVerifiedExtended(), it is much faster to check if a page is + * full of zeroes using the native word size. Note that this assertion + * is kept within a header to make sure that StaticAssertDecl() works + * across various combinations of platforms and compilers. */ StaticAssertDecl(BLCKSZ == ((BLCKSZ / sizeof(size_t)) * sizeof(size_t)), "BLCKSZ has to be a multiple of sizeof(size_t)"); extern void PageInit(Page page, Size pageSize, Size specialSize); extern bool PageIsVerified(Page page, BlockNumber blkno); +extern bool PageIsVerifiedExtended(Page page, BlockNumber blkno, int flags); extern OffsetNumber PageAddItemExtended(Page page, Item item, Size size, OffsetNumber offsetNumber, int flags); extern Page PageGetTempPage(Page page); diff --git a/src/backend/catalog/storage.c b/src/backend/catalog/storage.c index ec143b640a..74216785b7 100644 --- a/src/backend/catalog/storage.c +++ b/src/backend/catalog/storage.c @@ -443,7 +443,8 @@ RelationCopyStorage(SMgrRelation src, SMgrRelation dst, smgrread(src, forkNum, blkno, buf.data); - if (!PageIsVerified(page, blkno)) + if (!PageIsVerifiedExtended(page, blkno, + PIV_LOG_WARNING | PIV_REPORT_STAT)) ereport(ERROR, (errcode(ERRCODE_DATA_CORRUPTED), errmsg("invalid page in block %u of relation %s", diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c index 29c920800a..9381f9981d 100644 --- a/src/backend/storage/buffer/bufmgr.c +++ b/src/backend/storage/buffer/bufmgr.c @@ -624,7 +624,8 @@ ReadBuffer(Relation reln, BlockNumber blockNum) * 
* In RBM_NORMAL mode, the page is read from disk, and the page header is * validated. An error is thrown if the page header is not valid. (But - * note that an all-zero page is considered "valid"; see PageIsVerified().) + * note that an all-zero page is considered "valid"; see + * PageIsVerifiedExtended().) * * RBM_ZERO_ON_ERROR is like the normal mode, but if the page header is not * valid, the page is zeroed instead of throwing an error. This is intended @@ -916,7 +917,8 @@ ReadBuffer_common(SMgrRelation smgr, char relpersistence, ForkNumber forkNum, } /* check for garbage data */ - if (!PageIsVerified((Page) bufBlock, blockNum)) + if (!PageIsVerifiedExtended((Page) bufBlock, blockNum, + PIV_LOG_WARNING | PIV_REPORT_STAT)) { if (mode == RBM_ZERO_ON_ERROR || zero_damaged_pages) { diff --git a/src/backend/storage/page/bufpage.c b/src/backend/storage/page/bufpage.c index d708117a40..8dc67c97cb 100644 --- a/src/backend/storage/page/bufpage.c +++ b/src/backend/storage/page/bufpage.c @@ -62,6 +62,18 @@ PageInit(Page page, Size pageSize, Size specialSize) /* * PageIsVerified + * Utility wrapper for PageIsVerifiedExtended(). + */ +bool +PageIsVerified(Page page, BlockNumber blkno) +{ + return PageIsVerifiedExtended(page, blkno, + PIV_LOG_WARNING | PIV_REPORT_STAT); +} + + +/* + * PageIsVerifiedExtended * Check that the page header and checksum (if any) appear valid. * * This is called when a page has just been read in from disk. The idea is @@ -77,9 +89,15 @@ PageInit(Page page, Size pageSize, Size specialSize) * allow zeroed pages here, and are careful that the page access macros * treat such a page as empty and without free space. Eventually, VACUUM * will clean up such a page and make it usable. + * + * If flag PIV_LOG_WARNING is set, a WARNING is logged in the event of + * a checksum failure. + * + * If flag PIV_REPORT_STAT is set, a checksum failure is reported directly + * to pgstat. 
*/ bool -PageIsVerified(Page page, BlockNumber blkno) +PageIsVerifiedExtended(Page page, BlockNumber blkno, int flags) { PageHeader p = (PageHeader) page; size_t *pagebytes; @@ -140,12 +158,14 @@ PageIsVerified(Page page, BlockNumber blkno) */ if (checksum_failure) { - ereport(WARNING, - (errcode(ERRCODE_DATA_CORRUPTED), - errmsg("page verification failed, calculated checksum %u but expected %u", - checksum, p->pd_checksum))); + if ((flags & PIV_LOG_WARNING) != 0) + ereport(WARNING, + (errcode(ERRCODE_DATA_CORRUPTED), + errmsg("page verification failed, calculated checksum %u but expected %u", + checksum, p->pd_checksum))); - pgstat_report_checksum_failure(); + if ((flags & PIV_REPORT_STAT) != 0) + pgstat_report_checksum_failure(); if (header_sane && ignore_checksum_failure) return true; -- 2.29.1
From fc850bada4a93523a69eb9b824432209d6eee344 Mon Sep 17 00:00:00 2001 From: Michael Paquier <mich...@paquier.xyz> Date: Fri, 30 Oct 2020 10:19:14 +0900 Subject: [PATCH v8 2/2] Fix page verification in base backups --- src/backend/replication/basebackup.c | 177 ++++++++++--------- src/bin/pg_basebackup/t/010_pg_basebackup.pl | 20 ++- 2 files changed, 111 insertions(+), 86 deletions(-) diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c index 50ae1f16d0..9a9acbe818 100644 --- a/src/backend/replication/basebackup.c +++ b/src/backend/replication/basebackup.c @@ -1566,21 +1566,24 @@ sendFile(const char *readfilename, const char *tarfilename, { FILE *fp; BlockNumber blkno = 0; - bool block_retry = false; + int block_attempts = 0; char buf[TAR_SEND_SIZE]; - uint16 checksum; int checksum_failures = 0; off_t cnt; int i; pgoff_t len = 0; char *page; size_t pad; - PageHeader phdr; int segmentno = 0; char *segmentpath; bool verify_checksum = false; pg_checksum_context checksum_ctx; + /* Maximum number of checksum failures reported for this file */ +#define CHECKSUM_REPORT_THRESHOLD 5 + /* maximum number of retries done for a single page */ +#define PAGE_RETRY_THRESHOLD 5 + pg_checksum_init(&checksum_ctx, manifest->checksum_type); fp = AllocateFile(readfilename, "rb"); @@ -1639,9 +1642,7 @@ sendFile(const char *readfilename, const char *tarfilename, if (verify_checksum && (cnt % BLCKSZ != 0)) { ereport(WARNING, - (errmsg("could not verify checksum in file \"%s\", block " - "%d: read buffer size %d and page size %d " - "differ", + (errmsg("could not verify checksum in file \"%s\", block %u: read buffer size %d and page size %d differ", readfilename, blkno, (int) cnt, BLCKSZ))); verify_checksum = false; } @@ -1652,96 +1653,102 @@ sendFile(const char *readfilename, const char *tarfilename, { page = buf + BLCKSZ * i; + elog(DEBUG1, "checking block %u of file %s, attempt %d", + blkno, readfilename, block_attempts); + /* - * Only check pages 
which have not been modified since the - * start of the base backup. Otherwise, they might have been - * written only halfway and the checksum would not be valid. - * However, replaying WAL would reinstate the correct page in - * this case. We also skip completely new pages, since they - * don't have a checksum yet. + * Check on-disk pages with the same set of verification + * conditions used before loading pages into shared buffers. + * Note that PageIsVerifiedExtended() considers pages filled + * only with zeros as valid, since they don't have a checksum + * yet. Failures are not reported to pgstat yet, as these are + * reported in a single batch once a file is completed. It + * is possible that a page is written halfway while + * doing this check, causing a false positive. If that + * happens, the page is retried multiple times, with an error + * reported if all the retries fail. */ - if (!PageIsNew(page) && PageGetLSN(page) < startptr) + if (PageIsVerifiedExtended(page, blkno, 0)) { - checksum = pg_checksum_page((char *) page, blkno + segmentno * RELSEG_SIZE); - phdr = (PageHeader) page; - if (phdr->pd_checksum != checksum) + /* success, move on to the next block */ + elog(DEBUG1, "check success for block %u of file %s, attempt %d", + blkno, readfilename, block_attempts); + blkno++; + block_attempts = 0; + continue; + } + + /* The verification of the page has failed, so retry it */ + if (block_attempts < PAGE_RETRY_THRESHOLD) + { + /* Reread the failed block */ + if (fseek(fp, -(cnt - BLCKSZ * i), SEEK_CUR) == -1) + { + ereport(ERROR, + (errcode_for_file_access(), + errmsg("could not fseek in file \"%s\": %m", + readfilename))); + } + + if (fread(buf + BLCKSZ * i, 1, BLCKSZ, fp) != BLCKSZ) { /* - * Retry the block on the first failure. It's - * possible that we read the first 4K page of the - * block just before postgres updated the entire block - * so it ends up looking torn to us. 
We only need to - * retry once because the LSN should be updated to - * something we can ignore on the next pass. If the - * error happens again then it is a true validation - * failure. + * If we hit end-of-file, a concurrent truncation must + * have occurred, so break out of this loop just as if + * the initial fread() returned 0. We'll drop through + * to the same code that handles that case. (We must + * fix up cnt first, though.) */ - if (block_retry == false) + if (feof(fp)) { - /* Reread the failed block */ - if (fseek(fp, -(cnt - BLCKSZ * i), SEEK_CUR) == -1) - { - ereport(ERROR, - (errcode_for_file_access(), - errmsg("could not fseek in file \"%s\": %m", - readfilename))); - } - - if (fread(buf + BLCKSZ * i, 1, BLCKSZ, fp) != BLCKSZ) - { - /* - * If we hit end-of-file, a concurrent - * truncation must have occurred, so break out - * of this loop just as if the initial fread() - * returned 0. We'll drop through to the same - * code that handles that case. (We must fix - * up cnt first, though.) 
- */ - if (feof(fp)) - { - cnt = BLCKSZ * i; - break; - } - - ereport(ERROR, - (errcode_for_file_access(), - errmsg("could not reread block %d of file \"%s\": %m", - blkno, readfilename))); - } - - if (fseek(fp, cnt - BLCKSZ * i - BLCKSZ, SEEK_CUR) == -1) - { - ereport(ERROR, - (errcode_for_file_access(), - errmsg("could not fseek in file \"%s\": %m", - readfilename))); - } - - /* Set flag so we know a retry was attempted */ - block_retry = true; - - /* Reset loop to validate the block again */ - i--; - continue; + cnt = BLCKSZ * i; + break; } - checksum_failures++; - - if (checksum_failures <= 5) - ereport(WARNING, - (errmsg("checksum verification failed in " - "file \"%s\", block %d: calculated " - "%X but expected %X", - readfilename, blkno, checksum, - phdr->pd_checksum))); - if (checksum_failures == 5) - ereport(WARNING, - (errmsg("further checksum verification " - "failures in file \"%s\" will not " - "be reported", readfilename))); + ereport(ERROR, + (errcode_for_file_access(), + errmsg("could not reread block %d of file \"%s\": %m", + blkno, readfilename))); } + + + if (fseek(fp, cnt - BLCKSZ * i - BLCKSZ, SEEK_CUR) == -1) + { + ereport(ERROR, + (errcode_for_file_access(), + errmsg("could not fseek in file \"%s\": %m", + readfilename))); + } + + /* Set the counter so we know a retry was attempted */ + block_attempts++; + + elog(DEBUG1, "check failure for block %u of file %s, attempt %d", + blkno, readfilename, block_attempts); + + /* + * Wait for 100 ms to give some room to any parallel page + * write that may have caused this retry to finish. 
+ */ + pg_usleep(100000L); + + /* Reset loop to validate the block again */ + i--; + continue; } - block_retry = false; + + checksum_failures++; + if (checksum_failures <= CHECKSUM_REPORT_THRESHOLD) + ereport(WARNING, + (errmsg("checksum verification failed in block %u of file \"%s\"", + blkno, readfilename))); + if (checksum_failures == CHECKSUM_REPORT_THRESHOLD) + ereport(WARNING, + (errmsg("further checksum verification failures in file \"%s\" will not be reported", + readfilename))); + + /* move to next block */ + block_attempts = 0; blkno++; } } diff --git a/src/bin/pg_basebackup/t/010_pg_basebackup.pl b/src/bin/pg_basebackup/t/010_pg_basebackup.pl index 208df557b8..01cc2dceb2 100644 --- a/src/bin/pg_basebackup/t/010_pg_basebackup.pl +++ b/src/bin/pg_basebackup/t/010_pg_basebackup.pl @@ -6,7 +6,7 @@ use File::Basename qw(basename dirname); use File::Path qw(rmtree); use PostgresNode; use TestLib; -use Test::More tests => 109; +use Test::More tests => 112; program_help_ok('pg_basebackup'); program_version_ok('pg_basebackup'); @@ -514,6 +514,24 @@ $node->command_checks_all( 'pg_basebackup reports checksum mismatch'); rmtree("$tempdir/backup_corrupt"); +# Create a new type of corruption and zero out one page header
+# completely. +open $file, '+<', "$pgdata/$file_corrupt1"; +seek($file, $pageheader_size, 0); +system_or_bail 'pg_ctl', '-D', $pgdata, 'stop'; +my $zero_data = "\0" x $pageheader_size; +syswrite($file, $zero_data); +close $file; +system_or_bail 'pg_ctl', '-D', $pgdata, 'start'; + +$node->command_checks_all( + [ 'pg_basebackup', '-D', "$tempdir/backup_corrupt1a" ], + 1, + [qr{^$}], + [qr/^WARNING.*checksum verification failed/s], + "pg_basebackup reports checksum mismatch for zeroed page headers"); +rmtree("$tempdir/backup_corrupt1a"); + # induce further corruption in 5 more blocks system_or_bail 'pg_ctl', '-D', $pgdata, 'stop'; open $file, '+<', "$pgdata/$file_corrupt1"; -- 2.29.1
From bc513ae4ec8ce33cd75bbca1ed801de7ae8b4697 Mon Sep 17 00:00:00 2001 From: Michael Paquier <mich...@paquier.xyz> Date: Fri, 30 Oct 2020 10:59:23 +0900 Subject: [PATCH v8 1/2] Extend PageIsVerified() to handle more custom options This is useful for checks of relation pages without having to load the pages into the shared buffers, and two cases can make use of that: page verification in base backups and the online, lock-safe, flavor. Compatibility is kept with past versions using a routine that calls the new extended routine with the set of options compatible with the original version. Contrary to d401c576, a macro cannot be used as there may be external code relying on the presence of the original routine. This is applied down to 11, where this will be used by a follow-up commit addressing a set of issues with page verification in base backups. Extracted from a larger patch by the same author. Author: Anastasia Lubennikova Reviewed-by: Michael Paquier, Julien Rouhaud Discussion: https://postgr.es/m/608f3476-0598-2514-2c03-e05c7d2b0...@postgrespro.ru Backpatch-through: 11 --- src/include/storage/bufpage.h | 7 +++++++ src/backend/catalog/storage.c | 3 ++- src/backend/storage/buffer/bufmgr.c | 6 ++++-- src/backend/storage/page/bufpage.c | 32 +++++++++++++++++++++++------ 4 files changed, 39 insertions(+), 9 deletions(-) diff --git a/src/include/storage/bufpage.h b/src/include/storage/bufpage.h index 34b68ad0e0..56ca6b6ce1 100644 --- a/src/include/storage/bufpage.h +++ b/src/include/storage/bufpage.h @@ -410,9 +410,15 @@ do { \ * extern declarations * ---------------------------------------------------------------- */ + +/* flags for PageAddItemExtended() */ #define PAI_OVERWRITE (1 << 0) #define PAI_IS_HEAP (1 << 1) +/* flags for PageIsVerifiedExtended() */ +#define PIV_LOG_WARNING (1 << 0) +#define PIV_REPORT_STAT (1 << 1) + #define PageAddItem(page, item, size, offsetNumber, overwrite, is_heap) \ PageAddItemExtended(page, item, size, offsetNumber, \ ((overwrite) ? 
PAI_OVERWRITE : 0) | \ @@ -420,6 +426,7 @@ do { \ extern void PageInit(Page page, Size pageSize, Size specialSize); extern bool PageIsVerified(Page page, BlockNumber blkno); +extern bool PageIsVerifiedExtended(Page page, BlockNumber blkno, int flags); extern OffsetNumber PageAddItemExtended(Page page, Item item, Size size, OffsetNumber offsetNumber, int flags); extern Page PageGetTempPage(Page page); diff --git a/src/backend/catalog/storage.c b/src/backend/catalog/storage.c index 3cc886f7fe..f899b25c0e 100644 --- a/src/backend/catalog/storage.c +++ b/src/backend/catalog/storage.c @@ -338,7 +338,8 @@ RelationCopyStorage(SMgrRelation src, SMgrRelation dst, smgrread(src, forkNum, blkno, buf.data); - if (!PageIsVerified(page, blkno)) + if (!PageIsVerifiedExtended(page, blkno, + PIV_LOG_WARNING | PIV_REPORT_STAT)) ereport(ERROR, (errcode(ERRCODE_DATA_CORRUPTED), errmsg("invalid page in block %u of relation %s", diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c index 7332e6b590..4ceb2ce25b 100644 --- a/src/backend/storage/buffer/bufmgr.c +++ b/src/backend/storage/buffer/bufmgr.c @@ -613,7 +613,8 @@ ReadBuffer(Relation reln, BlockNumber blockNum) * * In RBM_NORMAL mode, the page is read from disk, and the page header is * validated. An error is thrown if the page header is not valid. (But - * note that an all-zero page is considered "valid"; see PageIsVerified().) + * note that an all-zero page is considered "valid"; see + * PageIsVerifiedExtended().) * * RBM_ZERO_ON_ERROR is like the normal mode, but if the page header is not * valid, the page is zeroed instead of throwing an error. 
This is intended @@ -905,7 +906,8 @@ ReadBuffer_common(SMgrRelation smgr, char relpersistence, ForkNumber forkNum, } /* check for garbage data */ - if (!PageIsVerified((Page) bufBlock, blockNum)) + if (!PageIsVerifiedExtended((Page) bufBlock, blockNum, + PIV_LOG_WARNING | PIV_REPORT_STAT)) { if (mode == RBM_ZERO_ON_ERROR || zero_damaged_pages) { diff --git a/src/backend/storage/page/bufpage.c b/src/backend/storage/page/bufpage.c index 169430835c..7500b9d0b5 100644 --- a/src/backend/storage/page/bufpage.c +++ b/src/backend/storage/page/bufpage.c @@ -62,6 +62,18 @@ PageInit(Page page, Size pageSize, Size specialSize) /* * PageIsVerified + * Utility wrapper for PageIsVerifiedExtended(). + */ +bool +PageIsVerified(Page page, BlockNumber blkno) +{ + return PageIsVerifiedExtended(page, blkno, + PIV_LOG_WARNING | PIV_REPORT_STAT); +} + + +/* + * PageIsVerifiedExtended * Check that the page header and checksum (if any) appear valid. * * This is called when a page has just been read in from disk. The idea is @@ -77,9 +89,15 @@ PageInit(Page page, Size pageSize, Size specialSize) * allow zeroed pages here, and are careful that the page access macros * treat such a page as empty and without free space. Eventually, VACUUM * will clean up such a page and make it usable. + * + * If flag PIV_LOG_WARNING is set, a WARNING is logged in the event of + * a checksum failure. + * + * If flag PIV_REPORT_STAT is set, a checksum failure is reported directly + * to pgstat. 
*/ bool -PageIsVerified(Page page, BlockNumber blkno) +PageIsVerifiedExtended(Page page, BlockNumber blkno, int flags) { PageHeader p = (PageHeader) page; size_t *pagebytes; @@ -147,12 +165,14 @@ PageIsVerified(Page page, BlockNumber blkno) */ if (checksum_failure) { - ereport(WARNING, - (errcode(ERRCODE_DATA_CORRUPTED), - errmsg("page verification failed, calculated checksum %u but expected %u", - checksum, p->pd_checksum))); + if ((flags & PIV_LOG_WARNING) != 0) + ereport(WARNING, + (errcode(ERRCODE_DATA_CORRUPTED), + errmsg("page verification failed, calculated checksum %u but expected %u", + checksum, p->pd_checksum))); - pgstat_report_checksum_failure(); + if ((flags & PIV_REPORT_STAT) != 0) + pgstat_report_checksum_failure(); if (header_sane && ignore_checksum_failure) return true; -- 2.29.1
From cb6b0414daefe89ad2252d5c479a138a61ea3b63 Mon Sep 17 00:00:00 2001
From: Michael Paquier <mich...@paquier.xyz>
Date: Fri, 30 Oct 2020 10:19:14 +0900
Subject: [PATCH v8 2/2] Fix page verification in base backups

---
 src/backend/replication/basebackup.c         | 177 ++++++++++---------
 src/bin/pg_basebackup/t/010_pg_basebackup.pl |  20 ++-
 2 files changed, 111 insertions(+), 86 deletions(-)

diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index efc458d80b..ab8e1c64cc 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -1397,20 +1397,23 @@ sendFile(const char *readfilename, const char *tarfilename, struct stat *statbuf
 {
     FILE       *fp;
     BlockNumber blkno = 0;
-    bool        block_retry = false;
+    int         block_attempts = 0;
     char        buf[TAR_SEND_SIZE];
-    uint16      checksum;
     int         checksum_failures = 0;
     off_t       cnt;
     int         i;
     pgoff_t     len = 0;
     char       *page;
     size_t      pad;
-    PageHeader  phdr;
    int         segmentno = 0;
     char       *segmentpath;
     bool        verify_checksum = false;
 
+    /* Maximum number of checksum failures reported for this file */
+#define CHECKSUM_REPORT_THRESHOLD   5
+    /* Maximum number of retries done for a single page */
+#define PAGE_RETRY_THRESHOLD        5
+
     fp = AllocateFile(readfilename, "rb");
     if (fp == NULL)
     {
@@ -1467,9 +1470,7 @@ sendFile(const char *readfilename, const char *tarfilename, struct stat *statbuf
         if (verify_checksum && (cnt % BLCKSZ != 0))
         {
             ereport(WARNING,
-                    (errmsg("could not verify checksum in file \"%s\", block "
-                            "%d: read buffer size %d and page size %d "
-                            "differ",
+                    (errmsg("could not verify checksum in file \"%s\", block %u: read buffer size %d and page size %d differ",
                             readfilename, blkno, (int) cnt, BLCKSZ)));
             verify_checksum = false;
         }
@@ -1480,96 +1481,102 @@ sendFile(const char *readfilename, const char *tarfilename, struct stat *statbuf
         {
             page = buf + BLCKSZ * i;
 
+            elog(DEBUG1, "checking block %u of file %s, attempt %d",
+                 blkno, readfilename, block_attempts);
+
             /*
-             * Only check pages which have not been modified since the
-             * start of the base backup. Otherwise, they might have been
-             * written only halfway and the checksum would not be valid.
-             * However, replaying WAL would reinstate the correct page in
-             * this case. We also skip completely new pages, since they
-             * don't have a checksum yet.
+             * Check on-disk pages with the same set of verification
+             * conditions used before loading pages into shared buffers.
+             * Note that PageIsVerifiedExtended() considers pages filled
+             * only with zeros as valid, since they don't have a checksum
+             * yet.  Failures are not reported to pgstat here, as they are
+             * reported in a single batch once a file is completed.  It is
+             * possible that a page is written halfway while doing this
+             * check, causing a false positive.  If that happens, the page
+             * is retried multiple times, with an error reported only if
+             * the last attempt also fails.
              */
-            if (!PageIsNew(page) && PageGetLSN(page) < startptr)
+            if (PageIsVerifiedExtended(page, blkno, 0))
             {
-                checksum = pg_checksum_page((char *) page, blkno + segmentno * RELSEG_SIZE);
-                phdr = (PageHeader) page;
-                if (phdr->pd_checksum != checksum)
+                /* success, move on to the next block */
+                elog(DEBUG1, "check success for block %u of file %s, attempt %d",
+                     blkno, readfilename, block_attempts);
+                blkno++;
+                block_attempts = 0;
+                continue;
+            }
+
+            /* Page verification failed, so retry */
+            if (block_attempts < PAGE_RETRY_THRESHOLD)
+            {
+                /* Reread the failed block */
+                if (fseek(fp, -(cnt - BLCKSZ * i), SEEK_CUR) == -1)
+                {
+                    ereport(ERROR,
+                            (errcode_for_file_access(),
+                             errmsg("could not fseek in file \"%s\": %m",
+                                    readfilename)));
+                }
+
+                if (fread(buf + BLCKSZ * i, 1, BLCKSZ, fp) != BLCKSZ)
                 {
                     /*
-                     * Retry the block on the first failure.  It's
-                     * possible that we read the first 4K page of the
-                     * block just before postgres updated the entire block
-                     * so it ends up looking torn to us.  We only need to
-                     * retry once because the LSN should be updated to
-                     * something we can ignore on the next pass.  If the
-                     * error happens again then it is a true validation
-                     * failure.
+                     * If we hit end-of-file, a concurrent truncation must
+                     * have occurred, so break out of this loop just as if
+                     * the initial fread() returned 0.  We'll drop through
+                     * to the same code that handles that case.  (We must
+                     * fix up cnt first, though.)
                      */
-                    if (block_retry == false)
+                    if (feof(fp))
                     {
-                        /* Reread the failed block */
-                        if (fseek(fp, -(cnt - BLCKSZ * i), SEEK_CUR) == -1)
-                        {
-                            ereport(ERROR,
-                                    (errcode_for_file_access(),
-                                     errmsg("could not fseek in file \"%s\": %m",
-                                            readfilename)));
-                        }
-
-                        if (fread(buf + BLCKSZ * i, 1, BLCKSZ, fp) != BLCKSZ)
-                        {
-                            /*
-                             * If we hit end-of-file, a concurrent
-                             * truncation must have occurred, so break out
-                             * of this loop just as if the initial fread()
-                             * returned 0.  We'll drop through to the same
-                             * code that handles that case.  (We must fix
-                             * up cnt first, though.)
-                             */
-                            if (feof(fp))
-                            {
-                                cnt = BLCKSZ * i;
-                                break;
-                            }
-
-                            ereport(ERROR,
-                                    (errcode_for_file_access(),
-                                     errmsg("could not reread block %d of file \"%s\": %m",
-                                            blkno, readfilename)));
-                        }
-
-                        if (fseek(fp, cnt - BLCKSZ * i - BLCKSZ, SEEK_CUR) == -1)
-                        {
-                            ereport(ERROR,
-                                    (errcode_for_file_access(),
-                                     errmsg("could not fseek in file \"%s\": %m",
-                                            readfilename)));
-                        }
-
-                        /* Set flag so we know a retry was attempted */
-                        block_retry = true;
-
-                        /* Reset loop to validate the block again */
-                        i--;
-                        continue;
+                        cnt = BLCKSZ * i;
+                        break;
                     }
-                    checksum_failures++;
-
-                    if (checksum_failures <= 5)
-                        ereport(WARNING,
-                                (errmsg("checksum verification failed in "
-                                        "file \"%s\", block %d: calculated "
-                                        "%X but expected %X",
-                                        readfilename, blkno, checksum,
-                                        phdr->pd_checksum)));
-                    if (checksum_failures == 5)
-                        ereport(WARNING,
-                                (errmsg("further checksum verification "
-                                        "failures in file \"%s\" will not "
-                                        "be reported", readfilename)));
+                    ereport(ERROR,
+                            (errcode_for_file_access(),
+                             errmsg("could not reread block %u of file \"%s\": %m",
+                                    blkno, readfilename)));
                 }
+
+                if (fseek(fp, cnt - BLCKSZ * i - BLCKSZ, SEEK_CUR) == -1)
+                {
+                    ereport(ERROR,
+                            (errcode_for_file_access(),
+                             errmsg("could not fseek in file \"%s\": %m",
+                                    readfilename)));
+                }
+
+                /* Bump the counter so we know a retry was attempted */
+                block_attempts++;
+
+                elog(DEBUG1, "check failure for block %u of file %s, attempt %d",
+                     blkno, readfilename, block_attempts);
+
+                /*
+                 * Wait 100 ms to give any concurrent page write that may
+                 * have caused this failure time to finish.
+                 */
+                pg_usleep(100000L);
+
+                /* Reset loop to validate the block again */
+                i--;
+                continue;
             }
-            block_retry = false;
+
+            checksum_failures++;
+            if (checksum_failures <= CHECKSUM_REPORT_THRESHOLD)
+                ereport(WARNING,
+                        (errmsg("checksum verification failed in block %u of file \"%s\"",
+                                blkno, readfilename)));
+            if (checksum_failures == CHECKSUM_REPORT_THRESHOLD)
+                ereport(WARNING,
+                        (errmsg("further checksum verification failures in file \"%s\" will not be reported",
+                                readfilename)));
+
+            /* move to next block */
+            block_attempts = 0;
             blkno++;
         }
     }
diff --git a/src/bin/pg_basebackup/t/010_pg_basebackup.pl b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
index 3c70499feb..2581d33631 100644
--- a/src/bin/pg_basebackup/t/010_pg_basebackup.pl
+++ b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
@@ -6,7 +6,7 @@ use File::Basename qw(basename dirname);
 use File::Path qw(rmtree);
 use PostgresNode;
 use TestLib;
-use Test::More tests => 107;
+use Test::More tests => 110;
 
 program_help_ok('pg_basebackup');
 program_version_ok('pg_basebackup');
@@ -511,6 +511,24 @@ $node->command_checks_all(
 	'pg_basebackup reports checksum mismatch');
 rmtree("$tempdir/backup_corrupt");
 
+# Create a new type of corruption and zero out one page header
+# completely.
+open $file, '+<', "$pgdata/$file_corrupt1";
+seek($file, $pageheader_size, 0);
+system_or_bail 'pg_ctl', '-D', $pgdata, 'stop';
+my $zero_data = "\0" x $pageheader_size;
+syswrite($file, $zero_data);
+close $file;
+system_or_bail 'pg_ctl', '-D', $pgdata, 'start';
+
+$node->command_checks_all(
+	[ 'pg_basebackup', '-D', "$tempdir/backup_corrupt1a" ],
+	1,
+	[qr{^$}],
+	[qr/^WARNING.*checksum verification failed/s],
+	"pg_basebackup reports checksum mismatch for zeroed page headers");
+rmtree("$tempdir/backup_corrupt1a");
+
 # induce further corruption in 5 more blocks
 system_or_bail 'pg_ctl', '-D', $pgdata, 'stop';
 open $file, '+<', "$pgdata/$file_corrupt1";
-- 
2.29.1
From 74b32f4e942847d87b3acb856a0d20fb2b0148bd Mon Sep 17 00:00:00 2001
From: Michael Paquier <mich...@paquier.xyz>
Date: Thu, 29 Oct 2020 21:58:41 +0900
Subject: [PATCH v8 1/2] Extend PageIsVerified() to handle more custom options

This is useful for checks of relation pages without having to load the
pages into the shared buffers, and two cases can make use of that: page
verification in base backups and the online, lock-safe flavor of page
checks.

Compatibility is kept with past versions using a routine that calls the
new extended routine with the set of options compatible with the
original version.  Contrary to d401c576, a macro cannot be used as there
may be external code relying on the presence of the original routine.
This is applied only on REL_13_STABLE, where this will be used by a
follow-up commit addressing a set of issues with page verification in
base backups.

Extracted from a larger patch by the same author.

Author: Anastasia Lubennikova
Reviewed-by: Michael Paquier, Julien Rouhaud
Discussion: https://postgr.es/m/608f3476-0598-2514-2c03-e05c7d2b0...@postgrespro.ru
---
 src/include/storage/bufpage.h       |  6 ++++++
 src/backend/storage/buffer/bufmgr.c |  6 ++++--
 src/backend/storage/page/bufpage.c  | 25 ++++++++++++++++++++-----
 3 files changed, 30 insertions(+), 7 deletions(-)

diff --git a/src/include/storage/bufpage.h b/src/include/storage/bufpage.h
index 85dd10c45a..0d21c727ba 100644
--- a/src/include/storage/bufpage.h
+++ b/src/include/storage/bufpage.h
@@ -406,9 +406,14 @@ do { \
  *		extern declarations
  * ----------------------------------------------------------------
  */
+
+/* flags for PageAddItemExtended() */
 #define PAI_OVERWRITE			(1 << 0)
 #define PAI_IS_HEAP				(1 << 1)
 
+/* flags for PageIsVerifiedExtended() */
+#define PIV_LOG_WARNING			(1 << 0)
+
 #define PageAddItem(page, item, size, offsetNumber, overwrite, is_heap) \
 	PageAddItemExtended(page, item, size, offsetNumber, \
 						((overwrite) ? PAI_OVERWRITE : 0) | \
@@ -416,6 +421,7 @@ do { \
 
 extern void PageInit(Page page, Size pageSize, Size specialSize);
 extern bool PageIsVerified(Page page, BlockNumber blkno);
+extern bool PageIsVerifiedExtended(Page page, BlockNumber blkno, int flags);
 extern OffsetNumber PageAddItemExtended(Page page, Item item, Size size,
 										OffsetNumber offsetNumber, int flags);
 extern Page PageGetTempPage(Page page);
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 01eabe5706..8708ba35c3 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -612,7 +612,8 @@ ReadBuffer(Relation reln, BlockNumber blockNum)
  *
  * In RBM_NORMAL mode, the page is read from disk, and the page header is
  * validated.  An error is thrown if the page header is not valid.  (But
- * note that an all-zero page is considered "valid"; see PageIsVerified().)
+ * note that an all-zero page is considered "valid"; see
+ * PageIsVerifiedExtended().)
  *
  * RBM_ZERO_ON_ERROR is like the normal mode, but if the page header is not
  * valid, the page is zeroed instead of throwing an error.  This is intended
@@ -898,7 +899,8 @@ ReadBuffer_common(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
 	}
 
 	/* check for garbage data */
-	if (!PageIsVerified((Page) bufBlock, blockNum))
+	if (!PageIsVerifiedExtended((Page) bufBlock, blockNum,
+								PIV_LOG_WARNING))
 	{
 		if (mode == RBM_ZERO_ON_ERROR || zero_damaged_pages)
 		{
diff --git a/src/backend/storage/page/bufpage.c b/src/backend/storage/page/bufpage.c
index d2ea38bdab..8d788b56ae 100644
--- a/src/backend/storage/page/bufpage.c
+++ b/src/backend/storage/page/bufpage.c
@@ -61,6 +61,17 @@ PageInit(Page page, Size pageSize, Size specialSize)
 
 /*
  * PageIsVerified
+ *		Utility wrapper for PageIsVerifiedExtended().
+ */
+bool
+PageIsVerified(Page page, BlockNumber blkno)
+{
+	return PageIsVerifiedExtended(page, blkno, PIV_LOG_WARNING);
+}
+
+
+/*
+ * PageIsVerifiedExtended
  *		Check that the page header and checksum (if any) appear valid.
  *
  * This is called when a page has just been read in from disk.  The idea is
@@ -76,9 +87,12 @@ PageInit(Page page, Size pageSize, Size specialSize)
  * allow zeroed pages here, and are careful that the page access macros
  * treat such a page as empty and without free space.  Eventually, VACUUM
  * will clean up such a page and make it usable.
+ *
+ * If flag PIV_LOG_WARNING is set, a WARNING is logged in the event of
+ * a checksum failure.
  */
 bool
-PageIsVerified(Page page, BlockNumber blkno)
+PageIsVerifiedExtended(Page page, BlockNumber blkno, int flags)
 {
 	PageHeader	p = (PageHeader) page;
 	size_t	   *pagebytes;
@@ -146,10 +160,11 @@ PageIsVerified(Page page, BlockNumber blkno)
 	 */
 	if (checksum_failure)
 	{
-		ereport(WARNING,
-				(errcode(ERRCODE_DATA_CORRUPTED),
-				 errmsg("page verification failed, calculated checksum %u but expected %u",
-						checksum, p->pd_checksum)));
+		if ((flags & PIV_LOG_WARNING) != 0)
+			ereport(WARNING,
+					(errcode(ERRCODE_DATA_CORRUPTED),
+					 errmsg("page verification failed, calculated checksum %u but expected %u",
+							checksum, p->pd_checksum)));
 
 		if (header_sane && ignore_checksum_failure)
 			return true;
-- 
2.29.1
From e8d46fe87e17b699d7422c6f524d14abc8b3b18f Mon Sep 17 00:00:00 2001
From: Michael Paquier <mich...@paquier.xyz>
Date: Fri, 30 Oct 2020 10:19:14 +0900
Subject: [PATCH v8 2/2] Fix page verification in base backups

---
 src/backend/replication/basebackup.c         | 177 ++++++++++---------
 src/bin/pg_basebackup/t/010_pg_basebackup.pl |  20 ++-
 2 files changed, 111 insertions(+), 86 deletions(-)

diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index 3e53b3df6f..990a8fc22d 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -1394,20 +1394,23 @@ sendFile(const char *readfilename, const char *tarfilename, struct stat *statbuf
 {
     FILE       *fp;
     BlockNumber blkno = 0;
-    bool        block_retry = false;
+    int         block_attempts = 0;
     char        buf[TAR_SEND_SIZE];
-    uint16      checksum;
     int         checksum_failures = 0;
     off_t       cnt;
     int         i;
     pgoff_t     len = 0;
     char       *page;
     size_t      pad;
-    PageHeader  phdr;
     int         segmentno = 0;
     char       *segmentpath;
     bool        verify_checksum = false;
 
+    /* Maximum number of checksum failures reported for this file */
+#define CHECKSUM_REPORT_THRESHOLD   5
+    /* Maximum number of retries done for a single page */
+#define PAGE_RETRY_THRESHOLD        5
+
     fp = AllocateFile(readfilename, "rb");
     if (fp == NULL)
     {
@@ -1464,9 +1467,7 @@ sendFile(const char *readfilename, const char *tarfilename, struct stat *statbuf
         if (verify_checksum && (cnt % BLCKSZ != 0))
         {
             ereport(WARNING,
-                    (errmsg("cannot verify checksum in file \"%s\", block "
-                            "%d: read buffer size %d and page size %d "
-                            "differ",
+                    (errmsg("could not verify checksum in file \"%s\", block %u: read buffer size %d and page size %d differ",
                             readfilename, blkno, (int) cnt, BLCKSZ)));
             verify_checksum = false;
         }
@@ -1477,96 +1478,102 @@ sendFile(const char *readfilename, const char *tarfilename, struct stat *statbuf
         {
             page = buf + BLCKSZ * i;
 
+            elog(DEBUG1, "checking block %u of file %s, attempt %d",
+                 blkno, readfilename, block_attempts);
+
             /*
-             * Only check pages which have not been modified since the
-             * start of the base backup. Otherwise, they might have been
-             * written only halfway and the checksum would not be valid.
-             * However, replaying WAL would reinstate the correct page in
-             * this case. We also skip completely new pages, since they
-             * don't have a checksum yet.
+             * Check on-disk pages with the same set of verification
+             * conditions used before loading pages into shared buffers.
+             * Note that PageIsVerifiedExtended() considers pages filled
+             * only with zeros as valid, since they don't have a checksum
+             * yet.  Failures are not reported to pgstat here, as they are
+             * reported in a single batch once a file is completed.  It is
+             * possible that a page is written halfway while doing this
+             * check, causing a false positive.  If that happens, the page
+             * is retried multiple times, with an error reported only if
+             * the last attempt also fails.
              */
-            if (!PageIsNew(page) && PageGetLSN(page) < startptr)
+            if (PageIsVerifiedExtended(page, blkno, 0))
             {
-                checksum = pg_checksum_page((char *) page, blkno + segmentno * RELSEG_SIZE);
-                phdr = (PageHeader) page;
-                if (phdr->pd_checksum != checksum)
+                /* success, move on to the next block */
+                elog(DEBUG1, "check success for block %u of file %s, attempt %d",
+                     blkno, readfilename, block_attempts);
+                blkno++;
+                block_attempts = 0;
+                continue;
+            }
+
+            /* Page verification failed, so retry */
+            if (block_attempts < PAGE_RETRY_THRESHOLD)
+            {
+                /* Reread the failed block */
+                if (fseek(fp, -(cnt - BLCKSZ * i), SEEK_CUR) == -1)
+                {
+                    ereport(ERROR,
+                            (errcode_for_file_access(),
+                             errmsg("could not fseek in file \"%s\": %m",
+                                    readfilename)));
+                }
+
+                if (fread(buf + BLCKSZ * i, 1, BLCKSZ, fp) != BLCKSZ)
                 {
                     /*
-                     * Retry the block on the first failure.  It's
-                     * possible that we read the first 4K page of the
-                     * block just before postgres updated the entire block
-                     * so it ends up looking torn to us.  We only need to
-                     * retry once because the LSN should be updated to
-                     * something we can ignore on the next pass.  If the
-                     * error happens again then it is a true validation
-                     * failure.
+                     * If we hit end-of-file, a concurrent truncation must
+                     * have occurred, so break out of this loop just as if
+                     * the initial fread() returned 0.  We'll drop through
+                     * to the same code that handles that case.  (We must
+                     * fix up cnt first, though.)
                      */
-                    if (block_retry == false)
+                    if (feof(fp))
                     {
-                        /* Reread the failed block */
-                        if (fseek(fp, -(cnt - BLCKSZ * i), SEEK_CUR) == -1)
-                        {
-                            ereport(ERROR,
-                                    (errcode_for_file_access(),
-                                     errmsg("could not fseek in file \"%s\": %m",
-                                            readfilename)));
-                        }
-
-                        if (fread(buf + BLCKSZ * i, 1, BLCKSZ, fp) != BLCKSZ)
-                        {
-                            /*
-                             * If we hit end-of-file, a concurrent
-                             * truncation must have occurred, so break out
-                             * of this loop just as if the initial fread()
-                             * returned 0.  We'll drop through to the same
-                             * code that handles that case.  (We must fix
-                             * up cnt first, though.)
-                             */
-                            if (feof(fp))
-                            {
-                                cnt = BLCKSZ * i;
-                                break;
-                            }
-
-                            ereport(ERROR,
-                                    (errcode_for_file_access(),
-                                     errmsg("could not reread block %d of file \"%s\": %m",
-                                            blkno, readfilename)));
-                        }
-
-                        if (fseek(fp, cnt - BLCKSZ * i - BLCKSZ, SEEK_CUR) == -1)
-                        {
-                            ereport(ERROR,
-                                    (errcode_for_file_access(),
-                                     errmsg("could not fseek in file \"%s\": %m",
-                                            readfilename)));
-                        }
-
-                        /* Set flag so we know a retry was attempted */
-                        block_retry = true;
-
-                        /* Reset loop to validate the block again */
-                        i--;
-                        continue;
+                        cnt = BLCKSZ * i;
+                        break;
                     }
-                    checksum_failures++;
-
-                    if (checksum_failures <= 5)
-                        ereport(WARNING,
-                                (errmsg("checksum verification failed in "
-                                        "file \"%s\", block %d: calculated "
-                                        "%X but expected %X",
-                                        readfilename, blkno, checksum,
-                                        phdr->pd_checksum)));
-                    if (checksum_failures == 5)
-                        ereport(WARNING,
-                                (errmsg("further checksum verification "
-                                        "failures in file \"%s\" will not "
-                                        "be reported", readfilename)));
+                    ereport(ERROR,
+                            (errcode_for_file_access(),
+                             errmsg("could not reread block %u of file \"%s\": %m",
+                                    blkno, readfilename)));
                 }
+
+                if (fseek(fp, cnt - BLCKSZ * i - BLCKSZ, SEEK_CUR) == -1)
+                {
+                    ereport(ERROR,
+                            (errcode_for_file_access(),
+                             errmsg("could not fseek in file \"%s\": %m",
+                                    readfilename)));
+                }
+
+                /* Bump the counter so we know a retry was attempted */
+                block_attempts++;
+
+                elog(DEBUG1, "check failure for block %u of file %s, attempt %d",
+                     blkno, readfilename, block_attempts);
+
+                /*
+                 * Wait 100 ms to give any concurrent page write that may
+                 * have caused this failure time to finish.
+                 */
+                pg_usleep(100000L);
+
+                /* Reset loop to validate the block again */
+                i--;
+                continue;
             }
-            block_retry = false;
+
+            checksum_failures++;
+            if (checksum_failures <= CHECKSUM_REPORT_THRESHOLD)
+                ereport(WARNING,
+                        (errmsg("checksum verification failed in block %u of file \"%s\"",
+                                blkno, readfilename)));
+            if (checksum_failures == CHECKSUM_REPORT_THRESHOLD)
+                ereport(WARNING,
+                        (errmsg("further checksum verification failures in file \"%s\" will not be reported",
+                                readfilename)));
+
+            /* move to next block */
+            block_attempts = 0;
             blkno++;
         }
     }
diff --git a/src/bin/pg_basebackup/t/010_pg_basebackup.pl b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
index 851a7db52f..b62b32399a 100644
--- a/src/bin/pg_basebackup/t/010_pg_basebackup.pl
+++ b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
@@ -6,7 +6,7 @@ use File::Basename qw(basename dirname);
 use File::Path qw(rmtree);
 use PostgresNode;
 use TestLib;
-use Test::More tests => 107;
+use Test::More tests => 110;
 
 program_help_ok('pg_basebackup');
 program_version_ok('pg_basebackup');
@@ -514,6 +514,24 @@ $node->command_checks_all(
 	'pg_basebackup reports checksum mismatch');
 rmtree("$tempdir/backup_corrupt");
 
+# Create a new type of corruption and zero out one page header
+# completely.
+open $file, '+<', "$pgdata/$file_corrupt1";
+seek($file, $pageheader_size, 0);
+system_or_bail 'pg_ctl', '-D', $pgdata, 'stop';
+my $zero_data = "\0" x $pageheader_size;
+syswrite($file, $zero_data);
+close $file;
+system_or_bail 'pg_ctl', '-D', $pgdata, 'start';
+
+$node->command_checks_all(
+	[ 'pg_basebackup', '-D', "$tempdir/backup_corrupt1a" ],
+	1,
+	[qr{^$}],
+	[qr/^WARNING.*checksum verification failed/s],
+	"pg_basebackup reports checksum mismatch for zeroed page headers");
+rmtree("$tempdir/backup_corrupt1a");
+
 # induce further corruption in 5 more blocks
 system_or_bail 'pg_ctl', '-D', $pgdata, 'stop';
 open $file, '+<', "$pgdata/$file_corrupt1";
-- 
2.29.1