Hi Michael, On 3/17/18 5:34 PM, Michael Banck wrote: > > On Fri, Mar 09, 2018 at 10:35:33PM +0100, Michael Banck wrote: > > I think most people (including those I had off-list discussions about > this with) were of the opinion that such an option should be there, so I > added an additional option NOVERIFY_CHECKSUMS to the BASE_BACKUP > replication command and also an option -k / --no-verify-checksums to > pg_basebackup to trigger this. > > Updated patch attached.
+ memcpy(page, (buf + BLCKSZ * i), BLCKSZ); Why make a copy here? How about: char *page = buf + BLCKSZ * i I know pg_checksum_page manipulates the checksum field but I have found it to be safe. + if (phdr->pd_checksum != checksum) I've attached a patch that adds basic retry functionality. It's not terrible efficient since it rereads the entire buffer for any block error. A better way is to keep a bitmap for each block in the buffer, then on retry compare bitmaps. If the block is still bad, report it. If the block was corrected moved on. If a block was good before but is bad on retry it can be ignored. + ereport(WARNING, + (errmsg("checksum verification failed in file " I'm worried about how verbose this warning could be since there are 131,072 blocks per segment. It's unlikely to have that many block errors, but users do sometimes put files in PGDATA which look like they should be validated. Since these warnings all go to the server log it could get pretty bad. We should either stop warning after the first failure, or aggregate the failures for a file into a single message. Some tests with multi-block errors should be added to test these scenarios. Thanks, -- -David da...@pgmasters.net
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c index 9f735a2c07..48bacfe5dd 100644 --- a/src/backend/replication/basebackup.c +++ b/src/backend/replication/basebackup.c @@ -1258,6 +1258,7 @@ sendFile(const char *readfilename, const char *tarfilename, struct stat *statbuf int i; pgoff_t len = 0; char page[BLCKSZ]; + int block_error = -1; size_t pad; PageHeader phdr; int segmentno = 0; @@ -1328,9 +1329,16 @@ sendFile(const char *readfilename, const char *tarfilename, struct stat *statbuf verify_checksum = false; continue; } - for (i = 0; i < cnt / BLCKSZ; i++) + + /* + * Validate all blocks in the buffer. If checking after the buffer + * was reread then start at the block that caused the reread. + */ + for (i = block_error == -1 ? 0 : block_error; i < cnt / BLCKSZ; i++) { + blkno = len / BLCKSZ + i; memcpy(page, (buf + BLCKSZ * i), BLCKSZ); + /* * Only check pages which have not been modified since the * start of the base backup. @@ -1341,6 +1349,18 @@ sendFile(const char *readfilename, const char *tarfilename, struct stat *statbuf phdr = (PageHeader) page; if (phdr->pd_checksum != checksum) { + /* + * If first time encountering an error on this block + * then read the last buffer again to see if the + * checksum is corrected. + */ + if (block_error == -1) + { + block_error = i; + fseek(fp, -cnt, SEEK_CUR); + break; + } + ereport(WARNING, (errmsg("checksum verification failed in file " "\"%s\", block %d: calculated %X but " @@ -1350,8 +1370,12 @@ sendFile(const char *readfilename, const char *tarfilename, struct stat *statbuf checksum_failure = true; } } - blkno++; + block_error = -1; } + + /* Read the buffer again if there were checksum errors */ + if (block_error != -1) + continue; } /* Send the chunk as a CopyData message */