Hi Michael,

On 3/17/18 5:34 PM, Michael Banck wrote:
> 
> On Fri, Mar 09, 2018 at 10:35:33PM +0100, Michael Banck wrote:
> 
> I think most people (including those I had off-list discussions about
> this with) were of the opinion that such an option should be there, so I
> added an additional option NOVERIFY_CHECKSUMS to the BASE_BACKUP
> replication command and also an option -k / --no-verify-checksums to
> pg_basebackup to trigger this.
> 
> Updated patch attached.

+    memcpy(page, (buf + BLCKSZ * i), BLCKSZ);

Why make a copy here?  How about:

char *page = buf + BLCKSZ * i

I know pg_checksum_page manipulates the checksum field but I have found
it to be safe.

+    if (phdr->pd_checksum != checksum)

I've attached a patch that adds basic retry functionality.  It's not
terrible efficient since it rereads the entire buffer for any block
error.  A better way is to keep a bitmap for each block in the buffer,
then on retry compare bitmaps.  If the block is still bad, report it.
If the block was corrected moved on.  If a block was good before but is
bad on retry it can be ignored.

+    ereport(WARNING,
+        (errmsg("checksum verification failed in file "

I'm worried about how verbose this warning could be since there are
131,072 blocks per segment.  It's unlikely to have that many block
errors, but users do sometimes put files in PGDATA which look like they
should be validated.  Since these warnings all go to the server log it
could get pretty bad.

We should either stop warning after the first failure, or aggregate the
failures for a file into a single message.

Some tests with multi-block errors should be added to test these scenarios.

Thanks,
-- 
-David
da...@pgmasters.net
diff --git a/src/backend/replication/basebackup.c 
b/src/backend/replication/basebackup.c
index 9f735a2c07..48bacfe5dd 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -1258,6 +1258,7 @@ sendFile(const char *readfilename, const char 
*tarfilename, struct stat *statbuf
        int                     i;
        pgoff_t         len = 0;
        char            page[BLCKSZ];
+       int                     block_error = -1;
        size_t          pad;
        PageHeader  phdr;
        int                     segmentno = 0;
@@ -1328,9 +1329,16 @@ sendFile(const char *readfilename, const char 
*tarfilename, struct stat *statbuf
                                verify_checksum = false;
                                continue;
                        }
-                       for (i = 0; i < cnt / BLCKSZ; i++)
+
+                       /*
+                        * Validate all blocks in the buffer.  If checking 
after the buffer
+                        * was reread then start at the block that caused the 
reread.
+             */
+                       for (i = block_error == -1 ? 0 : block_error; i < cnt / 
BLCKSZ; i++)
                        {
+                               blkno = len / BLCKSZ + i;
                                memcpy(page, (buf + BLCKSZ * i), BLCKSZ);
+
                                /*
                                 * Only check pages which have not been 
modified since the
                                 * start of the base backup.
@@ -1341,6 +1349,18 @@ sendFile(const char *readfilename, const char 
*tarfilename, struct stat *statbuf
                                        phdr = (PageHeader) page;
                                        if (phdr->pd_checksum != checksum)
                                        {
+                                               /*
+                                                * If first time encountering 
an error on this block
+                                                * then read the last buffer 
again to see if the
+                                                * checksum is corrected.
+                                                */
+                                               if (block_error == -1)
+                                               {
+                                                       block_error = i;
+                                                       fseek(fp, -cnt, 
SEEK_CUR);
+                                                       break;
+                                               }
+
                                                ereport(WARNING,
                                                                
(errmsg("checksum verification failed in file "
                                                                                
"\"%s\", block %d: calculated %X but "
@@ -1350,8 +1370,12 @@ sendFile(const char *readfilename, const char 
*tarfilename, struct stat *statbuf
                                                checksum_failure = true;
                                        }
                                }
-                               blkno++;
+                               block_error = -1;
                        }
+
+                       /* Read the buffer again if there were checksum errors 
*/
+                       if (block_error != -1)
+                               continue;
                }
 
                /* Send the chunk as a CopyData message */

Reply via email to