On Tue, Aug 27, 2019 at 10:33 PM Robert Haas <robertmh...@gmail.com> wrote:

> While reviewing a proposed patch to basebackup.c this morning, I found
> myself a bit underwhelmed by the quality of the code and comments in
> basebackup.c's sendFile(). I believe it's already been pointed out
> that the the retry logic here is not particularly correct, and the
> comments demonstrate a pretty clear lack of understanding of how the
> actual race conditions that are possible here might operate.
>
> However, I then noticed another problem which I think is significantly
> more serious: this code totally ignores the possibility of a failure
> in fread().  It just assumes that if fread() fails to return a
> positive value, it's reached the end of the file, and if that happens,
> it just pads out the rest of the file with zeroes.  So it looks to me
> like if fread() encountered, say, an I/O error while trying to read a
> data file, and if that error were on the first data block, then the
> entire contents of that file would be replaced with zero bytes in the
> backup, and no error or warning of any kind would be given to the
> user.  If it hit the error later, everything from the point of the
> error onward would just get replaced with zero bytes.  To be clear, I
> think it is fine for basebackup.c to fill out the rest of the expected
> length with zeroes if in fact the file has been truncated during the
> backup; recovery should fix it.  But recovery is not going to fix data
> that got eaten because EIO was encountered while copying a file.
>

Per fread(3) manpage, we cannot distinguish between an error or EOF. So to
check for any error we must use ferror() after fread(). Attached patch which
tests ferror() and throws an error if it returns true.  However, I think
fread()/ferror() both do not set errno (per some web reference) and thus
throwing a generic error here. I have manually tested it.


> The logic that rereads a block appears to have the opposite problem.
> Here, it will detect an error, but it will also fail in the face of a
> concurrent truncation, which it shouldn't.
>

For this, I have checked for feof() assuming that when the file gets
truncated
we reach EOF. And if yes, getting out of the loop instead of throwing an
error.
I may be wrong as I couldn't reproduce this issue.

Please have a look over the patch and let me know your views.


>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>
>
>

-- 
Jeevan Chalke
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index c91f66d..5e64acf 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -90,6 +90,18 @@ static char *statrelpath = NULL;
  */
 #define THROTTLING_FREQUENCY	8
 
+/*
+ * Checks whether we encountered any error in fread().  fread() doesn't give
+ * any clue what has happened, so we check with ferror().  Also, neither
+ * fread() nor ferror() set errno, so we just throw a generic error.
+ */
+#define CHECK_FREAD_ERROR(fp, filename) \
+do { \
+	if (ferror(fp)) \
+		ereport(ERROR, \
+				(errmsg("could not read from file \"%s\"", filename))); \
+} while (0)
+
 /* The actual number of bytes, transfer of which may cause sleep. */
 static uint64 throttling_sample;
 
@@ -543,6 +555,9 @@ perform_base_backup(basebackup_options *opt)
 					break;
 			}
 
+			/* Check fread() error. */
+			CHECK_FREAD_ERROR(fp, pathbuf);
+
 			if (len != wal_segment_size)
 			{
 				CheckXLogRemoved(segno, tli);
@@ -1478,6 +1493,18 @@ sendFile(const char *readfilename, const char *tarfilename, struct stat *statbuf
 
 							if (fread(buf + BLCKSZ * i, 1, BLCKSZ, fp) != BLCKSZ)
 							{
+								/*
+								 * If file is truncated, then we will hit
+								 * end-of-file error in which case we don't
+								 * want to error out, instead just pad it with
+								 * zeros.
+								 */
+								if (feof(fp))
+								{
+									cnt = BLCKSZ * i;
+									break;
+								}
+
 								ereport(ERROR,
 										(errcode_for_file_access(),
 										 errmsg("could not reread block %d of file \"%s\": %m",
@@ -1529,7 +1556,7 @@ sendFile(const char *readfilename, const char *tarfilename, struct stat *statbuf
 		len += cnt;
 		throttle(cnt);
 
-		if (len >= statbuf->st_size)
+		if (feof(fp) || len >= statbuf->st_size)
 		{
 			/*
 			 * Reached end of file. The file could be longer, if it was
@@ -1540,6 +1567,9 @@ sendFile(const char *readfilename, const char *tarfilename, struct stat *statbuf
 		}
 	}
 
+	/* Check fread() error. */
+	CHECK_FREAD_ERROR(fp, readfilename);
+
 	/* If the file was truncated while we were sending it, pad it with zeros */
 	if (len < statbuf->st_size)
 	{

Reply via email to