On Thu, Mar 10, 2016 at 4:25 AM, Andres Freund wrote:
> I've finally pushed these, after making a number of mostly cosmetic
> fixes. The only of real consequence is that I've removed the durable_*
> call from the renames to .deleted in xlog[archive].c - these don't need
> to be durable, and are windows only. Oh, and that there was a typo in
> the !HAVE_WORKING_LINK case.
>
> There's a *lot* of version skew here: not-present functionality, moved
> files, different APIs - we got it all.  I've tried to check in each
> version whether we're missing fsyncs for renames and everything.
> Michael, *please* double check the diffs for the different branches.

I have finally been able to spend some time reviewing what you pushed
on back-branches, and things are in correct shape I think. One small
issue that I have is that for EXEC_BACKEND builds, in
write_nondefault_variables we still use one instance of rename(). I
cannot really believe that there are production builds of Postgres
with EXEC_BACKEND on non-Windows platforms, but I think that we had
better cover our backs in this code path. For the other extra 2 calls
of rename() in xlog.c and xlogarchive.c, those are fine untouched I
think there is no need to care about WIN32 blocks...

> Note that we currently have some frontend programs with the equivalent
> problem. Most importantly receivelog.c (pg_basebackup/pg_recveivexlog)
> are missing pretty much the same directory fsyncs.  And at least for
> pg_recvxlog it's critical, especially now that receivexlog support
> syncrep.  I've not done anything about that; there's pretty much no
> chance to share backend code with the frontend in the back-branches.

Yeah, true. We definitely need to do something for that, even for HEAD
it seems like an overkill to have something in for example src/common
to allow frontends to have something if the fix is localized
(pg_rewind may use something else), and it would be nice to finish
wrapping that for the next minor release, so I propose the attached
patches. At the same time, I think that adminpack had better be fixed
as well, so there are actually three patches in this series, things
that I shaped thinking about a backpatch btw, particularly for 0002.
-- 
Michael
From 1a6a73565c36bac45b84ddf1b9718062c13d69cd Mon Sep 17 00:00:00 2001
From: Michael Paquier <mich...@otacoo.com>
Date: Tue, 15 Mar 2016 14:50:48 +0100
Subject: [PATCH 1/3] Make rename() calls for log files in adminpack durable

As mentioned in 1d4a0ab1, rename() is not a durable operation in case
of crashes, causing renames to be potentially lost in such unfortunate
scenarios. The functions of adminpack are not critical code paths so they
do not induce any data loss, still it may be annoying for the upper
application layer like pgadmin to see inconsistent log files should a
server restart after a crash.
---
 contrib/adminpack/adminpack.c | 41 ++++++++++++-----------------------------
 1 file changed, 12 insertions(+), 29 deletions(-)

diff --git a/contrib/adminpack/adminpack.c b/contrib/adminpack/adminpack.c
index ea781a0..9136b79 100644
--- a/contrib/adminpack/adminpack.c
+++ b/contrib/adminpack/adminpack.c
@@ -209,44 +209,27 @@ pg_file_rename(PG_FUNCTION_ARGS)
 						fn3 ? fn3 : fn2)));
 	}
 
+	/*
+	 * Should a third file name be defined, use it as a temporary switch
+	 * that allows reverting back to the initial point should an error
+	 * occur.
+	 */
 	if (fn3)
 	{
-		if (rename(fn2, fn3) != 0)
-		{
-			ereport(ERROR,
-					(errcode_for_file_access(),
-					 errmsg("could not rename \"%s\" to \"%s\": %m",
-							fn2, fn3)));
-		}
-		if (rename(fn1, fn2) != 0)
-		{
-			ereport(WARNING,
-					(errcode_for_file_access(),
-					 errmsg("could not rename \"%s\" to \"%s\": %m",
-							fn1, fn2)));
+		/* durable_rename produces already a log entry */
+		durable_rename(fn2, fn3, ERROR);
 
-			if (rename(fn3, fn2) != 0)
-			{
-				ereport(ERROR,
-						(errcode_for_file_access(),
-						 errmsg("could not rename \"%s\" back to \"%s\": %m",
-								fn3, fn2)));
-			}
-			else
-			{
+		if (durable_rename(fn1, fn2, WARNING) != 0)
+		{
+			if (durable_rename(fn3, fn2, ERROR) == 0)
 				ereport(ERROR,
 						(ERRCODE_UNDEFINED_FILE,
 						 errmsg("renaming \"%s\" to \"%s\" was reverted",
 								fn2, fn3)));
-			}
 		}
 	}
-	else if (rename(fn1, fn2) != 0)
-	{
-		ereport(ERROR,
-				(errcode_for_file_access(),
-				 errmsg("could not rename \"%s\" to \"%s\": %m", fn1, fn2)));
-	}
+	else
+		durable_rename(fn1, fn2, ERROR);
 
 	PG_RETURN_BOOL(true);
 }
-- 
2.7.3

From 6d90b86685aa5cf3010d1edb3d91f8d0144c252f Mon Sep 17 00:00:00 2001
From: Michael Paquier <mich...@otacoo.com>
Date: Tue, 15 Mar 2016 15:34:04 +0100
Subject: [PATCH 2/3] Avoid potential data loss in pg_receivexlog

pg_receivexlog makes use of rename() for timeline history files as well
as for completed WAL segments. However, this is not reliable and may cause
the rename operation to be lost in case of crashes. This commit makes use
of a similar function to backend's durable_name to make the renaming operation
durable on disk.
---
 src/bin/pg_basebackup/receivelog.c | 93 +++++++++++++++++++++++++++++++++++---
 1 file changed, 87 insertions(+), 6 deletions(-)

diff --git a/src/bin/pg_basebackup/receivelog.c b/src/bin/pg_basebackup/receivelog.c
index 595213f..b809582 100644
--- a/src/bin/pg_basebackup/receivelog.c
+++ b/src/bin/pg_basebackup/receivelog.c
@@ -50,6 +50,7 @@ static long CalculateCopyStreamSleeptime(int64 now, int standby_message_timeout,
 
 static bool ReadEndOfStreamingResult(PGresult *res, XLogRecPtr *startpos,
 						 uint32 *timeline);
+static int	durable_rename(const char *oldfile, const char *newfile);
 
 static bool
 mark_file_as_archived(const char *basedir, const char *fname)
@@ -217,10 +218,9 @@ close_walfile(StreamCtl *stream, XLogRecPtr pos)
 
 		snprintf(oldfn, sizeof(oldfn), "%s/%s%s", stream->basedir, current_walfile_name, stream->partial_suffix);
 		snprintf(newfn, sizeof(newfn), "%s/%s", stream->basedir, current_walfile_name);
-		if (rename(oldfn, newfn) != 0)
+		if (durable_rename(oldfn, newfn) != 0)
 		{
-			fprintf(stderr, _("%s: could not rename file \"%s\": %s\n"),
-					progname, current_walfile_name, strerror(errno));
+			/* durable_rename produced a log entry */
 			return false;
 		}
 	}
@@ -356,10 +356,9 @@ writeTimeLineHistoryFile(StreamCtl *stream, char *filename, char *content)
 	/*
 	 * Now move the completed history file into place with its final name.
 	 */
-	if (rename(tmppath, path) < 0)
+	if (durable_rename(tmppath, path) < 0)
 	{
-		fprintf(stderr, _("%s: could not rename file \"%s\" to \"%s\": %s\n"),
-				progname, tmppath, path, strerror(errno));
+		/* durable_rename produced a log entry */
 		return false;
 	}
 
@@ -786,6 +785,88 @@ ReadEndOfStreamingResult(PGresult *res, XLogRecPtr *startpos, uint32 *timeline)
 }
 
 /*
+ * Wrapper of rename() similar to the backend version with the same function
+ * name aimed at making the renaming durable on disk. Note that this version
+ * does not fsync the old file before the rename as all the code paths leading
+ * to this function are already doing this operation. The new file is also
+ * normally not present on disk before the renaming so there is no need to
+ * bother about it.
+ */
+static int
+durable_rename(const char *oldfile, const char *newfile)
+{
+	int		fd;
+	char	parentpath[MAXPGPATH];
+
+	if (rename(oldfile, newfile) != 0)
+	{
+		/* durable_rename produced a log entry */
+		fprintf(stderr, _("%s: could not rename file \"%s\": %s\n"),
+				progname, current_walfile_name, strerror(errno));
+		return -1;
+	}
+
+	/*
+	 * To guarantee renaming of the file is persistent, fsync the file with its
+	 * new name, and its containing directory.
+	 */
+	fd = open(newfile, O_RDWR | PG_BINARY, S_IRUSR | S_IWUSR);
+	if (fd < 0)
+	{
+		fprintf(stderr, _("%s: could not open file \"%s\": %s\n"),
+				progname, newfile, strerror(errno));
+		return -1;
+	}
+
+	if (fsync(fd) != 0)
+	{
+		close(fd);
+		fprintf(stderr, _("%s: could not fsync file \"%s\": %s\n"),
+				progname, newfile, strerror(errno));
+		return -1;
+	}
+	close(fd);
+
+	strlcpy(parentpath, newfile, MAXPGPATH);
+	get_parent_directory(parentpath);
+
+	/*
+	 * get_parent_directory() returns an empty string if the input argument is
+	 * just a file name (see comments in path.c), so handle that as being the
+	 * current directory.
+	 */
+	if (strlen(parentpath) == 0)
+		strlcpy(parentpath, ".", MAXPGPATH);
+
+	fd = open(parentpath, O_RDONLY | PG_BINARY, S_IRUSR | S_IWUSR);
+
+	/*
+	 * Some OSs don't allow us to open directories at all (Windows returns
+	 * EACCES), just ignore the error in that case.  If desired also silently
+	 * ignoring errors about unreadable files. Log others.
+	 */
+	if (fd < 0 && (errno == EISDIR || errno == EACCES))
+		return 0;
+	else if (fd < 0)
+	{
+		fprintf(stderr, _("%s: could not open file \"%s\": %s\n"),
+				progname, parentpath, strerror(errno));
+		return -1;
+	}
+
+	if (fsync(fd) != 0)
+	{
+		close(fd);
+		fprintf(stderr, _("%s: could not fsync file \"%s\": %s\n"),
+				progname, parentpath, strerror(errno));
+		return -1;
+	}
+	close(fd);
+
+	return 0;
+}
+
+/*
  * The main loop of ReceiveXlogStream. Handles the COPY stream after
  * initiating streaming with the START_STREAMING command.
  *
-- 
2.7.3

From ec81809e7b373feacd91418e850c44757fbaa1fe Mon Sep 17 00:00:00 2001
From: Michael Paquier <mich...@otacoo.com>
Date: Tue, 15 Mar 2016 15:37:13 +0100
Subject: [PATCH 3/3] Avoid potential lost rename() of new parameter file in
 EXEC_BACKEND builds

guc.c is making use of rename(), though there are risks to lose the renaming
in case of a crash. Hence make use of durable_rename to make things durable.
---
 src/backend/utils/misc/guc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index edcafce..79e52d8 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -8656,7 +8656,7 @@ write_nondefault_variables(GucContext context)
 	 * Put new file in place.  This could delay on Win32, but we don't hold
 	 * any exclusive locks.
 	 */
-	rename(CONFIG_EXEC_PARAMS_NEW, CONFIG_EXEC_PARAMS);
+	(void) durable_rename(CONFIG_EXEC_PARAMS_NEW, CONFIG_EXEC_PARAMS, DEBUG1);
 }
 
 
-- 
2.7.3

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to