At 2014-09-25 22:41:18 +0200, and...@2ndquadrant.com wrote:
>
> On 2014-09-24 17:06:05 +0530, Abhijit Menon-Sen wrote:
> > 
> > 1. Move the call to ResetUnloggedRelations(UNLOGGED_RELATION_INIT) to
> >    earlier in StartupXLOG.
> > 
> > 2. Inside that function, issue fsync()s for the main forks we create by
> >    copying the _init fork.
> 
> These two imo should definitely be backpatched.

I've attached updated versions of these two patches. The only changes
are to revise the comments as you suggested.

> > 3. A small fixup to add a const to "typedef char *FileName", because the
> >    earlier patch gave me warnings about discarding const-ness. This is
> >    consistent with many other functions in fd.c that take const char *.
> 
> I'm happy with consider that one for master (although I seem to recall
> having had a patch for it rejected?), but I don't think we want to do
> that in the backbranches.

(I gave up on this for now as an unrelated diversion.)

> > 4. Issue an fsync() on the data directory at startup if we need to
> > perform crash recovery.
> 
> I personally think this one should be backpatched too. Does anyone
> believe differently?

I revised this patch as well, I'll rebase and send it separately along
with the initdb -S patch shortly.

-- Abhijit
>From 8487185a5f2073ed475f971e6203d49e1e21a987 Mon Sep 17 00:00:00 2001
From: Abhijit Menon-Sen <a...@2ndquadrant.com>
Date: Wed, 8 Oct 2014 11:48:25 +0530
Subject: Call ResetUnloggedRelations(UNLOGGED_RELATION_INIT) earlier

We need to call this after recovery, but not after the END_OF_RECOVERY
checkpoint is written. If we crash after that checkpoint, for example
because of ENOSPC in PreallocXlogFiles, the checkpoint has already set
the ControlFile->state to DB_SHUTDOWNED, so we don't enter recovery
again at startup.

Because we did call ResetUnloggedRelations(UNLOGGED_RELATION_CLEANUP)
in the first cleanup, this leaves the database with a bunch of _init
forks for unlogged relations, but no main forks, leading to scary
errors.

See thread from 20140912112246.ga4...@alap3.anarazel.de for details.
---
 src/backend/access/transam/xlog.c | 18 ++++++++++--------
 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 3c9aeae..2ab9da5 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -6871,6 +6871,16 @@ StartupXLOG(void)
 	ShutdownWalRcv();
 
 	/*
+	 * Reset unlogged relations to the contents of their INIT fork. This
+	 * is done AFTER recovery is complete so as to include any unlogged
+	 * relations created during recovery, but BEFORE recovery is marked
+	 * as having completed successfully (because this step can fail,
+	 * e.g. due to ENOSPC).
+	 */
+	if (InRecovery)
+		ResetUnloggedRelations(UNLOGGED_RELATION_INIT);
+
+	/*
 	 * We don't need the latch anymore. It's not strictly necessary to disown
 	 * it, but let's do it for the sake of tidiness.
 	 */
@@ -7138,14 +7148,6 @@ StartupXLOG(void)
 	PreallocXlogFiles(EndOfLog);
 
 	/*
-	 * Reset initial contents of unlogged relations.  This has to be done
-	 * AFTER recovery is complete so that any unlogged relations created
-	 * during recovery also get picked up.
-	 */
-	if (InRecovery)
-		ResetUnloggedRelations(UNLOGGED_RELATION_INIT);
-
-	/*
 	 * Okay, we're officially UP.
 	 */
 	InRecovery = false;
-- 
1.9.1

>From 944e7c4e27fca5589b8a103f7f470df23a5161c2 Mon Sep 17 00:00:00 2001
From: Abhijit Menon-Sen <a...@2ndquadrant.com>
Date: Wed, 24 Sep 2014 16:01:37 +0530
Subject: =?UTF-8?q?Make=20ResetUnloggedRelations(=E2=80=A6=5FINIT)=20fsync?=
 =?UTF-8?q?=20newly-created=20main=20forks?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

During UNLOGGED_RELATION_INIT, after we have copied ${x}_init to $x, we
issue fsync()s for the newly-created files. We depend on their existence
and a checkpoint isn't going to fsync them for us during recovery.

See thread from 20140912112246.ga4...@alap3.anarazel.de for details.
---
 src/backend/storage/file/reinit.c | 38 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 38 insertions(+)

diff --git a/src/backend/storage/file/reinit.c b/src/backend/storage/file/reinit.c
index 3229f41..4ad5987 100644
--- a/src/backend/storage/file/reinit.c
+++ b/src/backend/storage/file/reinit.c
@@ -339,6 +339,44 @@ ResetUnloggedRelationsInDbspaceDir(const char *dbspacedirname, int op)
 		}
 
 		FreeDir(dbspace_dir);
+
+		/*
+		 * copy_file() above has already called pg_flush_data() on the
+		 * files it created. Now we need to fsync those files, because
+		 * a checkpoint won't do it for us while we're in recovery. We
+		 * do this in a separate pass to allow the kernel to perform
+		 * all the flushes at once.
+		 */
+
+		dbspace_dir = AllocateDir(dbspacedirname);
+		while ((de = ReadDir(dbspace_dir, dbspacedirname)) != NULL)
+		{
+			ForkNumber forkNum;
+			int			oidchars;
+			char		oidbuf[OIDCHARS + 1];
+			char		mainpath[MAXPGPATH];
+
+			/* Skip anything that doesn't look like a relation data file. */
+			if (!parse_filename_for_nontemp_relation(de->d_name, &oidchars,
+													 &forkNum))
+				continue;
+
+			/* Also skip it unless this is the init fork. */
+			if (forkNum != INIT_FORKNUM)
+				continue;
+
+			/* Construct main fork pathname. */
+			memcpy(oidbuf, de->d_name, oidchars);
+			oidbuf[oidchars] = '\0';
+			snprintf(mainpath, sizeof(mainpath), "%s/%s%s",
+					 dbspacedirname, oidbuf, de->d_name + oidchars + 1 +
+					 strlen(forkNames[INIT_FORKNUM]));
+
+			fsync_fname(mainpath, false);
+		}
+		FreeDir(dbspace_dir);
+
+		fsync_fname((char *)dbspacedirname, true);
 	}
 }
 
-- 
1.9.1

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to