At 2014-09-25 22:41:18 +0200, and...@2ndquadrant.com wrote: > > On 2014-09-24 17:06:05 +0530, Abhijit Menon-Sen wrote: > > > > 1. Move the call to ResetUnloggedRelations(UNLOGGED_RELATION_INIT) to > > earlier in StartupXLOG. > > > > 2. Inside that function, issue fsync()s for the main forks we create by > > copying the _init fork. > > These two imo should definitely be backpatched.
I've attached updated versions of these two patches. The only changes are to revise the comments as you suggested. > > 3. A small fixup to add a const to "typedef char *FileName", because the > > earlier patch gave me warnings about discarding const-ness. This is > > consistent with many other functions in fd.c that take const char *. > > I'm happy with consider that one for master (although I seem to recall > having had a patch for it rejected?), but I don't think we want to do > that in the backbranches. (I gave up on this for now as an unrelated diversion.) > > 4. Issue an fsync() on the data directory at startup if we need to > > perform crash recovery. > > I personally think this one should be backpatched too. Does anyone > believe differently? I revised this patch as well, I'll rebase and send it separately along with the initdb -S patch shortly. -- Abhijit
>From 8487185a5f2073ed475f971e6203d49e1e21a987 Mon Sep 17 00:00:00 2001 From: Abhijit Menon-Sen <a...@2ndquadrant.com> Date: Wed, 8 Oct 2014 11:48:25 +0530 Subject: Call ResetUnloggedRelations(UNLOGGED_RELATION_INIT) earlier We need to call this after recovery, but not after the END_OF_RECOVERY checkpoint is written. If we crash after that checkpoint, for example because of ENOSPC in PreallocXlogFiles, the checkpoint has already set the ControlFile->state to DB_SHUTDOWNED, so we don't enter recovery again at startup. Because we did call ResetUnloggedRelations(UNLOGGED_RELATION_CLEANUP) in the first cleanup, this leaves the database with a bunch of _init forks for unlogged relations, but no main forks, leading to scary errors. See thread from 20140912112246.ga4...@alap3.anarazel.de for details. --- src/backend/access/transam/xlog.c | 18 ++++++++++-------- 1 file changed, 10 insertions(+), 8 deletions(-) diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index 3c9aeae..2ab9da5 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -6871,6 +6871,16 @@ StartupXLOG(void) ShutdownWalRcv(); /* + * Reset unlogged relations to the contents of their INIT fork. This + * is done AFTER recovery is complete so as to include any unlogged + * relations created during recovery, but BEFORE recovery is marked + * as having completed successfully (because this step can fail, + * e.g. due to ENOSPC). + */ + if (InRecovery) + ResetUnloggedRelations(UNLOGGED_RELATION_INIT); + + /* * We don't need the latch anymore. It's not strictly necessary to disown * it, but let's do it for the sake of tidiness. */ @@ -7138,14 +7148,6 @@ StartupXLOG(void) PreallocXlogFiles(EndOfLog); /* - * Reset initial contents of unlogged relations. This has to be done - * AFTER recovery is complete so that any unlogged relations created - * during recovery also get picked up. - */ - if (InRecovery) - ResetUnloggedRelations(UNLOGGED_RELATION_INIT); - - /* * Okay, we're officially UP. */ InRecovery = false; -- 1.9.1
>From 944e7c4e27fca5589b8a103f7f470df23a5161c2 Mon Sep 17 00:00:00 2001 From: Abhijit Menon-Sen <a...@2ndquadrant.com> Date: Wed, 24 Sep 2014 16:01:37 +0530 Subject: =?UTF-8?q?Make=20ResetUnloggedRelations(=E2=80=A6=5FINIT)=20fsync?= =?UTF-8?q?=20newly-created=20main=20forks?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit During UNLOGGED_RELATION_INIT, after we have copied ${x}_init to $x, we issue fsync()s for the newly-created files. We depend on their existence and a checkpoint isn't going to fsync them for us during recovery. See thread from 20140912112246.ga4...@alap3.anarazel.de for details. --- src/backend/storage/file/reinit.c | 38 ++++++++++++++++++++++++++++++++++++++ 1 file changed, 38 insertions(+) diff --git a/src/backend/storage/file/reinit.c b/src/backend/storage/file/reinit.c index 3229f41..4ad5987 100644 --- a/src/backend/storage/file/reinit.c +++ b/src/backend/storage/file/reinit.c @@ -339,6 +339,44 @@ ResetUnloggedRelationsInDbspaceDir(const char *dbspacedirname, int op) } FreeDir(dbspace_dir); + + /* + * copy_file() above has already called pg_flush_data() on the + * files it created. Now we need to fsync those files, because + * a checkpoint won't do it for us while we're in recovery. We + * do this in a separate pass to allow the kernel to perform + * all the flushes at once. + */ + + dbspace_dir = AllocateDir(dbspacedirname); + while ((de = ReadDir(dbspace_dir, dbspacedirname)) != NULL) + { + ForkNumber forkNum; + int oidchars; + char oidbuf[OIDCHARS + 1]; + char mainpath[MAXPGPATH]; + + /* Skip anything that doesn't look like a relation data file. */ + if (!parse_filename_for_nontemp_relation(de->d_name, &oidchars, + &forkNum)) + continue; + + /* Also skip it unless this is the init fork. */ + if (forkNum != INIT_FORKNUM) + continue; + + /* Construct main fork pathname. */ + memcpy(oidbuf, de->d_name, oidchars); + oidbuf[oidchars] = '\0'; + snprintf(mainpath, sizeof(mainpath), "%s/%s%s", + dbspacedirname, oidbuf, de->d_name + oidchars + 1 + + strlen(forkNames[INIT_FORKNUM])); + + fsync_fname(mainpath, false); + } + FreeDir(dbspace_dir); + + fsync_fname((char *)dbspacedirname, true); } } -- 1.9.1
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers