Mmm. I posted to wrong thread. Sorry. At Tue, 23 Apr 2019 16:39:49 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi.kyot...@lab.ntt.co.jp> wrote in <20190423.163949.36763221.horiguchi.kyot...@lab.ntt.co.jp> > At Tue, 23 Apr 2019 13:31:58 +0800, Paul Guo <p...@pivotal.io> wrote in > <caeet0zecwz57z2yfwrds43b3tfqppdswmbjgmd43xrxlt41...@mail.gmail.com> > > Hi Kyotaro, ignoring the MakePGDirectory() failure will fix this database > > create redo error, but I suspect some other kind of redo, which depends on > > the files under the directory (they are not copied since the directory is > > not created) and also cannot be covered by the invalid page mechanism, > > could fail. Thanks. > > If recovery starts from just after tablespace creation, that's > simple. The Symlink to the removed tablespace is already removed > in the case. Hence server innocently create files directly under > pg_tblspc, not in the tablespace. Finally all files that were > supposed to be created in the removed tablespace are removed > later in recovery. > > If recovery starts from recalling page in a file that have been > in the tablespace, XLogReadBufferExtended creates one (perhaps > directly in pg_tblspc as described above) and the files are > removed later in recoery the same way to above. This case doen't > cause FATAL/PANIC during recovery even in master. > > XLogReadBufferExtended@xlogutils.c > | * Create the target file if it doesn't already exist. This lets us cope > | * if the replay sequence contains writes to a relation that is later > | * deleted. (The original coding of this routine would instead suppress > | * the writes, but that seems like it risks losing valuable data if the > | * filesystem loses an inode during a crash. Better to write the data > | * until we are actually told to delete the file.) > > So buffered access cannot be a problem for the reason above. The > remaining possible issue is non-buffered access to files in > removed tablespaces. This is what I mentioned upthread: > > me> but I haven't checked this covers all places where need the same > me> treatment.
RM_DBASE_ID is fixed by the patch. XLOG/XACT/CLOG/MULTIXACT/RELMAP/STANDBY/COMMIT_TS/REPLORIGIN/LOGICALMSG: - are not relevant. HEAP/HEAP2/BTREE/HASH/GIN/GIST/SEQ/SPGIST/BRIN/GENERIC: - Resources works on buffer is not affected. SMGR: - Both CREATE and TRUNCATE seems fine. TBLSPC: - We don't nest tablespace directories. No Problem. I don't find a similar case. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
>From bc97e195f21af5d715d85424efc21fcbe8bb902c Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi <horiguchi.kyot...@lab.ntt.co.jp> Date: Mon, 22 Apr 2019 20:59:15 +0900 Subject: [PATCH 4/5] Fix failure of standby startup caused by tablespace removal When standby restarts after a crash after drop of a tablespace, there's a possibility that recovery fails trying an object-creation in already removed tablespace directory. Allow recovery to continue by ignoring the error if not reaching consistency point. --- src/backend/access/transam/xlogutils.c | 34 ++++++++++++++++++++++++++++++++++ src/backend/commands/tablespace.c | 12 ++++++------ src/backend/storage/file/copydir.c | 12 +++++++----- src/include/access/xlogutils.h | 1 + 4 files changed, 48 insertions(+), 11 deletions(-) diff --git a/src/backend/access/transam/xlogutils.c b/src/backend/access/transam/xlogutils.c index 10a663bae6..75cdb882cd 100644 --- a/src/backend/access/transam/xlogutils.c +++ b/src/backend/access/transam/xlogutils.c @@ -522,6 +522,40 @@ XLogReadBufferExtended(RelFileNode rnode, ForkNumber forknum, return buffer; } +/* + * XLogMakePGDirectory + * + * There is a possibility that WAL replay causes an error by creation of a + * directory under a directory removed before the previous crash. Issuing + * ERROR prevents the caller from continuing recovery. + * + * To prevent that case, this function issues WARNING instead of ERROR on + * error if consistency is not reached yet. + */ +int +XLogMakePGDirectory(const char *directoryName) +{ + int ret; + + ret = MakePGDirectory(directoryName); + + if (ret != 0) + { + int elevel = ERROR; + + /* Don't issue ERROR for this failure before reaching consistency. */ + if (InRecovery && !reachedConsistency) + elevel = WARNING; + + ereport(elevel, + (errcode_for_file_access(), + errmsg("could not create directory \"%s\": %m", directoryName))); + return ret; + } + + return 0; +} + /* * Struct actually returned by XLogFakeRelcacheEntry, though the declared * return type is Relation. diff --git a/src/backend/commands/tablespace.c b/src/backend/commands/tablespace.c index 66a70871e6..c9fb2af015 100644 --- a/src/backend/commands/tablespace.c +++ b/src/backend/commands/tablespace.c @@ -303,12 +303,6 @@ CreateTableSpace(CreateTableSpaceStmt *stmt) (errcode(ERRCODE_INVALID_NAME), errmsg("tablespace location cannot contain single quotes"))); - /* Reject tablespaces in the data directory. */ - if (is_in_data_directory(location)) - ereport(ERROR, - (errcode(ERRCODE_INVALID_OBJECT_DEFINITION), - errmsg("tablespace location must not be inside the data directory"))); - /* * Check that location isn't too long. Remember that we're going to append * 'PG_XXX/<dboid>/<relid>_<fork>.<nnn>'. In the relative path case, we @@ -323,6 +317,12 @@ CreateTableSpace(CreateTableSpaceStmt *stmt) errmsg("tablespace location \"%s\" is too long", location))); + /* Reject tablespaces in the data directory. */ + if (is_in_data_directory(location)) + ereport(ERROR, + (errcode(ERRCODE_INVALID_OBJECT_DEFINITION), + errmsg("tablespace location must not be inside the data directory"))); + /* * Disallow creation of tablespaces named "pg_xxx"; we reserve this * namespace for system purposes. diff --git a/src/backend/storage/file/copydir.c b/src/backend/storage/file/copydir.c index 30f6200a86..0216270dd3 100644 --- a/src/backend/storage/file/copydir.c +++ b/src/backend/storage/file/copydir.c @@ -22,11 +22,11 @@ #include <unistd.h> #include <sys/stat.h> +#include "access/xlogutils.h" #include "storage/copydir.h" #include "storage/fd.h" #include "miscadmin.h" #include "pgstat.h" - /* * copydir: copy a directory * @@ -41,10 +41,12 @@ copydir(char *fromdir, char *todir, bool recurse) char fromfile[MAXPGPATH * 2]; char tofile[MAXPGPATH * 2]; - if (MakePGDirectory(todir) != 0) - ereport(ERROR, - (errcode_for_file_access(), - errmsg("could not create directory \"%s\": %m", todir))); + /* + * We might have to skip copydir to continue recovery. See the function + * for details. + */ + if (XLogMakePGDirectory(todir) != 0) + return; xldir = AllocateDir(fromdir); diff --git a/src/include/access/xlogutils.h b/src/include/access/xlogutils.h index 0ab5ba62f5..46a7596315 100644 --- a/src/include/access/xlogutils.h +++ b/src/include/access/xlogutils.h @@ -43,6 +43,7 @@ extern XLogRedoAction XLogReadBufferForRedoExtended(XLogReaderState *record, extern Buffer XLogReadBufferExtended(RelFileNode rnode, ForkNumber forknum, BlockNumber blkno, ReadBufferMode mode); +extern int XLogMakePGDirectory(const char *directoryName); extern Relation CreateFakeRelcacheEntry(RelFileNode rnode); extern void FreeFakeRelcacheEntry(Relation fakerel); -- 2.16.3