Oops! The comment in the previous patch is wrong. At Mon, 22 Apr 2019 16:15:13 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi.kyot...@lab.ntt.co.jp> wrote in <20190422.161513.258021727.horiguchi.kyot...@lab.ntt.co.jp> > At Mon, 22 Apr 2019 12:36:43 +0800, Paul Guo <p...@pivotal.io> wrote in > <CAEET0ZGpUrMGUzfyzVF9FuSq+zb=qovya2cvyrndotvz5vx...@mail.gmail.com> > > Please see my replies inline. Thanks. > > > > On Fri, Apr 19, 2019 at 12:38 PM Asim R P <aprav...@pivotal.io> wrote: > > > > > On Wed, Apr 17, 2019 at 1:27 PM Paul Guo <p...@pivotal.io> wrote: > > > > > > > > create db with tablespace > > > > drop database > > > > drop tablespace. > > > > > > Essentially, that sequence of operations causes crash recovery to fail > > > if the "drop tablespace" transaction was committed before crashing. > > > This is a bug in crash recovery in general and should be reproducible > > > without configuring a standby. Is that right? > > > > > > > No. In general, checkpoint is done for drop_db/create_db/drop_tablespace on > > master. > > That makes the file/directory update-to-date if I understand the related > > code correctly. > > For standby, checkpoint redo does not ensure that. > > That's right partly. As you must have seen, fast shutdown forces > restartpoint for the last checkpoint and it prevents the problem > from happening. Anyway it seems to be a problem. > > > > Your patch creates missing directories in the destination. Don't we > > > need to create the tablespace symlink under pg_tblspc/? I would > > > > > > > 'create db with tablespace' redo log does not include the tablespace real > > directory information. > > Yes, we could add in it into the xlog, but that seems to be an overdesign. > > But I don't think creating directory that is to be removed just > after is a wanted solution. The directory most likely to be be > removed just after. > > > > prefer extending the invalid page mechanism to deal with this, as > > > suggested by Ashwin off-list. It will allow us to avoid creating > > > > directories and files only to remove them shortly afterwards when the > > > drop database and drop tablespace records are replayed. > > > > > > > > 'invalid page' mechanism seems to be more proper for missing pages of a > > file. For > > missing directories, we could, of course, hack to use that (e.g. reading > > any page of > > a relfile in that database) to make sure the tablespace create code > > (without symlink) > > safer (It assumes those directories will be deleted soon). > > > > More feedback about all of the previous discussed solutions is welcome. > > It doesn't seem to me that the invalid page mechanism is > applicable in straightforward way, because it doesn't consider > simple file copy. > > Drop failure is ignored any time. I suppose we can ignore the > error to continue recovering as far as recovery have not reached > consistency. The attached would work *at least* your case, but I > haven't checked this covers all places where need the same > treatment.
The comment for the new function XLogMakePGDirectory is wrong: + * There is a possibility that WAL replay causes a creation of the same + * directory left by the previous crash. Issuing ERROR prevents the caller + * from continuing recovery. The correct one is: + * There is a possibility that WAL replay causes an error by creation of a + * directory under a directory removed before the previous crash. Issuing + * ERROR prevents the caller from continuing recovery. It is fixed in the attached. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
diff --git a/src/backend/access/transam/xlogutils.c b/src/backend/access/transam/xlogutils.c index 10a663bae6..01331f0da9 100644 --- a/src/backend/access/transam/xlogutils.c +++ b/src/backend/access/transam/xlogutils.c @@ -522,6 +522,44 @@ XLogReadBufferExtended(RelFileNode rnode, ForkNumber forknum, return buffer; } +/* + * XLogMakePGDirectory + * + * There is a possibility that WAL replay causes an error by creation of a + * directory under a directory removed before the previous crash. Issuing + * ERROR prevents the caller from continuing recovery. + * + * To prevent that case, this function issues WARNING instead of ERROR on + * error if consistency is not reached yet. + */ +int +XLogMakePGDirectory(const char *directoryName) +{ + int ret; + + ret = MakePGDirectory(directoryName); + + if (ret != 0) + { + int elevel = ERROR; + + /* + * We might get error trying to create existing directory that is to + * be removed just after. Don't issue ERROR in the case. Recovery + * will stop if we again failed after reaching consistency. + */ + if (InRecovery && !reachedConsistency) + elevel = WARNING; + + ereport(elevel, + (errcode_for_file_access(), + errmsg("could not create directory \"%s\": %m", directoryName))); + return ret; + } + + return 0; +} + /* * Struct actually returned by XLogFakeRelcacheEntry, though the declared * return type is Relation. diff --git a/src/backend/storage/file/copydir.c b/src/backend/storage/file/copydir.c index 30f6200a86..0216270dd3 100644 --- a/src/backend/storage/file/copydir.c +++ b/src/backend/storage/file/copydir.c @@ -22,11 +22,11 @@ #include <unistd.h> #include <sys/stat.h> +#include "access/xlogutils.h" #include "storage/copydir.h" #include "storage/fd.h" #include "miscadmin.h" #include "pgstat.h" - /* * copydir: copy a directory * @@ -41,10 +41,12 @@ copydir(char *fromdir, char *todir, bool recurse) char fromfile[MAXPGPATH * 2]; char tofile[MAXPGPATH * 2]; - if (MakePGDirectory(todir) != 0) - ereport(ERROR, - (errcode_for_file_access(), - errmsg("could not create directory \"%s\": %m", todir))); + /* + * We might have to skip copydir to continue recovery. See the function + * for details. + */ + if (XLogMakePGDirectory(todir) != 0) + return; xldir = AllocateDir(fromdir); diff --git a/src/include/access/xlogutils.h b/src/include/access/xlogutils.h index 0ab5ba62f5..46a7596315 100644 --- a/src/include/access/xlogutils.h +++ b/src/include/access/xlogutils.h @@ -43,6 +43,7 @@ extern XLogRedoAction XLogReadBufferForRedoExtended(XLogReaderState *record, extern Buffer XLogReadBufferExtended(RelFileNode rnode, ForkNumber forknum, BlockNumber blkno, ReadBufferMode mode); +extern int XLogMakePGDirectory(const char *directoryName); extern Relation CreateFakeRelcacheEntry(RelFileNode rnode); extern void FreeFakeRelcacheEntry(Relation fakerel);