On Fri, Apr 08, 2022 at 10:38:03AM -0400, Robert Haas wrote: > I'd actually be in favor of nuking durable_rename_excl() from orbit > and putting the file-exists tests in the callers. Otherwise, someone > might assume that it actually has the semantics that its name > suggests, which could be pretty disastrous. If we don't want to do > that, then I'd changing to do the stat-then-durable-rename thing > internally, so we don't leave hard links lying around in *any* code > path. Perhaps that's the right answer for the back-branches in any > case, since there could be third-party code calling this function.
I've attached a patch that simply removes durable_rename_excl() and replaces existing calls with durable_rename(). I noticed that Andres expressed similar misgivings about durable_rename_excl() last year [0] [1]. I can create a stat-then-durable-rename version of this for back-patching if that is still the route we want to go. [0] https://postgr.es/me/20210318014812.ds2iz4jz5h7la6un%40alap3.anarazel.de [1] https://postgr.es/m/20210318023004.gz2aejhze2kkkqr2%40alap3.anarazel.de -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
>From d3c633e19555dc0cf98207ad5e7c08ab9ce85dc0 Mon Sep 17 00:00:00 2001 From: Nathan Bossart <nathandboss...@gmail.com> Date: Fri, 8 Apr 2022 11:48:17 -0700 Subject: [PATCH v2 1/1] Remove durable_rename_excl(). durable_rename_excl() attempts to avoid overwriting any existing files by using link() and unlink(), but it falls back to rename() on some platforms (e.g., Windows), which offers no such ovewrite protection. Most callers use durable_rename_excl() just in case there is an existing file, but in practice there shouldn't be one. basic_archive uses it to avoid overwriting an archive concurrently created by another server, but as mentioned above, it will still overwrite files on some platforms. Furthermore, failures during durable_rename_excl() can result in multiple hard links to the same file. My testing demonstrated that it was possible to end up with two links to the same file in pg_wal after a crash just before unlink() during WAL recycling. Specifically, the test produced links to the same file for the current WAL file and the next one because the half-recycled WAL file was re-recycled upon restarting. This seems likely to lead to WAL corruption. This change removes durable_rename_excl() and replaces all existing calls with durable_rename(). This removes the protection against accidentally overwriting an existing file, but some platforms are already living without it, and ordinarily there shouldn't be one. Author: Nathan Bossart Reviewed-by: Robert Haas Discussion: https://postgr.es/m/20220407182954.GA1231544%40nathanxps13 --- contrib/basic_archive/basic_archive.c | 5 ++- src/backend/access/transam/timeline.c | 14 +----- src/backend/access/transam/xlog.c | 8 +--- src/backend/storage/file/fd.c | 63 --------------------------- src/include/pg_config_manual.h | 7 --- src/include/storage/fd.h | 1 - 6 files changed, 7 insertions(+), 91 deletions(-) diff --git a/contrib/basic_archive/basic_archive.c b/contrib/basic_archive/basic_archive.c index e7efbfb9c3..ed33854c57 100644 --- a/contrib/basic_archive/basic_archive.c +++ b/contrib/basic_archive/basic_archive.c @@ -281,9 +281,10 @@ basic_archive_file_internal(const char *file, const char *path) /* * Sync the temporary file to disk and move it to its final destination. - * This will fail if destination already exists. + * Note that this will overwrite any existing file, but this is only + * possible if someone else created the file since the stat() above. */ - (void) durable_rename_excl(temp, destination, ERROR); + (void) durable_rename(temp, destination, ERROR); ereport(DEBUG1, (errmsg("archived \"%s\" via basic_archive", file))); diff --git a/src/backend/access/transam/timeline.c b/src/backend/access/transam/timeline.c index be21968293..128f754e87 100644 --- a/src/backend/access/transam/timeline.c +++ b/src/backend/access/transam/timeline.c @@ -441,12 +441,7 @@ writeTimeLineHistory(TimeLineID newTLI, TimeLineID parentTLI, * Now move the completed history file into place with its final name. */ TLHistoryFilePath(path, newTLI); - - /* - * Perform the rename using link if available, paranoidly trying to avoid - * overwriting an existing file (there shouldn't be one). - */ - durable_rename_excl(tmppath, path, ERROR); + durable_rename(tmppath, path, ERROR); /* The history file can be archived immediately. */ if (XLogArchivingActive()) @@ -519,12 +514,7 @@ writeTimeLineHistoryFile(TimeLineID tli, char *content, int size) * Now move the completed history file into place with its final name. */ TLHistoryFilePath(path, tli); - - /* - * Perform the rename using link if available, paranoidly trying to avoid - * overwriting an existing file (there shouldn't be one). - */ - durable_rename_excl(tmppath, path, ERROR); + durable_rename(tmppath, path, ERROR); } /* diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index a7814d4019..d19215ab24 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -3323,14 +3323,10 @@ InstallXLogFileSegment(XLogSegNo *segno, char *tmppath, } } - /* - * Perform the rename using link if available, paranoidly trying to avoid - * overwriting an existing file (there shouldn't be one). - */ - if (durable_rename_excl(tmppath, path, LOG) != 0) + if (durable_rename(tmppath, path, LOG) != 0) { LWLockRelease(ControlFileLock); - /* durable_rename_excl already emitted log message */ + /* durable_rename already emitted log message */ return false; } diff --git a/src/backend/storage/file/fd.c b/src/backend/storage/file/fd.c index 14b77f2861..88645ed83d 100644 --- a/src/backend/storage/file/fd.c +++ b/src/backend/storage/file/fd.c @@ -807,69 +807,6 @@ durable_unlink(const char *fname, int elevel) return 0; } -/* - * durable_rename_excl -- rename a file in a durable manner. - * - * Similar to durable_rename(), except that this routine tries (but does not - * guarantee) not to overwrite the target file. - * - * Note that a crash in an unfortunate moment can leave you with two links to - * the target file. - * - * Log errors with the caller specified severity. - * - * On Windows, using a hard link followed by unlink() causes concurrency - * issues, while a simple rename() does not cause that, so be careful when - * changing the logic of this routine. - * - * Returns 0 if the operation succeeded, -1 otherwise. Note that errno is not - * valid upon return. - */ -int -durable_rename_excl(const char *oldfile, const char *newfile, int elevel) -{ - /* - * Ensure that, if we crash directly after the rename/link, a file with - * valid contents is moved into place. - */ - if (fsync_fname_ext(oldfile, false, false, elevel) != 0) - return -1; - -#ifdef HAVE_WORKING_LINK - if (link(oldfile, newfile) < 0) - { - ereport(elevel, - (errcode_for_file_access(), - errmsg("could not link file \"%s\" to \"%s\": %m", - oldfile, newfile))); - return -1; - } - unlink(oldfile); -#else - if (rename(oldfile, newfile) < 0) - { - ereport(elevel, - (errcode_for_file_access(), - errmsg("could not rename file \"%s\" to \"%s\": %m", - oldfile, newfile))); - return -1; - } -#endif - - /* - * Make change persistent in case of an OS crash, both the new entry and - * its parent directory need to be flushed. - */ - if (fsync_fname_ext(newfile, false, false, elevel) != 0) - return -1; - - /* Same for parent directory */ - if (fsync_parent_path(newfile, elevel) != 0) - return -1; - - return 0; -} - /* * InitFileAccess --- initialize this module during backend startup * diff --git a/src/include/pg_config_manual.h b/src/include/pg_config_manual.h index 84ce5a4a5d..830804fdfb 100644 --- a/src/include/pg_config_manual.h +++ b/src/include/pg_config_manual.h @@ -163,13 +163,6 @@ #define USE_BARRIER_SMGRRELEASE #endif -/* - * Define this if your operating system supports link() - */ -#if !defined(WIN32) && !defined(__CYGWIN__) -#define HAVE_WORKING_LINK 1 -#endif - /* * USE_POSIX_FADVISE controls whether Postgres will attempt to use the * posix_fadvise() kernel call. Usually the automatic configure tests are diff --git a/src/include/storage/fd.h b/src/include/storage/fd.h index 69549b000f..2b4a8e0ffe 100644 --- a/src/include/storage/fd.h +++ b/src/include/storage/fd.h @@ -187,7 +187,6 @@ extern void fsync_fname(const char *fname, bool isdir); extern int fsync_fname_ext(const char *fname, bool isdir, bool ignore_perm, int elevel); extern int durable_rename(const char *oldfile, const char *newfile, int loglevel); extern int durable_unlink(const char *fname, int loglevel); -extern int durable_rename_excl(const char *oldfile, const char *newfile, int loglevel); extern void SyncDataDirectory(void); extern int data_sync_elevel(int elevel); -- 2.25.1