Hi, Thanks for the feedback.
> I wonder if this can be simplified even further. If we don't bother > trying to catch out-of-order .ready files in XLogArchiveNotify() and > just depend on the per-checkpoint/restartpoint directory scans, we can > probably remove lastReadySegNo from archiver state completely. If we agree that some extra delay in archiving these files is acceptable then we don't require any special handling for this scenario otherwise we may need to handle it separately. > + /* Initialize the current state of archiver */ > + xlogState.lastSegNo = MaxXLogSegNo; > + xlogState.lastTli = MaxTimeLineID; > > It looks like we have two ways to force a directory scan. We can > either set force_dir_scan to true, or lastSegNo can be set to > MaxXLogSegNo. Why not just set force_dir_scan to true here so that we > only have one way to force a directory scan? make sense, I have updated it. > Don't we also need to force a directory scan in the other cases we > return early from pgarch_ArchiverCopyLoop()? We will have already > advanced the archiver state in pgarch_readyXlog(), so I think we'd end > up skipping files if we didn't. For example, if archive_command isn't > set, we'll just return, and the next call to pgarch_readyXlog() might > return the next file. I agree, we should do it for all early return paths. > nitpick: I think we should just call PgArchForceDirScan() here. Yes, that's right. > > > This is an interesting idea, but the "else" block here seems prone to > > > race conditions. I think we'd have to hold arch_lck to prevent that. > > > But as I mentioned above, if we are okay with depending on the > > > fallback directory scans, I think we can remove the "else" block > > > completely. Ohh I didn't realize the race condition here. The competing processes can read the same value of lastReadySegNo. > > Thinking further, we probably need to hold a lock even when we are > > creating the .ready file to avoid race conditions. > > The race condition surely happens, but even if that happens, all > competing processes except one of them detect out-of-order and will > enforce directory scan. But I'm not sure how it behaves under more > complex situation so I'm not sure I like that behavior. > > We could just use another lock for the logic there, but instead > couldn't we merge PgArchGetLastReadySegNo and PgArchSetLastReadySegNo > into one atomic test-and-(check-and-)set function? Like this. I agree that we can merge the existing "Get" and "Set" functions into an atomic test-and-check-and-set function to avoid a race condition. I have incorporated these changes and updated a new patch. PFA patch. Thanks, Dipesh
From f8983c7b80ff0f8cbc415d4d9a3ad4992925e775 Mon Sep 17 00:00:00 2001 From: Dipesh Pandit <dipesh.pan...@enterprisedb.com> Date: Wed, 8 Sep 2021 21:47:16 +0530 Subject: [PATCH] keep trying the next file approach --- src/backend/access/transam/xlog.c | 20 +++ src/backend/access/transam/xlogarchive.c | 16 ++ src/backend/postmaster/pgarch.c | 249 +++++++++++++++++++++++++++---- src/include/access/xlogdefs.h | 2 + src/include/postmaster/pgarch.h | 4 + 5 files changed, 259 insertions(+), 32 deletions(-) diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index e51a7a7..7dd4b96 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -9285,6 +9285,16 @@ CreateCheckPoint(int flags) RemoveOldXlogFiles(_logSegNo, RedoRecPtr, recptr); /* + * Force the archiver to perform a directory scan. + * + * Ordinarily, this should not be needed, but it seems like a good idea + * to make sure we scan the archive_status directory every once in a + * while to make sure we haven't left anything behind. Calling it here + * ensures we do a directory scan at least once per checkpoint. + */ + PgArchForceDirScan(); + + /* * Make more log segments if needed. (Do this after recycling old log * segments, since that may supply some of the needed files.) */ @@ -9650,6 +9660,16 @@ CreateRestartPoint(int flags) RemoveOldXlogFiles(_logSegNo, RedoRecPtr, endptr); /* + * Force the archiver to perform a directory scan. + * + * Ordinarily, this should not be needed, but it seems like a good idea + * to make sure we scan the archive_status directory every once in a + * while to make sure we haven't left anything behind. Calling it here + * ensures we do a directory scan at least once per restartpoint. + */ + PgArchForceDirScan(); + + /* * Make more log segments if needed. (Do this after recycling old log * segments, since that may supply some of the needed files.) */ diff --git a/src/backend/access/transam/xlogarchive.c b/src/backend/access/transam/xlogarchive.c index 26b023e..110bee4 100644 --- a/src/backend/access/transam/xlogarchive.c +++ b/src/backend/access/transam/xlogarchive.c @@ -489,6 +489,22 @@ XLogArchiveNotify(const char *xlog) return; } + /* Force a directory scan if we are archiving anything but a regular + * WAL file or if this WAL file is being created out-of-order. + */ + if (!IsXLogFileName(xlog)) + PgArchForceDirScan(); + else + { + TimeLineID tli; + XLogSegNo this_segno; + + XLogFromFileName(xlog, &tli, &this_segno, wal_segment_size); + + if (!PgArchIsReadySegInOrder(this_segno)) + PgArchForceDirScan(); + } + /* Notify archiver that it's got something to do */ if (IsUnderPostmaster) PgArchWakeup(); diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c index 74a7d7c..3168409 100644 --- a/src/backend/postmaster/pgarch.c +++ b/src/backend/postmaster/pgarch.c @@ -47,6 +47,7 @@ #include "storage/proc.h" #include "storage/procsignal.h" #include "storage/shmem.h" +#include "storage/spin.h" #include "utils/guc.h" #include "utils/ps_status.h" @@ -76,6 +77,20 @@ typedef struct PgArchData { int pgprocno; /* pgprocno of archiver process */ + + /* + * Forces a directory scan in pgarch_readyXlog(). Protected by + * arch_lck. + */ + bool force_dir_scan; + + /* + * Segment number of last .ready file created by backend, protected by + * arch_lck. + */ + XLogSegNo lastReadySegNo; + + slock_t arch_lck; } PgArchData; @@ -87,6 +102,16 @@ static time_t last_sigterm_time = 0; static PgArchData *PgArch = NULL; /* + * Current archiver state, the segment number and timeline ID corresponding to + * last WAL file found by archiver. + */ +typedef struct ArchXLogState +{ + XLogSegNo lastSegNo; + TimeLineID lastTli; +} ArchXLogState; + +/* * Flags set by interrupt handlers for later service in the main loop. */ static volatile sig_atomic_t ready_to_stop = false; @@ -97,12 +122,13 @@ static volatile sig_atomic_t ready_to_stop = false; */ static void pgarch_waken_stop(SIGNAL_ARGS); static void pgarch_MainLoop(void); -static void pgarch_ArchiverCopyLoop(void); +static void pgarch_ArchiverCopyLoop(ArchXLogState *xlogState); static bool pgarch_archiveXlog(char *xlog); -static bool pgarch_readyXlog(char *xlog); +static bool pgarch_readyXlog(char *xlog, ArchXLogState *xlogState); static void pgarch_archiveDone(char *xlog); static void pgarch_die(int code, Datum arg); static void HandlePgArchInterrupts(void); +static bool higher_arch_priority(const char *a, const char *b); /* Report shared memory space needed by PgArchShmemInit */ Size @@ -129,6 +155,8 @@ PgArchShmemInit(void) /* First time through, so initialize */ MemSet(PgArch, 0, PgArchShmemSize()); PgArch->pgprocno = INVALID_PGPROCNO; + PgArch->lastReadySegNo = MaxXLogSegNo; + SpinLockInit(&PgArch->arch_lck); } } @@ -245,6 +273,14 @@ pgarch_MainLoop(void) { pg_time_t last_copy_time = 0; bool time_to_stop; + ArchXLogState xlogState; + + /* Initialize the current state of archiver */ + xlogState.lastSegNo = MaxXLogSegNo; + xlogState.lastTli = MaxTimeLineID; + + /* Force a directory scan for the first cycle after startup */ + PgArchForceDirScan(); /* * There shouldn't be anything for the archiver to do except to wait for a @@ -280,7 +316,7 @@ pgarch_MainLoop(void) } /* Do what we're here for */ - pgarch_ArchiverCopyLoop(); + pgarch_ArchiverCopyLoop(&xlogState); last_copy_time = time(NULL); /* @@ -321,7 +357,7 @@ pgarch_MainLoop(void) * Archives all outstanding xlogs then returns */ static void -pgarch_ArchiverCopyLoop(void) +pgarch_ArchiverCopyLoop(ArchXLogState *xlogState) { char xlog[MAX_XFN_CHARS + 1]; @@ -331,7 +367,7 @@ pgarch_ArchiverCopyLoop(void) * some backend will add files onto the list of those that need archiving * while we are still copying earlier archives */ - while (pgarch_readyXlog(xlog)) + while (pgarch_readyXlog(xlog, xlogState)) { int failures = 0; int failures_orphan = 0; @@ -363,6 +399,12 @@ pgarch_ArchiverCopyLoop(void) { ereport(WARNING, (errmsg("archive_mode enabled, yet archive_command is not set"))); + + /* + * archive_command is not yet set, perform a full directory + * scan in next cycle. + */ + PgArchForceDirScan(); return; } @@ -397,6 +439,12 @@ pgarch_ArchiverCopyLoop(void) (errmsg("removal of orphan archive status file \"%s\" failed too many times, will try again later", xlogready))); + /* + * Failed to remove an orphan .ready file, perform a full + * directory scan in next cycle. + */ + PgArchForceDirScan(); + /* give up cleanup of orphan status files */ return; } @@ -432,6 +480,13 @@ pgarch_ArchiverCopyLoop(void) ereport(WARNING, (errmsg("archiving write-ahead log file \"%s\" failed too many times, will try again later", xlog))); + /* + * Failed to archive, make sure that archiver performs a + * full directory scan in the next cycle to avoid missing + * the WAL file which could not be archived due to some + * failure in current cycle. + */ + PgArchForceDirScan(); return; /* give up archiving for now */ } pg_usleep(1000000L); /* wait a bit before retrying */ @@ -596,30 +651,84 @@ pgarch_archiveXlog(char *xlog) * larger ID; the net result being that past timelines are given higher * priority for archiving. This seems okay, or at least not obviously worth * changing. + * + * WAL files are generated in a specific order of log segment number. The + * directory scan for each WAL file can be minimised by identifying the next + * WAL file in the sequence. This can be achieved by maintaining log segment + * number and timeline ID corresponding to WAL file currently being archived. + * The log segment number of current WAL file can be incremented by '1' to + * point to the next WAL file in a sequence. Full directory scan can be avoided + * by checking the availability of next WAL file. "xlogState" specifies the + * segment number and timeline ID corresponding to the most recent WAL file + * successfully archived. + * + * However, a full directory scan is performed in some special cases where it + * requires us to archive files which takes precedence over the next anticipated + * log segment. For example, history file takes precedence over archiving WAL + * files on older timeline or an older WAL file which is being left out because + * corresponding .ready file is created out of order or archiving a backup + * history file created during backup. + * + * Returns "true" if a segment is ready for archival, "xlog" represents the + * name of the segment. */ static bool -pgarch_readyXlog(char *xlog) +pgarch_readyXlog(char *xlog, ArchXLogState *xlogState) { - /* - * open xlog status directory and read through list of xlogs that have the - * .ready suffix, looking for earliest file. It is possible to optimise - * this code, though only a single file is expected on the vast majority - * of calls, so.... - */ + char basename[MAX_XFN_CHARS + 1]; char XLogArchiveStatusDir[MAXPGPATH]; DIR *rldir; struct dirent *rlde; bool found = false; - bool historyFound = false; + bool force_dir_scan; + TimeLineID lastTli; + XLogSegNo lastSegNo; + + /* Obtain current archiver state and reset force_dir_scan. */ + SpinLockAcquire(&PgArch->arch_lck); + force_dir_scan = PgArch->force_dir_scan; + PgArch->force_dir_scan = false; + SpinLockRelease(&PgArch->arch_lck); + + lastSegNo = xlogState->lastSegNo; + lastTli = xlogState->lastTli; + + /* Try to skip the directory scan if possible. */ + if (!force_dir_scan) + { + struct stat st; + char readyfile[MAXPGPATH]; + + Assert(lastTli != MaxTimeLineID); + Assert(lastSegNo != MaxXLogSegNo); + + lastSegNo++; + XLogFileName(basename, lastTli, lastSegNo, wal_segment_size); + + StatusFilePath(readyfile, basename, ".ready"); + if (stat(readyfile, &st) == 0) + { + strcpy(xlog, basename); + xlogState->lastSegNo++; + return true; + } + else if (errno != ENOENT) + ereport(ERROR, + (errcode_for_file_access(), + errmsg("could not stat \"%s\": %m", readyfile))); + } + + /* + * Open the archive status directory and read through the list of files + * with the .ready suffix, looking for the earliest file. + */ snprintf(XLogArchiveStatusDir, MAXPGPATH, XLOGDIR "/archive_status"); rldir = AllocateDir(XLogArchiveStatusDir); while ((rlde = ReadDir(rldir, XLogArchiveStatusDir)) != NULL) { int basenamelen = (int) strlen(rlde->d_name) - 6; - char basename[MAX_XFN_CHARS + 1]; - bool ishistory; /* Ignore entries with unexpected number of characters */ if (basenamelen < MIN_XFN_CHARS || @@ -638,35 +747,76 @@ pgarch_readyXlog(char *xlog) memcpy(basename, rlde->d_name, basenamelen); basename[basenamelen] = '\0'; - /* Is this a history file? */ - ishistory = IsTLHistoryFileName(basename); - - /* - * Consume the file to archive. History files have the highest - * priority. If this is the first file or the first history file - * ever, copy it. In the presence of a history file already chosen as - * target, ignore all other files except history files which have been - * generated for an older timeline than what is already chosen as - * target to archive. - */ - if (!found || (ishistory && !historyFound)) + if (!found || higher_arch_priority(basename, xlog)) { strcpy(xlog, basename); found = true; - historyFound = ishistory; } - else if (ishistory || !historyFound) + } + FreeDir(rldir); + + if (found) + { + if (IsXLogFileName(xlog)) + XLogFromFileName(xlog, &xlogState->lastTli, &xlogState->lastSegNo, + wal_segment_size); + else { - if (strcmp(basename, xlog) < 0) - strcpy(xlog, basename); + /* Continue directory scan until we find a regular WAL file */ + PgArchForceDirScan(); } + + ereport(DEBUG3, + errmsg("directory scan to archive write ahead log file \"%s\"", xlog)); + } + else + { + /* + * Still waiting for the first file, force directory scan in next + * cycle. + */ + if (xlogState->lastSegNo == MaxXLogSegNo) + PgArchForceDirScan(); } - FreeDir(rldir); return found; } /* + * higher_arch_priority + * + * Compares archival priority of the two file names. If "a" has a higher + * priority than "b", true is returned. If "b" has a higher priority than + * "a" false is returned. + */ +static bool +higher_arch_priority(const char *a, const char *b) +{ + bool a_ishistory = IsTLHistoryFileName(a); + bool b_ishistory = IsTLHistoryFileName(b); + bool a_isbackuphistory = IsBackupHistoryFileName(a); + bool b_isbackuphistory = IsBackupHistoryFileName(b); + + /* + * Timeline history files have a higher priority than everything else. + * Backup history files are given the second highest priority so that + * the archiver picks them up when a directory scan is forced. + */ + if (a_ishistory || b_ishistory) + { + if (a_ishistory != b_ishistory) + return a_ishistory; + } + else if (a_isbackuphistory || b_isbackuphistory) + { + if (a_isbackuphistory != b_isbackuphistory) + return a_isbackuphistory; + } + + return (strcmp(a, b) < 0); +} + +/* * pgarch_archiveDone * * Emit notification that an xlog file has been successfully archived. @@ -716,3 +866,38 @@ HandlePgArchInterrupts(void) ProcessConfigFile(PGC_SIGHUP); } } + +/* + * PgArchForceDirScan + * + * When called, the next call to pgarch_readyXlog() will perform a + * directory scan. This is useful for ensuring that imporant files such + * as timeline history files are archived as quickly as possible. + */ +void +PgArchForceDirScan(void) +{ + SpinLockAcquire(&PgArch->arch_lck); + PgArch->force_dir_scan = true; + SpinLockRelease(&PgArch->arch_lck); +} + +/* + * PgArchIsReadySegInOrder + * + * Called by backend to check if current .ready file is created out of order. + */ +bool +PgArchIsReadySegInOrder(XLogSegNo this_segno) +{ + XLogSegNo last_segno, next_segno; + + SpinLockAcquire(&PgArch->arch_lck); + last_segno = PgArch->lastReadySegNo; + PgArch->lastReadySegNo = this_segno; + SpinLockRelease(&PgArch->arch_lck); + + next_segno = last_segno + 1; + + return (next_segno == this_segno) ? true : false; +} diff --git a/src/include/access/xlogdefs.h b/src/include/access/xlogdefs.h index 60348d1..6122891 100644 --- a/src/include/access/xlogdefs.h +++ b/src/include/access/xlogdefs.h @@ -46,6 +46,7 @@ typedef uint64 XLogRecPtr; * XLogSegNo - physical log file sequence number. */ typedef uint64 XLogSegNo; +#define MaxXLogSegNo ((XLogSegNo) 0xFFFFFFFFFFFFFFFF) /* * TimeLineID (TLI) - identifies different database histories to prevent @@ -57,6 +58,7 @@ typedef uint64 XLogSegNo; * sequence that was generated in the previous incarnation. */ typedef uint32 TimeLineID; +#define MaxTimeLineID ((TimeLineID) 0xFFFFFFFF) /* * Replication origin id - this is located in this file to avoid having to diff --git a/src/include/postmaster/pgarch.h b/src/include/postmaster/pgarch.h index 1e47a14..6d6461f 100644 --- a/src/include/postmaster/pgarch.h +++ b/src/include/postmaster/pgarch.h @@ -13,6 +13,8 @@ #ifndef _PGARCH_H #define _PGARCH_H +#include "access/xlogdefs.h" + /* ---------- * Archiver control info. * @@ -31,5 +33,7 @@ extern void PgArchShmemInit(void); extern bool PgArchCanRestart(void); extern void PgArchiverMain(void) pg_attribute_noreturn(); extern void PgArchWakeup(void); +extern void PgArchForceDirScan(void); +extern bool PgArchIsReadySegInOrder(XLogSegNo this_segno); #endif /* _PGARCH_H */ -- 1.8.3.1