On Thu, Nov 29, 2018 at 03:00:42PM +0000, Bossart, Nathan wrote:
> +1

Okay, here is an updated patch for this stuff, which does the following:
- For each ".ready" status file, check that its WAL segment still
  exists.  The status file is treated as orphaned and removed only if
  stat() fails with ENOENT.
- If durable_unlink() fails, the removal is attempted up to 3 times in
  total, one second apart.  After too many failures, the archiver gives
  up on the orphan status file removal and tries again later.  If the
  removal works correctly, the archiver moves on to the next file.
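
For anyone who wants to poke at that flow outside the backend, here is a rough standalone sketch of it.  This is not the patch: cleanup_orphan_status() and the OrphanResult enum are names made up for illustration, plain unlink() stands in for durable_unlink() (so nothing is fsync'd), and the retry delay is a parameter only so that it can be set to zero when experimenting.

```c
#include <errno.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

#define CLEANUP_RETRIES 3		/* mirrors NUM_STATUS_CLEANUP_RETRIES */

/* Hypothetical result codes, not part of the patch. */
typedef enum
{
	SEGMENT_PRESENT,			/* segment not known to be missing: archive it */
	ORPHAN_REMOVED,				/* segment gone, status file removed */
	ORPHAN_GAVE_UP				/* segment gone, removal kept failing */
} OrphanResult;

/*
 * Hypothetical helper showing the check-then-retry flow.  Assumption:
 * unlink() replaces durable_unlink(), so the removal is not made durable.
 */
static OrphanResult
cleanup_orphan_status(const char *segment_path, const char *status_path,
					  unsigned int delay_secs)
{
	struct stat stat_buf;
	int			failures = 0;

	for (;;)
	{
		/* Only a definite ENOENT marks the status file as orphaned. */
		if (stat(segment_path, &stat_buf) == 0 || errno != ENOENT)
			return SEGMENT_PRESENT;

		if (unlink(status_path) == 0)
			return ORPHAN_REMOVED;

		/* Give up once the removal has failed CLEANUP_RETRIES times. */
		if (++failures >= CLEANUP_RETRIES)
			return ORPHAN_GAVE_UP;

		/* wait a bit before retrying */
		sleep(delay_secs);
	}
}
```

A caller would move on to the next status file on ORPHAN_REMOVED, leave the loop on ORPHAN_GAVE_UP, and fall through to the usual archive command on SEGMENT_PRESENT.  Any stat() error other than ENOENT is deliberately treated like "segment present", so a transient I/O problem never causes a status file to be dropped.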
(The variable names could be better.)
--
Michael
diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c
index 844b9d1b0e..0a78003172 100644
--- a/src/backend/postmaster/pgarch.c
+++ b/src/backend/postmaster/pgarch.c
@@ -28,6 +28,7 @@
 #include <fcntl.h>
 #include <signal.h>
 #include <time.h>
+#include <sys/stat.h>
 #include <sys/time.h>
 #include <sys/wait.h>
 #include <unistd.h>
@@ -60,6 +61,7 @@
 									 * failed archiver; in seconds. */
 
 #define NUM_ARCHIVE_RETRIES 3
+#define NUM_STATUS_CLEANUP_RETRIES 3
 
 
 /* ----------
@@ -424,9 +426,13 @@ pgarch_ArchiverCopyLoop(void)
 	while (pgarch_readyXlog(xlog))
 	{
 		int			failures = 0;
+		int			failures_unlink = 0;
 
 		for (;;)
 		{
+			struct stat	stat_buf;
+			char		pathname[MAXPGPATH];
+
 			/*
 			 * Do not initiate any more archive commands after receiving
 			 * SIGTERM, nor after the postmaster has died unexpectedly. The
@@ -456,6 +462,44 @@ pgarch_ArchiverCopyLoop(void)
 				return;
 			}
 
+			/*
+			 * In the event of a system crash, archive status files may be
+			 * left behind as their removals are not durable, cleaning up
+			 * orphan entries here is the cheapest method.  So check that
+			 * the segment trying to be archived still exists.  If it does
+			 * not, its orphan status file is removed.
+			 */
+			snprintf(pathname, MAXPGPATH, XLOGDIR "/%s", xlog);
+			if (stat(pathname, &stat_buf) != 0 && errno == ENOENT)
+			{
+				char		xlogready[MAXPGPATH];
+
+				StatusFilePath(xlogready, xlog, ".ready");
+				if (durable_unlink(xlogready, WARNING) == 0)
+				{
+					ereport(WARNING,
+							(errmsg("removed orphan archive status file %s",
+									xlogready)));
+
+					/* leave loop and move to the next status file */
+					break;
+				}
+
+				if (++failures_unlink >= NUM_STATUS_CLEANUP_RETRIES)
+				{
+					ereport(WARNING,
+							(errmsg("failed removal of \"%s\" too many times, will try again later",
+									xlogready)));
+
+					/* give up cleanup of orphan status files */
+					return;
+				}
+
+				/* wait a bit before retrying */
+				pg_usleep(1000000L);
+				continue;
+			}
+
 			if (pgarch_archiveXlog(xlog))
 			{
 				/* successful */