At Tue, 01 Jun 2021 13:03:22 +0900, Tatsuro Yamada <tatsuro.yamada...@nttcom.co.jp> wrote in
> Hi Horiguchi-san,
>
> On 2021/05/31 16:58, Kyotaro Horiguchi wrote:
> > So, I started a thread for this topic diverged from the following
> > thread.
> > https://www.postgresql.org/message-id/4698027d-5c0d-098f-9a8e-8cf09e36a...@nttcom.co.jp_1
> >
> >> So, what should we do for the user? I think we should put some notes
> >> in postgresql.conf or in the documentation. For example, something
> >> like this:
> >
> > I'm not sure about the exact configuration you have in mind, but that
> > would happen on the cascaded standby in the case where the upstream
> > promotes. In this case, the history file for the new timeline is
> > archived twice. walreceiver triggers archiving of the new history
> > file at the time of the promotion, then startup does the same when it
> > restores the file from archive. Is it what you complained about?
>
>
> Thank you for creating a new thread and explaining this.
> We are not using cascade replication in our environment, but I think
> the situation is similar. As an overview, when I do a promote,
> the archive_command fails due to the history file.

Ah, I remembered that PG-REX starts a primary as a standby then
promotes it.

> I've created a reproduction script that includes building replication,
> and I'll share it with you. (I used Robert's test.sh as a reference
> for creating the reproduction script. Thanks)
>
> The scenario (sr_test_historyfile.sh) is as follows.
>
> #1 Start pgprimary as a main
> #2 Create standby
> #3 Start pgstandby as a standby
> #4 Execute archive command
> #5 Shutdown pgprimary
> #6 Start pgprimary as a standby
> #7 Promote pgprimary
> #8 Execute archive_command again, but failed since duplicate history
>    file exists (see pgstandby.log)

OK, now I clearly understand what you meant. (However, a standby
running while its primary is not running is not a legitimate state.)
Anyway, "test ! -f" can be problematic in that case.

> Note that this may not be appropriate if you consider it as a recovery
> procedure for replication configuration. However, I'm sharing it as it is
> because this seems to be the procedure used in the customer's
> environment (PG-REX).

Understood.

> Regarding "test ! -f",
> I am wondering how many people are using the test command for
> archive_command. If I remember correctly, the guide provided by
> NTT OSS Center that we are using does not recommend using the test
> command.

I think, as the PG-REX documentation says, a simple cp works well as
long as the assumptions of PG-REX hold, that is, no double failure
happens and the instructions are followed.

On the other hand, I found that the behavior happens more generally.

If a standby with archive_mode=always crashes, it starts recovery from
the last checkpoint. If that checkpoint is in an already-archived
segment, the restarted standby fetches the segment from the archive
and then fails to archive it again. (See the attached test script.)

So, your fear stated upthread applies to a wider range of situations.
The following suggestion is rather harmful for the archive_mode=always
setting.

https://www.postgresql.org/docs/14/continuous-archiving.html

> The archive command should generally be designed to refuse to
> overwrite any pre-existing archive file. This is an important safety
> feature to preserve the integrity of your archive in case of
> administrator error (such as sending the output of two different
> servers to the same archive directory).

I'm not sure how we should treat this. Since the archive must store
the files actually applied to the server data, merely having been
archived already cannot be a reason for skipping archiving. We need
to make sure the new file is byte-identical to the already-archived
version. We could compare just the *restored* file to the same file in
pg_wal, but that might be too much of a penalty for the benefit. (See
the second attached file.)

Otherwise, if we assume the current behavior, the documentation would
need something like the following.

> The archive command should generally be designed to refuse to
> overwrite any pre-existing archive file. This is an important safety
> feature to preserve the integrity of your archive in case of
> administrator error (such as sending the output of two different
> servers to the same archive directory).
+ For standby with the setting archive_mode=always, there's a case where
+ the same file is archived more than once. For safety, it is
+ recommended that when the destination file exists, the archive_command
+ returns zero if it is byte-identical to the source file.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
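
For illustration only, the archive_command behavior proposed in the
documentation addition above could be sketched as a small shell
wrapper. This is a minimal sketch, not a vetted archiving tool: the
function name archive_wal, the script path and the archive directory
are all hypothetical, and it does not fsync the copied file or guard
against two servers archiving concurrently.

```shell
#!/bin/sh
# Hypothetical wrapper (assumption, not part of PostgreSQL):
#   archive_wal <source path (%p)> <destination path in the archive>
# Returns 0 if the file was archived now, or if an identical copy is
# already in the archive; returns non-zero otherwise.
archive_wal() {
    # Refuse to run with anything other than exactly two arguments.
    [ "$#" -eq 2 ] || return 2
    aw_src=$1
    aw_dst=$2

    if [ -f "$aw_dst" ]; then
        # Destination exists: succeed only if it is byte-identical to
        # the source. A differing file is never overwritten.
        cmp -s "$aw_src" "$aw_dst"
        return $?
    fi

    # Copy under a temporary name first so that a partially written
    # file never appears under the final name.
    aw_tmp="$aw_dst.tmp.$$"
    if cp "$aw_src" "$aw_tmp" && mv "$aw_tmp" "$aw_dst"; then
        return 0
    fi
    rm -f "$aw_tmp"
    return 1
}

# When run as a script with arguments, archive and exit with the
# function's status; with no arguments, only define the function.
[ "$#" -eq 0 ] || archive_wal "$@"
```

Assuming the script is saved as /path/to/archive_wal.sh (a placeholder
path), it would be wired in with something like
archive_command = '/path/to/archive_wal.sh %p /mnt/archive/%f'.
A second attempt to archive the same segment then returns zero instead
of blocking archiving, while a file with different contents is still
rejected.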

# Copyright (c) 2021, PostgreSQL Global Development Group
#
# Tests related to WAL archiving and recovery.
#
use strict;
use warnings;
use PostgresNode;
use TestLib;
use Test::More tests => 1;
use Config;
use Time::HiRes qw(usleep);

my $backup_name='mybackup';

# Primary with archiving enabled
my $primary = get_new_node('primary');
$primary->init(
	has_archiving => 1,
	allows_streaming => 1);
$primary->append_conf('postgresql.conf', qq[
wal_keep_size=128MB
archive_mode=always
log_checkpoints=yes
]);
my $primary_archive = $primary->archive_dir;
$primary->start;

$primary->backup($backup_name);

# Standby that restores from the primary's archive and archives into its
# own directory with a command that refuses to overwrite existing files
my $standby = get_new_node('standby');
my $standby_archive = $standby->archive_dir;
$standby->init_from_backup($primary, $backup_name, has_streaming=>1);
$standby->append_conf('postgresql.conf', qq[
restore_command='cp $primary_archive/%f %p'
archive_command='test ! -f $standby_archive/%f && cp %p $standby_archive/%f'
]);
$standby->start;

$primary->psql('postgres', 'CHECKPOINT;SELECT pg_switch_wal();CREATE TABLE t();SELECT pg_switch_wal();');
$standby->psql('postgres', 'CHECKPOINT');

# Crash the standby so that it restarts recovery from the last checkpoint
$standby->stop('immediate');
$standby->start;

$primary->psql('postgres', 'CHECKPOINT;SELECT pg_switch_wal();CHECKPOINT');
$standby->psql('postgres', 'CHECKPOINT');

# Wait until the archiver reports either a success or a failure
my $result;
while (1)
{
	$result =
	  $standby->safe_psql('postgres',
		"SELECT last_archived_wal, last_failed_wal FROM pg_stat_archiver");
	# The built-in sleep() truncates 0.1 to 0, so use usleep() instead
	usleep(100_000);
	last if ($result ne "|");
}

# Passes only if no archiving failure has been recorded
ok($result =~ /^[^|]+\|$/, 'archive check 1');

diff --git a/src/backend/access/transam/xlogarchive.c b/src/backend/access/transam/xlogarchive.c
index 26b023e754..037da0aa3d 100644
--- a/src/backend/access/transam/xlogarchive.c
+++ b/src/backend/access/transam/xlogarchive.c
@@ -382,6 +382,7 @@ KeepFileRestoredFromArchive(const char *path, const char *xlogfname)
 {
 	char		xlogfpath[MAXPGPATH];
 	bool		reload = false;
+	bool		skip_archive = false;
 	struct stat statbuf;
 
 	snprintf(xlogfpath, MAXPGPATH, XLOGDIR "/%s", xlogfname);
@@ -416,6 +417,56 @@ KeepFileRestoredFromArchive(const char *path, const char *xlogfname)
 		/* same-size buffers, so this never truncates */
 		strlcpy(oldpath, xlogfpath, MAXPGPATH);
 #endif
+		/*
+		 * On a standby with archive_mode=always, there's the case where the
+		 * same file is archived more than once. If the archive_command rejects
+		 * overwriting, WAL-archiving won't go further than the file forever.
+		 * Avoid duplicate archiving attempts when the file is known to have
+		 * been archived and the content doesn't change.
+		 */
+		if (XLogArchiveMode == ARCHIVE_MODE_ALWAYS &&
+			XLogArchiveCheckDone(xlogfname))
+		{
+			unsigned char srcbuf[XLOG_BLCKSZ];
+			unsigned char dstbuf[XLOG_BLCKSZ];
+			int fd1 = BasicOpenFile(path, O_RDONLY | PG_BINARY);
+			int fd2 = BasicOpenFile(oldpath, O_RDONLY | PG_BINARY);
+			uint32 i;
+			uint32 off = 0;
+
+			/*
+			 * Compare the two files' contents. We don't bother completing if
+			 * something's wrong meanwhile.
+			 */
+			for (i = 0 ; i < wal_segment_size / XLOG_BLCKSZ ; i++)
+			{
+				if (pg_pread(fd1, srcbuf, XLOG_BLCKSZ, (off_t) off)
+					!= XLOG_BLCKSZ)
+					break;
+
+				if (pg_pread(fd2, dstbuf, XLOG_BLCKSZ, (off_t) off)
+					!= XLOG_BLCKSZ)
+					break;
+
+				if (memcmp(srcbuf, dstbuf, XLOG_BLCKSZ) != 0)
+					break;
+
+				off += XLOG_BLCKSZ;
+			}
+
+			close(fd1);
+			close(fd2);
+
+			if (i == wal_segment_size / XLOG_BLCKSZ)
+			{
+				skip_archive = true;
+
+				ereport(LOG,
+						(errmsg ("log file \"%s\" have been already archived, skip archiving",
+								 xlogfname)));
+			}
+		}
+
 		if (unlink(oldpath) != 0)
 			ereport(FATAL,
 					(errcode_for_file_access(),
@@ -430,7 +481,7 @@ KeepFileRestoredFromArchive(const char *path, const char *xlogfname)
 	 * Create .done file forcibly to prevent the restored segment from being
 	 * archived again later.
 	 */
-	if (XLogArchiveMode != ARCHIVE_MODE_ALWAYS)
+	if (XLogArchiveMode != ARCHIVE_MODE_ALWAYS || skip_archive)
 		XLogArchiveForceDone(xlogfname);
 	else
 		XLogArchiveNotify(xlogfname);