Hi Horiguchi-san,

(To recap: In a replication setup that uses an archive, startup tries
to restore WAL files from the archive before checking the pg_wal
directory for the desired file.  The behavior itself is intentional
and reasonable. However, the restore code issues an archive
notification for a restored file regardless of whether it has already
been archived.  If archive_command is written to return an error on
overwrite, as we suggest in the documentation, that behavior causes
archiving to fail.)

After playing with this, I can reproduce the problem just by
restarting a standby, even in a simple archive-replication setup with
no special prerequisites.  So I think this is worth fixing.

With this patch, KeepFileRestoredFromArchive compares the contents of
the just-restored file and the existing file for the same segment only
when:

      - archive_mode = always
  and - the file to restore already exists in pg_wal
  and - it has a .done and/or .ready status file.

which does not usually happen.  In that case, the function skips the
archive notification if the contents are identical.  The included TAP
test passes on both Linux and Windows.
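
For illustration only, the condition described above can be sketched
in Python. This is not the patch's C code. The function name, its
parameters, and the directory layout are hypothetical, and notifying
when the contents differ is an assumption, since the description only
covers the identical case.

```python
import filecmp
import os


def should_notify_archiver(archive_mode, restored_path, existing_path, status_dir):
    """Return True if the restored segment should get an archive notification.

    Hypothetical helper that mirrors the three conditions listed above.
    It is a sketch, not the logic of KeepFileRestoredFromArchive itself.
    """
    segment = os.path.basename(existing_path)
    # Look for a .done and/or .ready status file for this segment.
    has_status = any(
        os.path.exists(os.path.join(status_dir, segment + suffix))
        for suffix in (".done", ".ready")
    )
    # Compare contents only in the narrow case the description targets.
    if archive_mode == "always" and os.path.exists(existing_path) and has_status:
        # shallow=False forces a byte-by-byte comparison of the two files.
        if filecmp.cmp(restored_path, existing_path, shallow=False):
            return False  # identical segment: skip the notification
    # Assumption: in every other case the notification is issued as before.
    return True
```

In every case other than the three-condition match with identical
contents, the sketch returns True, so the comparison cost is paid only
in the rare situation listed above.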


Thank you for the analysis and the patch.
I'll try the patch tomorrow.

I just noticed that this thread is still tied to another thread
(it's not an independent thread). To fix that, it may be better to
start a new thread.


Regards,
Tatsuro Yamada



