On Fri, Nov 22, 2019 at 05:31:55AM +0000, matsumura....@fujitsu.com wrote:
Hi all
I found a situation in which a WAL archive file is missing even though no
WAL segment file is missing.
This causes archive recovery to fail. Is this behavior a bug?
example:
WAL segment files
000000010000000000000001
000000010000000000000002
000000010000000000000003
Archive files
000000010000000000000001
000000010000000000000003
Archive file 000000010000000000000002 is missing, but the WAL segment files
are contiguous. Recovery from the archive (i.e. PITR) stops at the end of
000000010000000000000001.
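
A gap like the one in the example above can be detected by comparing the two
file listings. The following is only a minimal sketch, not part of PostgreSQL:
the function name missing_from_archive is hypothetical, and it assumes all
segments are on a single timeline, so that the fixed-width hexadecimal names
sort in WAL order.

```python
import re

# WAL segment file names are 24 uppercase hexadecimal digits.
SEGMENT_NAME = re.compile(r"^[0-9A-F]{24}$")


def missing_from_archive(wal_names, archive_names):
    """Return segments present in the WAL directory but absent from the
    archive, restricted to those older than the newest archived segment.

    Assumption: a single timeline, so plain string order equals WAL order.
    """
    wal = {n for n in wal_names if SEGMENT_NAME.match(n)}
    archived = {n for n in archive_names if SEGMENT_NAME.match(n)}
    if not archived:
        return []
    newest = max(archived)
    # A segment older than the newest archived one should already be archived.
    return sorted(n for n in wal - archived if n < newest)


wal_segments = [
    "000000010000000000000001",
    "000000010000000000000002",
    "000000010000000000000003",
]
archive_files = [
    "000000010000000000000001",
    "000000010000000000000003",
]
print(missing_from_archive(wal_segments, archive_files))
```

With the listings from the example, the only name reported is
000000010000000000000002.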
How to reproduce:
- Set up replication (primary and standby).
- Set [archive_mode = always] on the standby.
- The WAL receiver exits (e.g. because the primary goes down)
after it has written the last record of a WAL segment file but
before it notifies the archiver about that segment (i.e. creates the .ready file).
Even if the WAL receiver restarts, that WAL segment file is never reported to
the archiver.
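
The missing notification described above could be looked for on the standby
with something like the sketch below. It relies on the archiver status files
(<segment>.ready and <segment>.done) kept in the archive_status subdirectory
of the WAL directory. The function name segments_without_status is
hypothetical, and the result is only a list of candidates: the segment
currently being written, and any preallocated or recycled segments, normally
have no status file either.

```python
import os
import re

# WAL segment file names are 24 uppercase hexadecimal digits.
SEGMENT_NAME = re.compile(r"^[0-9A-F]{24}$")


def segments_without_status(wal_dir):
    """List WAL segments in wal_dir that have neither a .ready nor a .done
    file in wal_dir/archive_status, i.e. segments the archiver was never
    told about."""
    status_dir = os.path.join(wal_dir, "archive_status")
    status = set(os.listdir(status_dir)) if os.path.isdir(status_dir) else set()
    segments = sorted(n for n in os.listdir(wal_dir) if SEGMENT_NAME.match(n))
    return [
        s for s in segments
        if s + ".ready" not in status and s + ".done" not in status
    ]
```

Calling segments_without_status on the standby's WAL directory (pg_wal in
current releases) and ignoring the newest entries gives the segments worth
checking against the archive.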
That does indeed seem like a bug. We should certainly archive all WAL
segments, irrespective of primary shutdowns/restarts/whatever. I guess
we should make sure the archiver is properly notified before the exit.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services