On Thu, Jan 19, 2023 at 6:20 AM Nathan Bossart <nathandboss...@gmail.com> wrote:
>
> On Tue, Jan 17, 2023 at 07:44:52PM +0530, Bharath Rupireddy wrote:
> > On Thu, Jan 12, 2023 at 6:21 AM Nathan Bossart <nathandboss...@gmail.com>
> > wrote:
> >> With your patch, we might replay one of these "old" files in pg_wal instead
> >> of the complete version of the file from the archives,
> >
> > That's true even today, without the patch, no? We're not changing the
> > existing behaviour of the state machine. Can you explain how it
> > happens with the patch?
>
> My point is that on HEAD, we will always prefer a complete archive file.
> With your patch, we might instead choose to replay an old file in pg_wal
> because we are artificially advancing the state machine. IOW even if
> there's a complete archive available, we might not use it. This is a
> behavior change, but I think it is okay.
Oh, yeah, I too agree that it's okay, because manually copying WAL files
directly to pg_wal (which eventually get replayed before switching to
streaming) isn't recommended anyway for production-level servers. I think
we covered it in the documentation, which says that the standby exhausts
all the WAL present in pg_wal before switching. Isn't that enough?

+       Specifies amount of time after which standby attempts to switch WAL
+       source from WAL archive to streaming replication (get WAL from
+       primary). However, exhaust all the WAL present in pg_wal before
+       switching. If the standby fails to switch to stream mode, it falls
+       back to archive mode.

> >> Would you mind testing this scenario?
> > [...]ndby should receive f6 via archive and replay it (check the
> > replay lsn an[...]
>
> I meant testing the scenario where there's an old file in pg_wal, a
> complete file in the archives, and your new GUC forces replay of the
> former. This might be difficult to do in a TAP test. Ultimately, I just
> want to validate the assumptions discussed above.

I think testing the scenario [1] is achievable. I was able to write a TAP
test for it:
https://github.com/BRupireddy/postgres/tree/prefer_archived_wal_v1

It's a bit flaky and needs a little more work to stabilize, but it seems
doable:

1. Write a custom script for restore_command that sleeps only after
   fetching an existing WAL file from the archive, and does not sleep for
   a history file or a non-existent WAL file.
2. Find a command-line way to sleep on Windows.

I can spend more time on this if people think the test is worth adding to
core, perhaps discussing it separately from this thread.

[1] RestoreArchivedFile():
/*
 * When doing archive recovery, we always prefer an archived log file even
 * if a file of the same name exists in XLOGDIR. The reason is that the
 * file in XLOGDIR could be an old, un-filled or partly-filled version
 * that was copied and restored as part of backing up $PGDATA.
 *

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com