On Wed, Sep 2, 2020 at 2:18 AM Tomas Vondra <tomas.von...@2ndquadrant.com> wrote:
> On Wed, Sep 02, 2020 at 02:05:10AM +1200, Thomas Munro wrote:
> >On Wed, Sep 2, 2020 at 1:14 AM Tomas Vondra
> ><tomas.von...@2ndquadrant.com> wrote:
> >> from the archive
> >
> >Ahh, so perhaps that's the key.
>
> Maybe. For the record, the commands look like this:
>
> archive_command = 'gzip -1 -c %p > /mnt/raid/wal-archive/%f.gz'
>
> restore_command = 'gunzip -c /mnt/raid/wal-archive/%f.gz > %p.tmp && mv
> %p.tmp %p'
Yeah, sorry, I goofed here by not considering archive recovery properly. I have special handling for crash recovery from files in pg_wal (XLRO_END, meaning read until you run out of files) and for streaming replication (XLRO_WALRCV_WRITTEN, meaning read only as far as the WAL receiver has advertised as written in shared memory), as a way to control the ultimate limit on how far ahead to read when maintenance_io_concurrency and max_recovery_prefetch_distance don't limit you first. But if you recover from a base backup with a WAL archive, it uses the XLRO_END policy, which can run out of files just because a new file hasn't been restored yet, so it gives up prefetching too soon, as you're seeing. That doesn't cause any damage, but it stops doing anything useful because the prefetcher thinks its job is finished.

It'd be possible to fix this somehow in the two-XLogReader design, but since I'm testing a new version that has a unified XLogReader-with-read-ahead, I'm not going to try to do that. I've added a basebackup-with-archive recovery to my arsenal of test workloads to make sure I don't forget about archive recovery mode again, but I think it's actually harder to get this wrong in the new design.

In the meantime, if you are still interested in studying the potential speed-up from WAL prefetching using the most recently shared two-XLogReader patch, you'll need to unpack all your archived WAL files into pg_wal manually beforehand.
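The manual unpacking step could be scripted along these lines. This is just an illustrative sketch, not part of the patch: the `restore_all` helper name is mine, and the example paths are taken from the archive_command quoted above, so adjust both to your setup.

```shell
# Hypothetical helper: decompress every gzipped WAL segment from an
# archive directory into pg_wal before starting recovery, so the
# prefetcher never hits a missing-file condition mid-recovery.
restore_all() {
    archive_dir=$1
    pg_wal=$2
    for f in "$archive_dir"/*.gz; do
        [ -e "$f" ] || continue                 # no archived segments at all
        seg=$(basename "$f" .gz)
        [ -e "$pg_wal/$seg" ] && continue       # segment already present
        gunzip -c "$f" > "$pg_wal/$seg"
    done
}

# Example invocation, using the paths from the quoted archive_command:
# restore_all /mnt/raid/wal-archive "$PGDATA/pg_wal"
```

Run it against the stopped cluster's data directory before starting recovery; the restore_command can stay configured as a fallback.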