Hi Noah,
On 6/19/21 16:39, Noah Misch wrote:
> On Tue, Feb 02, 2021 at 07:14:16AM -0800, Noah Misch wrote:
>> Recycling and preallocation are wasteful during archive recovery, because
>> KeepFileRestoredFromArchive() unlinks every entry in its path. I propose to
>> fix the race by adding an XLogCtl flag indicating which regime currently owns
>> the right to add long-term pg_wal directory entries. In the archive recovery
>> regime, the checkpointer will not preallocate and will unlink old segments
>> instead of recycling them (like wal_recycle=off). XLogFileInit() will fail.
>
> Here's the implementation. Patches 1-4 suffice to stop the user-visible
> ERROR. Patch 5 avoids a spurious LOG-level message and wasted filesystem
> writes, and it provides some future-proofing.
>
> I was tempted to (but did not) just remove preallocation. Creating one file
> per checkpoint seems tiny relative to the max_wal_size=1GB default, so I
> expect it's hard to isolate any benefit. Under the old checkpoint_segments=3
> default, a preallocated segment covered a respectable third of the next
> checkpoint. Before commit 63653f7 (2002), preallocation created more files.
This also seems like it would fix the link issues we are seeing in [1].
I wonder if that would make it worth back-patching?
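To check my reading of the proposal, here is a rough standalone sketch of the
"which regime owns pg_wal" idea. It is not taken from the patches: the type
and function names (WalDirRegime, SketchCtl, may_add_long_term_entry,
old_segment_action) are all made up for illustration, and I am assuming the
real flag would be read under whatever lock the patch uses, which this toy
version ignores.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the proposed XLogCtl flag (not the patch's field). */
typedef enum
{
	REGIME_ARCHIVE_RECOVERY,	/* restored segments own pg_wal entries */
	REGIME_NORMAL				/* checkpointer/backends may add entries */
} WalDirRegime;

typedef enum
{
	SEG_RECYCLE,				/* rename the old segment for future use */
	SEG_UNLINK					/* remove the old segment */
} OldSegmentAction;

typedef struct
{
	WalDirRegime regime;
	bool		wal_recycle;	/* mirrors the wal_recycle setting */
} SketchCtl;

/*
 * May the caller create a long-term pg_wal entry (preallocation, or an
 * XLogFileInit()-style segment creation)?  Only outside archive recovery.
 */
static bool
may_add_long_term_entry(const SketchCtl *ctl)
{
	assert(ctl != NULL);
	return ctl->regime == REGIME_NORMAL;
}

/*
 * What should the checkpointer do with a no-longer-needed segment?  During
 * archive recovery it behaves as if wal_recycle were off.
 */
static OldSegmentAction
old_segment_action(const SketchCtl *ctl)
{
	assert(ctl != NULL);
	if (ctl->regime == REGIME_ARCHIVE_RECOVERY || !ctl->wal_recycle)
		return SEG_UNLINK;
	return SEG_RECYCLE;
}
```

If that matches the intent, then during archive recovery nothing except the
restore path ever installs a segment, so there is no second party left to
race with.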
[1] https://www.postgresql.org/message-id/flat/CAKw-smBhLOGtRJTC5c%3DqKTPz8gz5%2BWPoVAXrHB6mY-1U4_N7-w%40mail.gmail.com