On Fri, Aug 18, 2023 at 03:40:57PM +0900, torikoshia wrote: > Thanks for the patch, I've marked this as ready-for-committer. > > BTW, this issue can be considered a bug, right? > I think it would be appropriate to provide backpatch.
Hmm, I agree that there is a good argument in back-patching as we have the WAL files between the redo LSN and the divergence LSN, but pg_rewind is not smart enough to keep them around. If the archives of the primary were not able to catch up, the old primary is as good as kaput, and restore_command won't help here. I don't like much this patch. While it takes correctly advantage of the backward record read logic from SimpleXLogPageRead() able to handle correctly timeline jumps, it creates a hidden dependency in the code between the hash table from filemap.c and the page callback. Wouldn't it be simpler to build a list of the segment names using the information from WALOpenSegment and build this list in findLastCheckpoint()? Also, I am wondering if we should be smarter with any potential conflict handling between the source and the target, rather than just enforcing a FILE_ACTION_NONE for all these files. In short, could it be better to copy the WAL file from the source if it exists there? + /* + * Some entries (WAL segments) already have an action assigned + * (see SimpleXLogPageRead()). + */ + if (entry->action == FILE_ACTION_UNDECIDED) + entry->action = decide_file_action(entry); This change makes me a bit uneasy, per se my previous comment with the additional code dependencies. I think that this scenario deserves a test case. If one wants to emulate a delay in WAL archiving, it is possible to set archive_command to a command that we know will fail, for instance. -- Michael
signature.asc
Description: PGP signature