Thanks for the report, reproducer and the patches. At Fri, 16 Jun 2023 16:27:40 +0200, Julian Markwort <julian.markw...@cybertec.at> wrote in > - prepare a transaction > - crash postgresql > - create standby.signal file > - start postgresql, wait for recovery to finish > - promote .. > The promotion will fail with a FATAL error, stating that "requested WAL > segment .* has already been removed". > The FATAL error causes the startup process to exit, so postmaster shuts down > again. > > Here's an exemplary log output, maybe this helps people to find this issue > when they search for it online:
> LOG: redo done at 0/15D89B8 > LOG: last completed transaction was at log time 2023-06-16 13:09:53.71118+02 > LOG: selected new timeline ID: 2 > LOG: archive recovery complete > FATAL: requested WAL segment pg_wal/000000010000000000000001 has already > been removed > LOG: startup process (PID 1650358) exited with exit code 1 Reproduced here. > The cause of this failure is an oversight (rather obvious in hindsight): > The renaming of the WAL file (that was last written to before the crash > happened) to .partial is done *before* PostgreSQL > might have to read this very file to recover prepared transactions from it. > The relevant function calls here are durable_rename() and > RecoverPreparedTransactions() in xlog.c . > This behaviour has - apparently unintentionally - been fixed in PG 15 and > upwards (see commit 811051c ), as part of a > general restructure and reorganization of this portion of xlog.c (see commit > 6df1543 ). I think so, the reordering might have done for some other reasons, though. > Furthermore, it seems this behaviour does not appear in PG 12 and older, due > to another possible bug: <snip>... > In PG 13 and newer, the XLogReaderState is reset in XLogBeginRead() > before reading WAL in XlogReadTwoPhaseData() in twophase.c . I arraived at the same conclusion. > In the older releases (PG <= 12), this reset is not done, so the requested > LSN containing the prepared transaction can > (by happy coincidence) be read from in-memory buffers, and PostgreSQL > consequently manages to come up just fine (as the > WAL has already been read into buffers prior to the .partial rename). > If the older releases also where to properly reset the XLogReaderState, they > would also fail to find the LSN on disk, and > hence PostgreSQL would crash again. >From the perspective of loading WAL for prepared transactions, the current code in those versions seems fine. Although I suspect Windows may not like to rename currently-open segments, it's likely acceptable as the current test set operates without issue.. (I didn't tested this.) > I've attached patches for PG 14 and PG 13 that mimic the change in PG15 > (commit 811051c ) and reorder the crucial events, > placing the recovery of prepared transactions *before* renaming the file. It appears to move the correct part of the code to the proper location, modifying the steps to align with later versions. > I've also attached recovery test scripts for PG >= 12 and PG <= 11 that can > be used to verify that promote after recovery > with prepared transactions works. It effectively detects the bug, though it can't be directly used in the tree as-is. I'm unsure whether we need this in the tree, though. > My humble opinion is that this fix should be backpatched to PG 14 and PG 13. I agree with you. > It's debatable whether the fix needs to be brought back to 12 and older also, > as those do not exhibit this issue, but the > order of renaming is still wrong. > I'm not sure if there could be cases where the in-memory buffers of the > walreader are too small to cover a whole WAL > file. > There could also be other issues from operations that require reading WAL > that happen after the .partial rename, I > haven't checked in depth what else happens in the affected codepath. > Please let me know if you think this should also be fixed in PG 12 and > earlier, so I can produce the patches for those > versions as well. There's no immediate need to change the versions. However, I would prefer to backpatch them to the older versions for the following reasons. 1. Applying this eases future backpatching in this area, if any. 2. I have reservations about renaming possibly-open WAL segments. regards. -- Kyotaro Horiguchi NTT Open Source Software Center