On Sat, Jan 15, 2022 at 11:52 PM Julien Rouhaud <rjuju...@gmail.com> wrote: > The cfbot reports that this version of the patch doesn't apply anymore:
Here is a new version of the patch which, unlike v1, I think is something we could seriously consider applying (not before v16, of course). It now removes CHECKPOINT_END_OF_RECOVERY completely, and I attach a second patch as well which nukes checkPoint.PrevTimeLineID as well. I mentioned two problems with $SUBJECT in the first email with this subject line. One was a bug, which Noah has since fixed (thanks, Noah). The other problem is that LogStandbySnapshot() and a bunch of its friends expect latestCompletedXid to always be a normal XID, which is a problem because (1) we currently set nextXid to 3 and (2) at startup, we compute latestCompletedXid = nextXid - 1. The current code dodges this kind of accidentally: the checkpoint that happens at startup is a "shutdown checkpoint" and thus skips logging a standby snapshot, since a shutdown checkpoint is a sure indicator that there are no running transactions. With the changes, the checkpoint at startup happens after we've started allowing write transactions, and thus a standby snapshot needs to be logged also. In the cases where the end-of-recovery record was already being used, the problem could have happened already, except for the fact that those cases involve a standby promotion, which doesn't happen during initdb. I explored a few possible ways of solving this problem. The first thing I considered was replacing latestCompletedXid with a name like incompleteXidHorizon. The idea is that, where latestCompletedXid is the highest XID that is known to have committed or aborted, incompleteXidHorizon would be the lowest XID such that all known commits or aborts are for lower XIDs. In effect, incompleteXidHorizon would be latestCompletedXid + 1. Since latestCompletedXid is always normal except when it's 2, incompleteXidHorizon would be normal in all cases. Initially this seemed fairly promising, but it kind of fell down when I realized that we copy latestCompletedXid into ComputeXidHorizonsResult.latest_completed. That seemed to me to make the consequences of the change a bit more far-reaching than I liked. Also, it wasn't entirely clear to me that I wouldn't be introducing any off-by-one errors into various wraparound calculations with this approach. The second thing I considered was skipping LogStandbySnapshot() during initdb. There are two ways of doing this and neither of them seem that great to me. Something that does work is to skip LogStandbySnapshot() when in single user mode, but there is no particular reason why WAL generated in single user mode couldn't be replayed on a standby, so this doesn't seem great. It's too big a hammer for what we really want, which is just to suppress this during initdb. Another way of approaching it is to skip it in bootstrap mode, but that actually doesn't work: initdb then fails during the post-bootstrapping step rather than during bootstrapping. I thought about patching around that by forcing the code that generates checkpoint records to forcibly advance an XID of 3 to 4, but that seemed like working around the problem from the wrong end. So ... I decided that the best approach here seems to be to just skip FirstNormalTransactionId and use FirstNormalTransactionId + 1 for the first write transaction that the cluster ever processes. That's very simple and doesn't seem likely to break anything else. On the downside it seems a bit grotty, but I don't see anything better, and on the whole, I think with this approach we come out substantially ahead. 0001 removes 3 times as many lines as it inserts, and 0002 saves a few more lines of code. Now, I still don't really know that there isn't some theoretical difficulty here that makes this whole approach a non-starter, but I also can't think of what it might be. If the promotion case, which has used the end-of-recovery record for many years, is basically safe, despite the fact that it switches TLIs, then it seems to me that the crash recovery case, which doesn't have that complication, ought to be safe too. But I might well be missing something, so if you see a problem, please speak up! -- Robert Haas EDB: http://www.enterprisedb.com
v4-0001-Remove-the-end-of-recovery-checkpoint-in-all-case.patch
Description: Binary data
v4-0002-Remove-previous-timeline-ID-from-checkpoint.patch
Description: Binary data