On Tue, Mar 2, 2021 at 7:22 AM Dilip Kumar <dilipbal...@gmail.com> wrote: > Why do we need to move promote related code in XLogAcceptWrites? > IMHO, this promote related handling should be in StartupXLOG only. > That will look cleaner.
The key design question here, at least in my mind, is what exactly happens after prohibit-WAL + system-crash + recovery-finishes. We clearly can't write the checkpoint or end-of-recovery record and proceed with business as usual, but are we still in recovery? Either (1) we are technically still in recovery, stopping just short of entering normal running, and will emerge from recovery when WAL is permitted again; or (2) we have technically finished recovery, but deferred some of the actions that would normally occur at that time until a later point. Maybe this is academic distinction as much as anything, but the idea is if we choose #1 then we should do as little as possible at the point when recovery finishes and defer as much as possible until we actually enter normal running; whereas if we choose #2 we should do as much as possible at the point when recovery finishes and defer only those things which absolutely have to be deferred. That said, I and I think also Andres are voting for #2. But if we go that way, that precludes what you are proposing here. If we picked #1 then it would be natural for the startup process to remain active and the control file update to be postponed until WAL writes are re-enabled; but under model #2 we want, if possible, for the startup process to exit and the control file update to happen normally, and only the writing of the actual WAL records to be deferred. What I find much odder, looking at the present patch, is that PerformPendingStartupOperations() gets called from pg_prohibit_wal() rather than by the checkpointer. If the checkpointer is the process that is in charge of coordinating the change between a read-only state and a read-write state, then it ought to also do this. I also think that the PerformPendingStartupOperations() wrapper is unnecessary. Just invert the sense of the XLogCtl flag: xlogAllowWritesDone, rather than startupCrashRecoveryPending, and have XLogAcceptWrites() set it (and return without doing anything if it's already set). Then the checkpointer can just call the function unconditionally whenever we go read-write, and for a bonus we will have much better naming consistency, rather than calling the same thing "xlog accept writes" in one place, "pending startup operations" in another, and "startup crash recovery pending" in a third. Since this feature is basically no longer "alter system read only" but rather "pg_prohibit_wal" I think we also ought to rename the GUC, system_is_read_only -> wal_prohibited. -- Robert Haas EDB: http://www.enterprisedb.com