On Thu, Jun 18, 2020 at 11:56 AM Jehan-Guillaume de Rorthais
<j...@dalibo.com> wrote:
> Considering the current demote patch improvement. I was considering to dig in
> the following direction:
>
> * add a new state in the state machine where all backends are idle
> * this new state forbids any new writes, in the same fashion we do on standby
>   nodes
> * this state could either wait for end of xact, or cancel/kill
>   RW backends, in the same fashion current smart/fast stop do
> * from this state, we might then rollback pending prepared xact, stop other
>   sub-processes etc (as the current patch does), and demote safely to
>   PM_RECOVERY or PM_HOT_STANDBY (depending on the setup).
>
> Is it something worth considering?
> Maybe the code will be so close to ASRO, it would just be kind of a fusion
> of both patches? I don't know, I didn't look at the ASRO patch yet.

I don't think that the postmaster state machine is the interesting part
of this problem. The tricky parts have to do with updating shared memory
state, and with updating per-backend private state. For example,
snapshots are taken in a different way during recovery than they are in
normal operation, hence SnapshotData's takenDuringRecovery member. And I
think that we allocate extra shared memory space for storing the data
that those snapshots use if, and only if, the server starts up in
recovery. So if the server goes backward from normal running into
recovery, we might not have the space that we need in shared memory to
store the extra data, and even if we had the space it might not be
populated correctly, and the code that takes snapshots might not be
written properly to handle multiple transitions between recovery and
normal running, or even a single backward transition.

In general, there's code scattered all throughout the system that
assumes the recovery -> normal running transition is one-way. If we go
back into recovery by killing off all backends and reinitializing shared
memory, then we don't have to worry about that stuff. If we do anything
less than that, we have to find all the code that relies on never
reentering recovery and fix it all.

Now it's also true that we have to do some other things, like restarting
the startup process, and stopping things like autovacuum, and the
postmaster may need to be involved in some of that. There's clearly some
engineering work there, but I think it's substantially less than the
amount of engineering work involved in fixing problems with shared
memory contents and backend-local state.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
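
As a rough illustration of the shared-memory concern described above, here is a toy Python model. It is not PostgreSQL code: every name in it (SharedMemory, start_server, take_snapshot, demote_in_place, demote_with_reinit) is hypothetical, and it simply assumes, as the message suggests may be the case, that the extra snapshot area is reserved only when the server starts up in recovery.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class SharedMemory:
    """Toy stand-in for shared memory that is sized once, at server start."""
    # Extra area used only by snapshots taken during recovery.
    # Assumption: it is reserved only if the server starts in recovery.
    recovery_snapshot_area: Optional[List[int]] = None


def start_server(in_recovery: bool) -> SharedMemory:
    """Initialize shared memory; the recovery-only area depends on start mode."""
    shmem = SharedMemory()
    if in_recovery:
        shmem.recovery_snapshot_area = []
    return shmem


def take_snapshot(shmem: SharedMemory, in_recovery: bool) -> Dict[str, object]:
    """Take a snapshot using a different code path in recovery."""
    if in_recovery:
        if shmem.recovery_snapshot_area is None:
            # The one-way assumption is violated: we are in recovery,
            # but the space that recovery snapshots need was never reserved.
            raise RuntimeError("no shared memory reserved for recovery snapshots")
        return {"taken_during_recovery": True,
                "xids": list(shmem.recovery_snapshot_area)}
    return {"taken_during_recovery": False, "xids": []}


def demote_in_place(shmem: SharedMemory) -> SharedMemory:
    """Go back into recovery while keeping the existing shared memory."""
    return shmem


def demote_with_reinit() -> SharedMemory:
    """Go back into recovery by discarding and reinitializing shared memory."""
    return start_server(in_recovery=True)
```

In this model, a server started with start_server(in_recovery=False) and then passed through demote_in_place() raises an error on its first recovery-style snapshot, while demote_with_reinit() does not. That mirrors the trade-off in the message: reinitializing shared memory sidesteps the problem, and anything less means finding and fixing each such assumption.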