Hi,

On 2020-12-09 16:13:06 -0500, Robert Haas wrote:
> That's not good. On a typical busy system, a system is going to be in
> the middle of a checkpoint most of the time, and the checkpoint will
> take a long time to finish - maybe minutes.

Or hours, even. Due to the cost of FPWs it can make a lot of sense to
reduce the frequency of that cost...

> We want this feature to respond within milliseconds or a few seconds,
> not minutes. So we need something better here.

Indeed.

> I'm inclined to think
> that we should try to CompleteWALProhibitChange() at the same places
> we AbsorbSyncRequests(). We know from experience that bad things
> happen if we fail to absorb sync requests in a timely fashion, so we
> probably have enough calls to AbsorbSyncRequests() to make sure that
> we always do that work in a timely fashion. So, if we do this work in
> the same place, then it will also be done in a timely fashion.

Sounds sane, without having looked in detail.

> I'm not 100% sure whether that introduces any other problems.
> Certainly, we're not going to be able to finish the checkpoint once
> we've gone read-only, so we'll fail when we try to write the WAL
> record for that, or maybe earlier if there's anything else that tries
> to write WAL. Either the checkpoint needs to error out, like any other
> attempt to write WAL, and we can attempt a new checkpoint if and when
> we go read/write, or else we need to finish writing stuff out to disk
> but not actually write the checkpoint completion record (or any other
> WAL) unless and until the system goes back into read/write mode - and
> then at that point the previously-started checkpoint will finish
> normally. The latter seems better if we can make it work, but the
> former is probably also acceptable. What you've got right now is not.

I mostly wonder which of those two has which implications for how many
FPWs we need to redo. Presumably stalling but not cancelling the current
checkpoint is better?

Greetings,

Andres Freund
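
To make the proposal above concrete, here is a rough, self-contained
sketch of the "complete the change wherever we absorb sync requests,
and stall rather than cancel the checkpoint" variant. It is a toy model
under stated assumptions, not code from the patch: every identifier
prefixed with sketch_ or Sketch is hypothetical, and it only stands in
for CompleteWALProhibitChange() and AbsorbSyncRequests() instead of
calling them. It does not model FPWs, so it says nothing about how many
would need to be redone.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy state; not PostgreSQL's real checkpointer state. */
typedef struct SketchCheckpointer
{
    bool prohibit_requested;        /* a switch to read-only was requested */
    bool wal_prohibited;            /* the switch has been completed */
    int  buffers_written;           /* progress of the current checkpoint */
    int  completed_at;              /* buffers_written when the switch completed, -1 if never */
    bool completion_record_pending; /* flushing done, completion record deferred */
    bool checkpoint_done;           /* completion record written */
} SketchCheckpointer;

/* Stand-in for the proposed CompleteWALProhibitChange(). */
static void
sketch_complete_wal_prohibit_change(SketchCheckpointer *c)
{
    if (c->prohibit_requested && !c->wal_prohibited)
    {
        c->wal_prohibited = true;
        c->completed_at = c->buffers_written;
    }
}

/*
 * Stand-in for a spot where the checkpointer would absorb sync requests.
 * The sketch only models the piggy-backed state change.
 */
static void
sketch_absorb_point(SketchCheckpointer *c)
{
    sketch_complete_wal_prohibit_change(c);
}

/*
 * Write nbuffers buffers; the read-only request arrives just before the
 * buffer with index request_after is written.
 */
static void
sketch_run_checkpoint(SketchCheckpointer *c, int nbuffers, int request_after)
{
    assert(nbuffers >= 0);

    for (int i = 0; i < nbuffers; i++)
    {
        if (i == request_after)
            c->prohibit_requested = true;

        c->buffers_written++;       /* flushing data needs no new WAL */
        sketch_absorb_point(c);     /* the switch completes here, mid-checkpoint */
    }

    /* The completion record is WAL, so defer it while WAL is prohibited. */
    if (c->wal_prohibited)
        c->completion_record_pending = true;
    else
        c->checkpoint_done = true;
}

/* Back to read/write: a stalled checkpoint can now finish normally. */
static void
sketch_resume_read_write(SketchCheckpointer *c)
{
    c->prohibit_requested = false;
    c->wal_prohibited = false;

    if (c->completion_record_pending)
    {
        c->completion_record_pending = false;
        c->checkpoint_done = true;
    }
}
```

In this model a request that arrives during a 1000-buffer checkpoint is
honoured after the next buffer write instead of after the whole
checkpoint, and the checkpoint itself is stalled at the completion
record rather than thrown away.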