Hi, On 2020-05-16 22:51:50 -0400, Alvaro Herrera wrote: > On 2020-May-16, Andres Freund wrote: > > > I, independent of this patch, added a few additional paths in which > > checkpointer's latch is reset, and I found a few shutdowns in regression > > tests to be extremely slow / timing out. The reason for that is that > > the only check for interrupts is at the top of the loop. So if > > checkpointer gets SIGUSR2 we don't see ShutdownRequestPending until we > > decide to do a checkpoint for other reasons. > > Ah, yeah, this seems a genuine bug. > > > I also suspect that it could have harmful consequences to not do a > > AbsorbSyncRequests() if something "ate" the set latch. > > I traced through this when looking over the previous fix, and given that > checkpoint execution itself calls AbsorbSyncRequests frequently, I > don't think this one qualifies as a bug.
There's no AbsorbSyncRequests() after CheckPointBuffers(), I think. And e.g. CheckPointTwoPhase() could take a while. Which then would mean that we'd potentially not AbsorbSyncRequests() until checkpoint_timeout causes us to wake up. Am I missing something? > > One way to do that would be to WaitLatch() call to much earlier, and > > only do a WaitLatch() if do_checkpoint is false. Roughly like in the > > attached. > > Hm. I'd do "WaitLatch() / continue" in the "!do_checkpoint" block, and > put the checpkoint code not in the else block; seems easier to read to > me. Yea, that'd probably be better. I was also pondering if we shouldn't just move the checkpoint code into, gasp, it's own function ;) Greetings, Andres Freund