Greetings,

* Kyotaro Horiguchi (horikyota....@gmail.com) wrote:
> At Mon, 29 Mar 2021 14:47:33 +0900, Michael Paquier <mich...@paquier.xyz>
> wrote in
> > On Fri, Mar 26, 2021 at 10:16:40AM -0700, Andres Freund wrote:
> > > On 2021-03-26 18:20:14 +0900, Kyotaro Horiguchi wrote:
> > > > This is because XLogSendPhysical detects removal of the wal segment
> > > > currently reading by shutdown checkpoint. However, there's no fear of
> > > > overwriting of WAL segments at the time.
> > > >
> > > > So I think we can omit the call to CheckXLogRemoved() while
> > > > MyWalSnd->state is WALSNDSTATE_STOPPING because the state comes after
> > > > the shutdown checkpoint completes.
> > > >
> > > > Of course that doesn't help if walsender was running two segments
> > > > behind. There still could be a small window for the failure. But it's
> > > > a great help to save the case of just 1 segment behind.
> > >
> > > -1. This seems like a bandaid to make a broken configuration work a tiny
> > > bit better, without actually being meaningfully better.
> >
> > Agreed. Still, wouldn't it be better to avoid such configurations and
> > protect a bit things with a check on the new value?

I have a hard time agreeing that this is somehow a 'broken' configuration; instead, it looks like a race condition that wasn't considered and should be addressed. If there's zero lag then we really should allow the final WAL to get sent to the replica.

> The repro was a bit artificial but the symptom happened without
> pg_switch_wal() and no load. It caused just by shutting down of
> primary. If it is normal behavior for walsenders to fail to send the
> last shutdown record to standby while fast shutdown, we should refuse
> to startup at least wal sender if wal_keep_size = 0.
>
> I can guess two ways to do that.

Both of those would break things for people, so this certainly isn't a great approach. Besides, if archiving is happening with archive_command and the replica has a restore_command, then it should be able to follow that just fine, no? So we'd also have to check whether archive_command has been set up and hope the admin has configured a restore_command. Having to go through that dance, instead of just making sure to push out the last WAL to the replica, seems a bit silly though.

Thanks,

Stephen