Greetings,

* Kyotaro Horiguchi (horikyota....@gmail.com) wrote:
> At Mon, 29 Mar 2021 14:47:33 +0900, Michael Paquier <mich...@paquier.xyz> 
> wrote in 
> > On Fri, Mar 26, 2021 at 10:16:40AM -0700, Andres Freund wrote:
> > > On 2021-03-26 18:20:14 +0900, Kyotaro Horiguchi wrote:
> > > > This is because XLogSendPhysical detects that the shutdown checkpoint
> > > > removed the WAL segment it is currently reading.  However, there's no
> > > > risk of WAL segments being overwritten at that point.
> > > >
> > > > So I think we can omit the call to CheckXLogRemoved() while
> > > > MyWalSnd->state is WALSNDSTATE_STOPPING, because that state is reached
> > > > only after the shutdown checkpoint completes.
> > > >
> > > > Of course, that doesn't help if the walsender is running two segments
> > > > behind; a small window for the failure could still remain.  But it
> > > > does fix the common case of being just one segment behind.
> > > 
> > > -1. This seems like a bandaid to make a broken configuration work a tiny
> > > bit better, without actually being meaningfully better.
> > 
> > Agreed.  Still, wouldn't it be better to avoid such configurations and
> > add some protection with a check on the new value?

I have a hard time agreeing that this is somehow a 'broken'
configuration; instead, it looks like a race condition that wasn't
considered and should be addressed.  If there's zero lag, then we really
should allow the final WAL to be sent to the replica.

> The repro was a bit artificial, but the symptom occurred without
> pg_switch_wal() and with no load; it was caused just by shutting down
> the primary.  If it is normal behavior for walsenders to fail to send
> the last shutdown record to the standby during a fast shutdown, we
> should at least refuse to start a walsender when wal_keep_size = 0.
> 
> I can think of two ways to do that.

Both of those will break things for people, so this certainly isn't a
great approach.  Besides, if archiving is happening with archive_command
and the replica has a restore_command, then it should be able to follow
along just fine, no?  So we'd also have to check whether archive_command
has been set up and hope the admin has configured a restore_command.
Having to go through that dance instead of just making sure to push out
the last WAL to the replica seems a bit silly, though.

Thanks,

Stephen
