Re: persist logical slots to disk during shutdown checkpoint

Ashutosh Bapat Wed, 30 Aug 2023 23:55:52 -0700

On Thu, Aug 31, 2023 at 12:10 PM Amit Kapila <amit.kapil...@gmail.com> wrote:
>
> > +
> > +    /*
> > +     * We won't ensure that the slot is persisted after the confirmed_flush
> > +     * LSN is updated as that could lead to frequent writes.  However, we 
> > need
> > +     * to ensure that we do persist the slots at the time of shutdown whose
> > +     * confirmed_flush LSN is changed since we last saved the slot to disk.
> > +     * This will help in avoiding retreat of the confirmed_flush LSN after
> > +     * restart.  This variable is used to track the last saved 
> > confirmed_flush
> > +     * LSN value.
> > +     */
> >
> > This comment makes more sense in SaveSlotToPath() than here. We may decide 
> > to
> > use last_saved_confirmed_flush for something else in future.
> >
>
> I have kept it here because it contains some information that is not
> specific to SaveSlotToPath. So, it seems easier to follow the whole
> theory if we keep it at the central place in the structure and then
> add the reference wherever required but I am fine if you and others
> feel strongly about moving this to SaveSlotToPath().


Saving slot to disk happens only in SaveSlotToPath, so except the last
sentence rest of the comment makes sense in SaveSlotToPath().

> >
> > This function assumes that the subscriber will receive and confirm WAL upto
> > checkpoint location and publisher's WAL sender will update it in the slot.
> > Where is the code to ensure that? Does the WAL sender process wait for
> > checkpoint
> > LSN to be confirmed when shutting down?
> >
>
> Note, that we need to compare if all the WAL before the
> shutdown_checkpoint WAL record is sent. Before the clean shutdown, we
> do ensure that all the pending WAL is confirmed back. See the use of
> WalSndDone() in WalSndLoop().

Ok. Thanks for pointing that out to me.

>
> > I
> > think we should shut down subscriber, restart publisher and then make this
> > check based on the contents of the replication slot instead of server log.
> > Shutting down subscriber will ensure that the subscriber won't send any new
> > confirmed flush location to the publisher after restart.
> >
>
> But if we shutdown the subscriber before the publisher there is no
> guarantee that the publisher has sent all outstanding logs up to the
> shutdown checkpoint record (i.e., the latest record). Such a guarantee
> can only be there if we do a clean shutdown of the publisher before
> the subscriber.

So the sequence is shutdown publisher node, shutdown subscriber node,
start publisher node and carry out the checks.

>
> > All the places which call ReplicationSlotSave() mark the slot as dirty.  All
> > the places where SaveSlotToPath() is called, the slot is marked dirty except
> > when calling from CheckPointReplicationSlots(). So I am wondering whether we
> > should be marking the slot dirty in CheckPointReplicationSlots() and avoid
> > passing down is_shutdown flag to SaveSlotToPath().
> >
>
> I feel that will add another spinlock acquire/release pair without
> much benefit. Sure, it may not be performance-sensitive but still
> adding another pair of lock/release doesn't seem like a better idea.

We call ReplicatioinSlotMarkDirty() followed by ReplicationSlotSave()
at all the places, even those which are more frequent than this. So I
think it's better to stick to that protocol rather than adding a new
flag.

-- 
Best Wishes,
Ashutosh Bapat

Re: persist logical slots to disk during shutdown checkpoint

Reply via email to