Re: persist logical slots to disk during shutdown checkpoint

Amit Kapila Mon, 11 Sep 2023 02:20:31 -0700

On Mon, Sep 11, 2023 at 12:08 PM Michael Paquier <[email protected]> wrote:
>
> On Fri, Sep 08, 2023 at 11:50:37AM +0530, Amit Kapila wrote:
> > On Fri, Sep 8, 2023 at 10:08 AM Michael Paquier <[email protected]> wrote:
> >>
> >>
> >> +    /*
> >> +     * This is used to track the last persisted confirmed_flush LSN value 
> >> to
> >> +     * detect any divergence in the in-memory and on-disk values for the 
> >> same.
> >> +     */
> >>
> >> "This value tracks is the last LSN where this slot's data has been
> >> flushed to disk.
> >>
> >
> > This makes the comment vague as this sounds like we are saving a slot
> > corresponding to some LSN which is not the case. If you prefer this
> > tone then we can instead say: "This value tracks the last
> > confirmed_flush LSN flushed which is used during a checkpoint shutdown
> > to decide if a logical slot's data should be forcibly flushed or not."
>
> Okay, that looks like an improvement over the term "persisted".
>


Changed accordingly.

> >> This is used during a checkpoint shutdown to decide
> >> if a logical slot's data should be forcibly flushed or not."
> >>
> >> Hmm.  WAL senders are shut down *after* the checkpointer and *after*
> >> the shutdown checkpoint is finished (see PM_SHUTDOWN and
> >> PM_SHUTDOWN_2) because we want the WAL senders to acknowledge the
> >> checkpoint record before shutting down the primary.
> >>
> >
> > As per my understanding, this is not true for logical walsenders. As
> > per code, while HandleCheckpointerInterrupts(), we call ShutdownXLOG()
> > which sends a signal to walsender to stop and waits for it to stop.
> > And only after that, did it write a shutdown checkpoint WAL record.
> > After getting the InitStopping signal, walsender sets got_STOPPING
> > flag. Then *logical* walsender ensures that it sends all the pending
> > WAL and exits. What you have quoted is probably true for physical
> > walsenders.
>
> Hm, reminding me about this area..  This roots down to the handling of
> WalSndCaughtUp in the send_data callback for logical or physical.
> This is switched to true for logical WAL senders much earlier than
> physical WAL senders, aka before the shutdown checkpoint begins in the
> latter.  What was itching me a bit is that the postmaster logic could
> be made more solid.  Logical and physical WAL senders are both marked
> as BACKEND_TYPE_WALSND, but we don't actually check that the WAL
> senders remaining at the end of PM_SHUTDOWN_2 are *not* connected to a
> database.  This would require a new BACKEND_TYPE_* perhaps, or perhaps
> we're fine with the current state because we'll catch up problems in
> the checkpointer if any WAL is generated while the shutdown checkpoint
> is running anyway.  Just something I got in mind, unrelated to this
> patch.
>

I don't know if the difference is worth inventing a new BACKEND_TYPE_*
but if you think so then we can probably discuss this in a new thread.
I think we may want to improve some comments as a separate patch to
make this evident.

>
> + * We can flush dirty replication slots at regular intervals by any
> + * background process like bgwriter but checkpoint is a convenient location.
>
> I still don't quite see a need to mention the bgwriter at all here..
> That's just unrelated.
>

I don't disagree with it, so changed it in the attached patch.

> The comment block in CheckPointReplicationSlots() from v10 uses
> "persist", but you mean "flush", I guess..
>

This point is not very clear to me. Can you please quote the exact
comment if you think something needs to be changed?

-- 
With Regards,
Amit Kapila.

v11-0001-Persist-logical-slots-to-disk-during-a-shutdown-.patch
Description: Binary data

Re: persist logical slots to disk during shutdown checkpoint

Reply via email to