On Fri, Sep 8, 2023 at 10:08 AM Michael Paquier <mich...@paquier.xyz> wrote: > > > + /* > + * This is used to track the last persisted confirmed_flush LSN value to > + * detect any divergence in the in-memory and on-disk values for the > same. > + */ > > "This value tracks is the last LSN where this slot's data has been > flushed to disk. >
This makes the comment vague as this sounds like we are saving a slot corresponding to some LSN which is not the case. If you prefer this tone then we can instead say: "This value tracks the last confirmed_flush LSN flushed which is used during a checkpoint shutdown to decide if a logical slot's data should be forcibly flushed or not." > > This is used during a checkpoint shutdown to decide > if a logical slot's data should be forcibly flushed or not." > > Hmm. WAL senders are shut down *after* the checkpointer and *after* > the shutdown checkpoint is finished (see PM_SHUTDOWN and > PM_SHUTDOWN_2) because we want the WAL senders to acknowledge the > checkpoint record before shutting down the primary. > As per my understanding, this is not true for logical walsenders. As per code, while HandleCheckpointerInterrupts(), we call ShutdownXLOG() which sends a signal to walsender to stop and waits for it to stop. And only after that, did it write a shutdown checkpoint WAL record. After getting the InitStopping signal, walsender sets got_STOPPING flag. Then *logical* walsender ensures that it sends all the pending WAL and exits. What you have quoted is probably true for physical walsenders. > > In order to limit > the number of records to work on after a restart, what this patch is > proposing is an improvement. Perhaps it would be better to document > that we don't care about the potential concurrent activity of logical > WAL senders in this case and that the LSN we are saving at is a best > effort, meaning that last_saved_confirmed_flush is just here to reduce > the damage on a follow-up restart? > Unless I am wrong, there shouldn't be any concurrent activity for logical walsenders. IIRC, it is a mandatory requirement for logical walsenders to stop before shutdown checkpointer to avoid panic error. We do handle logical walsnders differently because they can generate WAL during decoding. > > The comment added in > CheckPointReplicationSlots() goes in this direction, but perhaps this > potential concurrent activity should be mentioned? > Sure, we can change it if required. -- With Regards, Amit Kapila.