On Tue, Sep 12, 2023 at 10:55 AM Michael Paquier <mich...@paquier.xyz> wrote: > > On Mon, Sep 11, 2023 at 02:49:49PM +0530, Amit Kapila wrote: > > Please see the v11 attached, that rewords all the places of the patch > that need clarifications IMO. I've found that the comment additions > in CheckPointReplicationSlots() to be overcomplicated: > - The key point to force a flush of a slot if its confirmed_lsn has > moved ahead of the last LSN where it was saved is to make the follow > up restart more responsive. >
I don't think it will become more responsive in any way, not sure what made you think like that. The key idea is that after restart we want to ensure that all the WAL data up to the shutdown checkpoint record is sent to downstream. As mentioned in the commit message, this will help in ensuring that upgrades don't miss any data and then there is another small advantage as mentioned in the commit message. > - Not sure that there is any point to mention the other code paths in > the tree where ReplicationSlotSave() can be called, and a slot can be > saved in other processes than just WAL senders (like slot > manipulations in normal backends, for one). This was the last > sentence in v10. > The point was the earlier sentence is no longer true and keeping it as it is could be wrong or at least misleading. For example, earlier it is okay to say, "This needn't actually be part of a checkpoint, ..." but now that is no longer true as we want to invoke this at the time of shutdown checkpoint for correctness. If we want to be precise, we can say, "It is convenient to flush dirty replication slots at the time of checkpoint. Additionally, .." > > + if (s->data.invalidated == RS_INVAL_NONE && > + s->data.confirmed_flush != s->last_saved_confirmed_flush) > > Actually this is incorrect, no? Shouldn't we make sure that the > confirmed_flush is strictly higher than the last saved LSN? > I can't see why it is incorrect. Do you see how (in what scenario) it could go wrong? As per my understanding, confirmed_flush LSN will always be greater than equal to last_saved_confirmed_flush but we don't want to ensure that point here because we just want if the latest value is not the same then we should mark the slot dirty and flush it as that will be location we have ensured to update before walsender shutdown. I think it is better to add an assert if you are worried about any such case and we had thought of adding it as well but then didn't do it because we don't have matching asserts to ensure that we never assign prior LSN value to consfirmed_flush LSN. + /* + * LSN used to track the last confirmed_flush LSN where the slot's data + * has been flushed to disk. + */ + XLogRecPtr last_saved_confirmed_flush; I don't want to argue on such a point because it is a little bit of a matter of personal choice but I find this comment unclear. It seems to read that confirmed_flush LSN is some LSN position which is where we flushed the slot's data and that is not true. I found the last comment in the patch sent by me: "This value tracks the last confirmed_flush LSN flushed which is used during a shutdown checkpoint to decide if logical's slot data should be forcibly flushed or not." which I feel we agreed upon is better. -- With Regards, Amit Kapila.