Re: Physical replication slot advance is not persistent

Craig Ringer Tue, 28 Jan 2020 01:46:25 -0800

On Tue, 28 Jan 2020 at 16:01, Michael Paquier <mich...@paquier.xyz> wrote:
>
> On Tue, Jan 21, 2020 at 02:07:30PM +0900, Michael Paquier wrote:
> > On Mon, Jan 20, 2020 at 11:00:14AM -0800, Andres Freund wrote:
> >> Hm. I'm think testing this with real LSNs is a better idea. What if the
> >> node actually already is past FF/FFFFFFFF at this point? Quite unlikely,
> >> I know, but still. I.e. why not get the current LSN after the INSERT,
> >> and assert that the slot, after the restart, is that?
> >
> > Sure.  If not disabling autovacuum in the test, we'd just need to make
> > sure if that advancing is at least ahead of the INSERT position.
>
> Actually, as the advancing happens only up to this position we just
> need to make sure that the LSN reported by the slot is the same as the
> position advanced to.  I have switched the test to just do that
> instead of using a fake LSN.
>
> > Anyway, I am still not sure if we should got down the road to just
> > mark the slot as dirty if any advancing is done and let the follow-up
> > checkpoint to the work, if the advancing function should do the slot
> > flush, or if we choose one and make the other an optional choice (not
> > for back-branches, obviously.  Based on my reading of the code, my
> > guess is that a flush should happen at the end of the advancing
> > function.
>
> I have been chewing on this point for a couple of days, and as we may
> actually crash between the moment the slot is marked as dirty and the
> moment the slot information is made consistent, we still have a risk
> to have the slot go backwards even if the slot information is saved.
> The window is much narrower, but well, the docs of logical decoding
> mention that this risk exists.  And the patch becomes much more
> simple without changing the actual behavior present since the feature
> has been introduced for logical slots.  There could be a point in
> having a new option to flush the slot information, or actually a
> separate function to flush the slot information, but let's keep that
> for a future possibility.
>
> So attached is an updated patch which addresses the problem just by
> marking a physical slot as dirty if any advancing is done.  Some
> documentation is added about the fact that an advance is persistent
> only at the follow-up checkpoint.  And the tests are fixed to not use
> a fake LSN but instead advance to the latest LSN position produced.
>
> Any objections?


LGTM. Thankyou.

-- 
 Craig Ringer                   http://www.2ndQuadrant.com/
 2ndQuadrant - PostgreSQL Solutions for the Enterprise

Re: Physical replication slot advance is not persistent

Reply via email to