Hi,

On 2019-02-14 09:52:33 +1300, Thomas Munro wrote:
> On Thu, Feb 14, 2019 at 8:11 AM Tom Lane <t...@sss.pgh.pa.us> wrote:
> > Andres Freund <and...@anarazel.de> writes:
> > > I was kinda pondering just open coding it. I am not yet convinced that
> > > my idea of just using an open FD isn't the least bad approach for the
> > > issue at hand. What precisely is the NFS issue you're concerned about?
> >
> > I'm not sure that fsync-on-FD after the rename will work, considering that
> > the issue here is that somebody might've unlinked the file altogether
> > before we get to doing the fsync. I don't have a hard time believing that
> > that might result in a failure report on NFS or similar. Yeah, it's
> > hypothetical, but the argument that we need a repeat fsync at all seems
> > equally hypothetical.
> >
> > > Right now fsync_fname_ext isn't exposed outside fd.c...
> >
> > Mmm. That makes it easier to consider changing its API.
>
> Just to make sure I understand: it's OK for the file not to be there
> when we try to fsync it by name, because a concurrent checkpoint can
> remove it, having determined that we don't need it anymore? In other
> words, we really needed either missing_ok=true semantics, or to use
> the fd we already had instead of the name?

I'm not yet sure that that's actually something that's supposed to
happen; I've got to spend some time analysing how this actually happens.
Normally the contents of the slot should actually prevent it from being
removed (as they're newer than
ReplicationSlotsComputeLogicalRestartLSN()). I kind of wonder if that's
a bug in the drop logic in newer releases.

Greetings,

Andres Freund
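
For reference, the two options raised in the quoted question (missing_ok=true semantics versus fsyncing the fd already held) could be sketched roughly as below. This is only an illustration under assumptions: the names sketch_fsync_by_name and sketch_fsync_by_fd are made up here, and neither is the real fsync_fname_ext() in fd.c, whose signature is not shown in this thread.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <unistd.h>

/*
 * Hypothetical helper, NOT the real fsync_fname_ext() from fd.c.
 * Flush a file identified by name. Returns 0 on success, -1 on error.
 * A file that has already vanished counts as success when missing_ok
 * is set, which covers the case of a concurrent removal.
 */
static int
sketch_fsync_by_name(const char *path, bool missing_ok)
{
    int fd;
    int rc;
    int saved_errno;

    assert(path != NULL);

    fd = open(path, O_RDWR);
    if (fd < 0)
        return (missing_ok && errno == ENOENT) ? 0 : -1;

    rc = fsync(fd);
    saved_errno = errno;    /* close() may overwrite errno */
    close(fd);
    errno = saved_errno;
    return rc;
}

/*
 * Hypothetical alternative: flush through a descriptor the caller
 * already holds. On a local POSIX filesystem the descriptor stays
 * usable after the name is unlinked; whether fsync() then reports a
 * failure on NFS or similar is the open question in this thread.
 */
static int
sketch_fsync_by_fd(int fd)
{
    return fsync(fd);
}
```

With the by-name variant, the caller has to decide up front whether a missing file is acceptable; with the by-fd variant, the name no longer matters, but the behaviour on network filesystems is exactly what is in doubt above.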