Hi, On 2019-05-23 14:06:57 -0400, Tom Lane wrote: > Andres Freund <and...@anarazel.de> writes: > > On 2019-05-23 10:46:02 -0700, Mark Dilger wrote: > >> Is this code safe against fsync failures? If so, can I get an explanation > >> that I might put into a code comment patch? > > > What's the danger you're thinking of here? The issue with ignoring fsync > > failures is that it could be the one signal about data corruption we get > > for a write()/fsync() that failed - i.e. that durability cannot be > > guaranteed. But we don't care about the file contents of those files. > > Hmm ... if we don't care, why are we issuing an fsync at all?
Fair point. I think we do care in most of those cases, but we don't need to trigger a PANIC. We'd be in trouble if e.g. an older tablespace map file were to "revive" later. Looking at the cases: - durable_unlink(TABLESPACE_MAP, DEBUG1) - we definitely care about a failure to unlink/remove, but *not* about ENOENT, because that's expected. - /* Force installation: get rid of any pre-existing segment file */ durable_unlink(path, DEBUG1); same. - RemoveXlogFile(): rc = durable_unlink(path, LOG); It's probably *tolerable* to fail here. Not sure why this is a durable_unlink(LOG) - doesn't make a ton of sense to me. - durable_unlink(BACKUP_LABEL_FILE, ERROR); This is a "whaa, bad shit is happening" kind of situation. But crashing probably would make it even worse, because we'd restart assuming we're restoring from a backup. ISTM that durable_unlink() for the first two cases really needs a separate 'missing_ok' type argument. And that the reason for using DEBUG1 here is solely an outcome of that not existing. Greetings, Andres Freund