Andres Freund <and...@anarazel.de> writes:
> On 2019-02-13 11:57:32 -0500, Tom Lane wrote:
>> I've managed to reproduce this locally, and obtained this PANIC:

> Cool. How exactly?

Andrew told me that nightjar is actually running in a qemu VM,
so I set up freebsd 9.0 in a qemu VM, and boom.  It took a bit of
fiddling with qemu parameters, but for such a timing-sensitive
problem, that's not surprising.

>> Anyway, I think we might be able to fix this along the lines of
>> [ fsync the data before renaming not after ]

> Hm, but that's not the same? On some filesystems one needs the directory
> fsync, on some the file fsync, and I think both in some cases.

Now that I look at it, there's a pg_fsync() just above this, so
I wonder why we need a second fsync on the file at all.  fsync'ing the
directory is needed to ensure the directory entry is on disk; but the
file data should be out already, or else the kernel is simply failing
to honor fsync.

>> The existing code here seems simply wacky/unsafe to me regardless
>> of this race condition: couldn't it potentially result in a corrupt
>> snapshot file appearing with a valid name, if the system crashes
>> after persisting the rename but before it's pushed the data out?

> What do you mean precisely with "before it's pushed the data out"?

Given the previous pg_fsync, this isn't an issue.

>> I also wonder why bother with the directory sync just before the
>> rename.

> Because on some FS/OS combinations the size of the renamed-into-place
> file isn't guaranteed to be durable unless the directory was
> fsynced.

Bleah.  But in any case, the rename should not create a situation
in which we need to fsync the file data again.

			regards, tom lane
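
For reference, the ordering argued for above (fsync the file data before
the rename, then fsync only the directory afterwards) can be sketched
roughly as follows.  This is a minimal illustration using plain POSIX
calls, not the actual PostgreSQL snapshot code: the names
durable_replace() and fsync_dir() are hypothetical, error handling is
reduced to returning -1, and the extra directory fsync before the rename
that Andres describes is left out.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: flush a directory so its entries reach disk. */
static int
fsync_dir(const char *dir)
{
    int fd = open(dir, O_RDONLY);
    int rc;

    if (fd < 0)
        return -1;
    rc = fsync(fd);     /* assumption: the platform allows fsync on a directory fd */
    close(fd);
    return rc;
}

/*
 * Hypothetical sketch: write data to tmp_path, make it durable, then
 * rename it to final_path and make the new directory entry durable.
 * Both paths are assumed to live in directory "dir".
 */
int
durable_replace(const char *dir, const char *tmp_path,
                const char *final_path, const void *data, size_t len)
{
    const char *p = data;
    int fd = open(tmp_path, O_WRONLY | O_CREAT | O_TRUNC, 0600);

    if (fd < 0)
        return -1;

    while (len > 0)
    {
        ssize_t n = write(fd, p, len);

        if (n <= 0)     /* simplification: no EINTR retry */
        {
            close(fd);
            return -1;
        }
        p += n;
        len -= (size_t) n;
    }

    /* File contents must be on disk before the valid name can appear. */
    if (fsync(fd) != 0)
    {
        close(fd);
        return -1;
    }
    if (close(fd) != 0)
        return -1;

    if (rename(tmp_path, final_path) != 0)
        return -1;

    /* Only the directory entry is new now; no second fsync of the file. */
    return fsync_dir(dir);
}
```

With this ordering, a crash after the rename has been persisted cannot
expose a file with the final name whose contents were never flushed,
provided the kernel honors fsync.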