Hi,

On 2022-01-28 00:39:22 +0200, Heikki Linnakangas wrote:
> On 28/01/2022 00:11, Thomas Munro wrote:
> > ... but we still never synchronize "base/5". According to our
> > project's reading of the POSIX tea leaves we should be doing that to
> > nail down the directory entry.
>
> Really? 'base/5' is fsync'd by initdb, when it's created. I didn't think we
> try to fsync() the directory, when a new file is created in it.

I've not heard of concrete reports of it being needed (whereas the directory
fsync being needed after a rename() is pretty easy to reproduce). There are
some technical reasons why it'd make sense for it to only really be needed
for things after the initial file creation, but I'm not sure it's a good idea
to rely on it.

From the filesystem POV a file doesn't necessarily know which directory it is
in, and it can be in several at once due to hardlinks. Scheduling all the
directory entries to be made durable is only really feasible if all metadata
operations are globally ordered - which we don't really want for scalability
reasons.

But when initially creating a file, it's always in the context of a
directory. I assume that most journalled filesystems are going to have the
creation of the directory entry and of the file itself be related journalling
operations. But I wouldn't bet that all of them do, and theoretically we claim
to be usable on non-journalled filesystems as well...

Greetings,

Andres Freund
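
For illustration, here is a minimal sketch of the conservative sequence the thread is debating: create a file, fsync() the file itself, then fsync() the containing directory so the new directory entry does not depend on filesystem-specific journalling behaviour. It assumes a POSIX system where a directory can be opened read-only and passed to fsync(). The helper names fsync_dir() and create_file_durably() are hypothetical and are not taken from the PostgreSQL source.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: flush a directory so entries created in it are durable. */
static int
fsync_dir(const char *dirpath)
{
    int fd = open(dirpath, O_RDONLY);
    int rc;

    if (fd < 0)
        return -1;
    rc = fsync(fd);
    if (close(fd) != 0)
        rc = -1;
    return rc;
}

/*
 * Hypothetical helper: create dirpath/name with the given contents, then
 * make both the file data and its directory entry durable.
 * Returns 0 on success, -1 on any failure (including "file already exists").
 */
static int
create_file_durably(const char *dirpath, const char *name,
                    const void *data, size_t len)
{
    char path[4096];
    int  n;
    int  fd;

    assert(dirpath != NULL && name != NULL);

    n = snprintf(path, sizeof(path), "%s/%s", dirpath, name);
    if (n < 0 || (size_t) n >= sizeof(path))
        return -1;

    /* O_EXCL: only handle the "initial creation" case discussed above. */
    fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0600);
    if (fd < 0)
        return -1;

    /* Step 1: write and flush the file's own data and metadata. */
    if (write(fd, data, len) != (ssize_t) len || fsync(fd) != 0)
    {
        close(fd);
        return -1;
    }
    if (close(fd) != 0)
        return -1;

    /* Step 2: flush the parent directory to nail down the new entry. */
    return fsync_dir(dirpath);
}
```

Whether step 2 is strictly required after an initial creation is exactly the open question here; the sketch only shows what not relying on the filesystem's behaviour would look like.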