On Mon, Jan 08, 2018 at 04:28:36PM +0100, Joerg Schilling wrote:
> Tim Kientzle <t...@kientzle.com> wrote:
>
> > I'm not entirely sure I understand the above.
> >
> > It sounds like someone is claiming that:
> >
> > * Archiving programs should know about the *timing* of filesystem
> >   implementations (60s here for btrfs, something else for <new
> >   filesystem XYZ>?)
> >
> > * And specifically request the OS to fsync() files before trusting the
> >   metadata
>
> This is exactly the reason why btrfs (in case it behaves as claimed)
> seems to be in conflict with POSIX.
>
> POSIX requires that stat() returns cached meta data instead of possibly
> out-of-date information from the backing medium. In other words: it is
> not allowed to return different data before and after a sync() or
> fsync() call.
Are we talking about
http://pubs.opengroup.org/onlinepubs/9699919799/functions/stat.html
and
http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/sys_stat.h.html ?
I can't see any such requirement there.

Also, that "cached meta data" requirement, as you want it, can't possibly
work on any modern filesystem, not just btrfs. Some fields (such as
timestamps) have a specified behaviour, but where not explicitly required,
the filesystem is allowed not to copy the implementation quirks of how
things were done in 1970. For example, I see no place that mandates, e.g.,
the hard link count of a directory to be (subdir count + 2) -- in fact,
per my reading, the standard wants this value to always be 1 (unless
directory hard links are supported, which is not the case on any modern
local filesystem I'm aware of). st_dev and st_ino, required by POSIX, are
already problematic enough: some filesystems have to synthesise them with
bogus values, and they block innovation (such as, e.g., directory
reflinks). No need to add extra requirements which aren't there.

Even filesystems that are direct descendants of sysvfs, such as ext4,
don't know where a file will be placed until sync/writeout time, because
of delalloc. Add compression or CoW, and there's no way to know this
other than having stat() force allocation and compression, returning only
after everything but the physical writeout is done. On a networked
filesystem, the entirety of fsync would have to be done. (A toy
demonstration of stat() legitimately changing its answer across fsync()
follows below the sig.)

Here's an example: a dense file, 128KB big, is well-compressible and uses
a 24KB extent on the disk. You write a 32KB piece to the middle of the
file, and it compresses to 12KB. What is the count of used blocks now?
The whole original extent stays pinned; there's no meaningful way to tell
how much it takes -- calculating that would require reading the whole
thing, recompressing it, and writing it out again. Pretty wasteful,
especially if there are other reflinks to that extent. So, is the answer
128KB, 160KB, 56KB, or 24KB plus whatever you'd get from recompressing
the 96KB piece?

Back to the original question: how do you know whether, upon writing a
128KB file, it'll compress to 128KB (using real extents) or to 10 bytes,
storable inline in the metadata tree? stat() would have to pointlessly
block and allocate -- especially wasteful if the file is still being
written to.

Thus: "POSIX describes this" may be a valid argument, but "sysvfs did so
half a century ago" is not. A file that doesn't have a single block
allocated for it yet may thus return an st_blocks of 0, no matter whether
it's empty or not.

Meow!
--
// If you believe in so-called "intellectual property", please immediately
// cease using counterfeit alphabets.  Instead, contact the nearest temple
// of Amon, whose priests will provide you with scribal services for all
// your writing needs, for Reasonable And Non-Discriminatory prices.
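Here's the toy demo mentioned above -- a minimal sketch of my own, not
anything taken from the standard or from btrfs itself. It writes a
well-compressible 128KB file (the name "demo-file" and all the sizes are
arbitrary) and prints st_blocks before and after fsync(); on a delalloc
or compressing filesystem the two numbers can legitimately differ, with
0 before the flush being a perfectly valid answer:

/* demo.c: st_blocks before vs. after fsync() -- toy sketch only */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/stat.h>

int main(void)
{
    static char buf[128 * 1024];        /* 128KB of compressible data */
    memset(buf, 'x', sizeof(buf));

    int fd = open("demo-file", O_CREAT | O_TRUNC | O_WRONLY, 0644);
    if (fd < 0) {
        perror("open");
        return 1;
    }
    if (write(fd, buf, sizeof(buf)) != (ssize_t)sizeof(buf)) {
        perror("write");
        return 1;
    }

    struct stat st;

    /* Before fsync(): with delayed allocation nothing may have been
       allocated yet, so st_blocks can legitimately be 0 here. */
    fstat(fd, &st);
    printf("before fsync: st_blocks = %lld\n", (long long)st.st_blocks);

    /* fsync() forces allocation (and, where enabled, compression),
       so the same field may now report a different, nonzero value. */
    fsync(fd);
    fstat(fd, &st);
    printf("after fsync:  st_blocks = %lld\n", (long long)st.st_blocks);

    close(fd);
    return 0;
}

Build with "cc demo.c -o demo"; to actually see the two values differ,
you'd want to run it on e.g. btrfs, quickly enough that the write is
still sitting in the delalloc window when the first fstat() happens.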