POSIX defines Synchronized I/O Data (and File) Integrity Completion
(line 115434 of the Issue 7 (POSIX.1-2008) specification). What it says is that writes for a byte range in a file must complete before any pending
reads for that byte range are satisfied.

It does not say that if you have three pending writes and pending reads for a byte range, the writes must complete in the order issued, only that they must all complete before any reads complete. See lines 71371-71376 in the write() discussion. The specification explicitly avoids discussing the "behavior of concurrent writes to a file from multiple processes" and suggests that applications doing this "should use some form
of concurrency control."
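
One form of concurrency control that POSIX itself provides is advisory byte-range locking through fcntl(). The following is only a minimal sketch of that idea, not something the write() discussion prescribes: locked_pwrite is a hypothetical helper name, and the locks only help if every cooperating process takes them.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not a POSIX function): write len bytes at offset
 * off while holding an advisory write lock on exactly that byte range.
 * Returns 0 on success, -1 on error with errno set by the failing call. */
int locked_pwrite(int fd, const void *buf, size_t len, off_t off)
{
    if (len == 0)
        return 0;               /* l_len == 0 would mean "lock to EOF" */

    struct flock fl = {0};
    fl.l_type = F_WRLCK;        /* exclusive lock for writing */
    fl.l_whence = SEEK_SET;     /* l_start is an absolute offset */
    fl.l_start = off;
    fl.l_len = (off_t)len;

    /* F_SETLKW blocks until the range can be locked. */
    if (fcntl(fd, F_SETLKW, &fl) == -1)
        return -1;

    const char *p = buf;
    size_t done = 0;
    int rc = 0;
    while (done < len) {
        /* pwrite() may write fewer bytes than requested, so loop. */
        ssize_t n = pwrite(fd, p + done, len - done, off + (off_t)done);
        if (n < 0) {
            rc = -1;
            break;
        }
        done += (size_t)n;
    }

    /* Release the lock on the same range whether or not the write worked. */
    fl.l_type = F_UNLCK;
    if (fcntl(fd, F_SETLK, &fl) == -1)
        rc = -1;
    return rc;
}
```

Because the locks are advisory, a process that writes without taking them is not held back, so this orders writers only by agreement among the applications involved.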

It is true that because of these semantics, many file system implementations use locks to ensure that no reads can occur anywhere in the file while writes are happening, which has the side effect of ensuring the writes are executed in the order they are issued. This is an implementation detail, and async I/O can complicate it further. The only guarantee POSIX offers is that all pending writes to the relevant byte range in the file will be completed before a read to that byte range is allowed. An in-progress read is expected to block any writes to the relevant byte range until the read completes.

The specification also does not say the bits for a file must end up on the disk without an intervening fsync() operation unless you've explicitly asked for data synchronization (O_SYNC, O_DSYNC) when you opened the file. The fsync() discussion (line 31956) says that the bits must undergo a "physical write of data from the buffer cache" that should be completed when the fsync() call returns. If there are errors, the return from the fsync() call should express the fact that one or more errors occurred. The physical write is guaranteed only if the system supports the _POSIX_SYNCHRONIZED_IO option. If it does not, the advice is to read the system's conformance documentation (if any) to see what actually does happen. In the case that _POSIX_SYNCHRONIZED_IO is not supported,
it's perfectly allowable for fsync() to be a no-op.

Jim Litchfield
-------------------
David Magda wrote:
On Mar 18, 2009, at 12:43, Bob Friesenhahn wrote:

POSIX does not care about "disks" or "filesystems". The only correct behavior is for operations to be applied in the order that they are requested of the operating system. This is a core function of any operating system. It is therefore ok for some (or all) of the data which was written to "new" to be lost, or for the rename operation to be lost, but it is not ok for the rename to end up with a corrupted file with the new name.

Out of curiosity, is this what POSIX actually specifies? If that is the case, wouldn't that mean that the behaviour of ext3/4 is incorrect? (Assuming that it does re-order operations.)

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

