POSIX has a Synchronized I/O Data (and File) Integrity Completion
definition (line 115434 of the Issue 7 (POSIX.1-2008) specification).
What it says is that writes for a byte range in a file must complete
before any pending reads for that byte range are satisfied. It does not
say that if you have 3 pending writes and pending reads for a byte
range, the writes must complete in the order issued; it says only that
they must all complete before any reads complete. See lines 71371-71376
in the write() discussion. The specification explicitly avoids
discussing the "behavior of concurrent writes to a file from multiple
processes" and suggests that applications doing this "should use some
form of concurrency control."
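
As one possible form of the "concurrency control" the specification
recommends, here is a minimal sketch using advisory byte-range locks
through Python's fcntl.lockf(). The helper names locked_pwrite() and
demo_locked_writes() are made up for illustration, and advisory locks
only help if every cooperating writer takes them.

```python
import fcntl
import os
import tempfile


def locked_pwrite(fd, data, offset):
    """Write data at offset while holding an exclusive advisory lock
    on exactly that byte range (hypothetical helper)."""
    # lockf(fd, cmd, len, start, whence) locks len bytes from start.
    fcntl.lockf(fd, fcntl.LOCK_EX, len(data), offset, os.SEEK_SET)
    try:
        written = 0
        while written < len(data):
            # pwrite may write fewer bytes than asked, so loop.
            written += os.pwrite(fd, data[written:], offset + written)
        return written
    finally:
        fcntl.lockf(fd, fcntl.LOCK_UN, len(data), offset, os.SEEK_SET)


def demo_locked_writes():
    """Issue three overlapping writes and return the file contents."""
    fd, path = tempfile.mkstemp()
    try:
        locked_pwrite(fd, b"AAAA", 0)
        locked_pwrite(fd, b"BBBB", 4)
        locked_pwrite(fd, b"CC", 2)
        return os.pread(fd, 8, 0)
    finally:
        os.close(fd)
        os.unlink(path)
```

In a single process this is trivially ordered; the point of the lock is
that a second process calling locked_pwrite() on an overlapping range
would wait until the first releases it, so the application, not the
file system, decides the order.
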
It is true that because of these semantics, many file system
implementations will use locks to ensure that no reads can occur in the
entire file while writes are happening, which has the side effect of
ensuring the writes are executed in the order they are issued. This is
an implementation detail that can be complicated by async IO as well.
The only guarantee POSIX offers is that all pending writes to the
relevant byte range in the file will be completed before a read to that
byte range is allowed. An in-progress read is expected to block any
writes to the relevant byte range until the read completes.
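
The visibility half of that guarantee can be sketched as follows: once
the writes to a byte range have returned, a read of that range through
a different descriptor sees their result, with no fsync() involved. The
function name read_after_write() is made up for illustration, and this
single-process sketch does not exercise concurrent writers.

```python
import os
import tempfile


def read_after_write():
    """Write the same byte range twice through one descriptor, then
    read it back through a second, independent descriptor."""
    wfd, path = tempfile.mkstemp()
    rfd = os.open(path, os.O_RDONLY)
    try:
        os.pwrite(wfd, b"first", 0)
        os.pwrite(wfd, b"SECOND", 0)
        # Both writes have returned, so the read must reflect them.
        return os.pread(rfd, 6, 0)
    finally:
        os.close(rfd)
        os.close(wfd)
        os.unlink(path)
```

Nothing here says the data has reached the disk; it only shows what a
subsequent read is entitled to see.
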
The specification also does not say the bits for a file must end up on
the disk without an intervening fsync() operation unless you've
explicitly asked for data synchronization (O_SYNC, O_DSYNC) when you
opened the file. The fsync() discussion (line 31956) says that the bits
must undergo a "physical write of data from the buffer cache" that
should be completed when the fsync() call returns. If there are errors,
the return from the fsync() call should express the fact that one or
more errors occurred. The physical write is guaranteed only if the
system supports the _POSIX_SYNCHRONIZED_IO option. If not, the comment
is to read the system's conformance documentation (if any) to see what
actually does happen. In the case that _POSIX_SYNCHRONIZED_IO is not
supported, it's perfectly allowable for fsync() to be a no-op.
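
To make that concrete, here is a rough sketch of how an application
might ask whether _POSIX_SYNCHRONIZED_IO is reported and then write a
file with an explicit fsync(). The function names are made up for
illustration, and it assumes the platform's sysconf() knows the
"SC_SYNCHRONIZED_IO" name; where it does not, the check returns None.

```python
import os
import tempfile


def synchronized_io_supported():
    """Return True/False per sysconf(_SC_SYNCHRONIZED_IO), or None if
    the platform does not expose that name (hypothetical helper)."""
    try:
        value = os.sysconf("SC_SYNCHRONIZED_IO")
    except (ValueError, OSError):
        return None
    return value > 0


def durable_write(path, data):
    """Write data to path and call fsync() before closing.
    os.fsync() raises OSError if the kernel reports a failure."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        written = 0
        while written < len(data):
            written += os.write(fd, data[written:])
        os.fsync(fd)
    finally:
        os.close(fd)
    return written


def demo_durable_write():
    """Write a small file durably and read it back."""
    directory = tempfile.mkdtemp()
    path = os.path.join(directory, "new")
    try:
        durable_write(path, b"hello")
        with open(path, "rb") as handle:
            return handle.read()
    finally:
        os.unlink(path)
        os.rmdir(directory)
```

Even when fsync() returns without error, how much that means depends on
whether the option is supported, which is why the check is worth making
before relying on it.
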
Jim Litchfield
-------------------
David Magda wrote:
> On Mar 18, 2009, at 12:43, Bob Friesenhahn wrote:
>> POSIX does not care about "disks" or "filesystems". The only correct
>> behavior is for operations to be applied in the order that they are
>> requested of the operating system. This is a core function of any
>> operating system. It is therefore ok for some (or all) of the data
>> which was written to "new" to be lost, or for the rename operation to
>> be lost, but it is not ok for the rename to end up with a corrupted
>> file with the new name.
>
> Out of curiosity, is this what POSIX actually specifies? If that is
> the case, wouldn't that mean that the behaviour of ext3/4 is
> incorrect? (Assuming that it does re-order operations.)
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss