Re: [zfs-discuss] rename(2), atomicity, crashes and fsync()

James Litchfield Wed, 18 Mar 2009 22:00:44 -0700

POSIX has a Synchronized I/O Data (and File) Integrity Completion
definition (line 115434 of the Issue 7 (POSIX.1-2008) specification).
What it says is that writes for a byte range in a file must complete
before any pending reads for that byte range are satisfied.

It does not say that if you have 3 pending writes and pending reads for
a byte range, the writes must complete in the order issued, simply that
they must all complete before any reads complete. See lines 71371-71376
in the write() discussion. The specification explicitly avoids
discussing the "behavior of concurrent writes to a file from multiple
processes" and suggests that applications doing this "should use some
form of concurrency control."
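One form such concurrency control could take is an advisory record lock
held around each write. The following is only a sketch: locked_pwrite()
is a made-up helper name, not something from the specification, and
fcntl() locks only coordinate processes that all agree to take them.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700   /* ask for the POSIX.1-2008 declarations */
#endif
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: write len bytes at offset while holding an
 * exclusive advisory lock on exactly that byte range.
 * Returns the pwrite() result, or -1 if locking or unlocking fails.
 */
ssize_t locked_pwrite(int fd, const void *buf, size_t len, off_t offset)
{
    assert(buf != NULL || len == 0);

    if (len == 0)
        return 0;           /* l_len == 0 would mean "lock to end of file" */

    struct flock lk = {0};
    lk.l_type = F_WRLCK;    /* exclusive (write) lock */
    lk.l_whence = SEEK_SET;
    lk.l_start = offset;
    lk.l_len = (off_t)len;

    /* F_SETLKW blocks until no other process holds a conflicting lock. */
    if (fcntl(fd, F_SETLKW, &lk) == -1)
        return -1;

    ssize_t n = pwrite(fd, buf, len, offset);

    /* Release the range whether or not the write succeeded. */
    lk.l_type = F_UNLCK;
    if (fcntl(fd, F_SETLK, &lk) == -1)
        return -1;

    return n;
}
```

Every process writing to the range would have to go through the same
helper (or take the same lock) for the ordering to mean anything, and a
short write is still possible, so the caller has to check the count.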

It is true that because of these semantics, many file system
implementations will use locks to ensure that no reads can occur in the
entire file while writes are happening, which has the side effect of
ensuring the writes are executed in the order they are issued. This is
an implementation detail that can be complicated by async I/O as well.
The only guarantee POSIX offers is that all pending writes to the
relevant byte range in the file will be completed before a read to that
byte range is allowed. An in-progress read is expected to block any
writes to the relevant byte range until the read completes.

The specification also does not say the bits for a file must end up on
the disk without an intervening fsync() operation unless you've
explicitly asked for data synchronization (O_SYNC, O_DSYNC) when you
opened the file. The fsync() discussion (line 31956) says that the bits
must undergo a "physical write of data from the buffer cache" that
should be completed when the fsync() call returns. If there are errors,
the return from the fsync() call should express the fact that one or
more errors occurred. The only guarantee that the physical write happens
is if the system supports the _POSIX_SYNCHRONIZED_IO option. If not, the
comment is to read the system's conformance documentation (if any) to
see what actually does happen. In the case that _POSIX_SYNCHRONIZED_IO
is not supported, it's perfectly allowable for fsync() to be a no-op.
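As a rough illustration of the above, an application can ask at run time
whether the option is advertised and then check the fsync() return value
itself. This is a minimal sketch: has_synchronized_io() and
write_and_sync() are hypothetical names, and a successful return only
means as much as the system's conformance documentation says it does.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700   /* ask for the POSIX.1-2008 declarations */
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns 1 if the system reports support for the
 * _POSIX_SYNCHRONIZED_IO option at run time, 0 otherwise.
 */
int has_synchronized_io(void)
{
    return sysconf(_SC_SYNCHRONIZED_IO) > 0;
}

/*
 * Hypothetical helper: write all of buf to fd, then call fsync().
 * Returns 0 on success, -1 with errno set on any write or fsync error.
 */
int write_and_sync(int fd, const void *buf, size_t len)
{
    assert(buf != NULL || len == 0);

    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;   /* interrupted before writing anything; retry */
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }

    /* fsync() reports errors through its return value; do not ignore it. */
    if (fsync(fd) == -1)
        return -1;

    return 0;
}
```

The alternative mentioned above is to pass O_SYNC or O_DSYNC to open(),
in which case each write() is expected to behave as if it were followed
by the corresponding synchronization.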

Jim Litchfield
-------------------
David Magda wrote:

On Mar 18, 2009, at 12:43, Bob Friesenhahn wrote:
POSIX does not care about "disks" or "filesystems". The only correct
behavior is for operations to be applied in the order that they are
requested of the operating system. This is a core function of any
operating system. It is therefore ok for some (or all) of the data
which was written to "new" to be lost, or for the rename operation to
be lost, but it is not ok for the rename to end up with a corrupted
file with the new name.

Out of curiosity, is this what POSIX actually specifies? If that is
the case, wouldn't that mean that the behaviour of ext3/4 is
incorrect? (Assuming that it does re-order operations.)
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

