On Thu, 1 Apr 2010, Edward Ned Harvey wrote:

If I'm wrong about this, please explain.

I am envisioning a database, which issues a small sync write, followed by a
larger async write.  Since the sync write is small, the OS would prefer to
defer the write and aggregate into a larger block.  So the possibility of
the later async write being committed to disk before the older sync write is
a real risk.  The end result would be inconsistency in my database file.

ZFS writes data in transaction groups (TXGs), and each batch of data which gets written is bounded by a transaction group. The state of the data at the time the TXG starts will be the state of the data once the TXG completes. If the system spontaneously reboots, it will restart from the last completed TXG, so any residual writes which might have occurred while a TXG write was in progress will be discarded. Based on this, I think that your ordering concerns (the later async write getting to disk "faster" than the earlier sync write) are unfounded for normal file I/O.
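For concreteness, the write pattern being discussed might look roughly like the following in C. This is only a sketch of the scenario, not code from any particular database: the function name sync_then_async and its parameters are made up for illustration, and the "sync" write is modelled here as write() followed by fsync() (opening with O_DSYNC would be another way).

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: issue a small synchronous write followed by a
 * larger asynchronous write to the same file.
 * Returns 0 on success, -1 on any failure.
 */
int sync_then_async(const char *path,
                    const void *small_buf, size_t small_len,
                    const void *large_buf, size_t large_len)
{
    assert(small_buf != NULL && large_buf != NULL);

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    /* Small record: write it, then block in fsync() until it is stable. */
    if (write(fd, small_buf, small_len) != (ssize_t)small_len ||
        fsync(fd) != 0) {
        close(fd);
        return -1;
    }

    /* Larger record: no fsync(), so the filesystem may write it later. */
    if (write(fd, large_buf, large_len) != (ssize_t)large_len) {
        close(fd);
        return -1;
    }

    return close(fd) == 0 ? 0 : -1;
}
```

The fsync() call returns before the second write() is even issued, so in this pattern the application itself already establishes the order between the two writes.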

However, if file I/O is done via memory-mapped files, then changed memory pages will not necessarily be written. The changes will not be known to ZFS until the kernel decides that a dirty page should be written, or until there is a conflicting traditional I/O which would update the same file data. Use of msync(3C) is necessary to assure that file data updated via mmap() will be seen by ZFS and committed to disk in an orderly fashion.
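As a rough illustration of the msync() point, here is a minimal sketch in C. The helper name update_via_mmap is hypothetical, and for simplicity it sizes the file to exactly len bytes with ftruncate() before mapping it, which a real application would probably not want to do to an existing file.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: overwrite a file's contents through a shared
 * mapping, then flush the dirty pages with msync().
 * Returns 0 on success, -1 on any failure.
 */
int update_via_mmap(const char *path, const void *data, size_t len)
{
    assert(data != NULL && len > 0);

    int fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd < 0)
        return -1;

    /* Size the file to exactly len bytes so the whole mapping is backed. */
    if (ftruncate(fd, (off_t)len) != 0) {
        close(fd);
        return -1;
    }

    void *map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) {
        close(fd);
        return -1;
    }

    /* This only dirties pages in memory; the filesystem is not told yet. */
    memcpy(map, data, len);

    /* MS_SYNC blocks until the dirty pages have been written back. */
    int rc = msync(map, len, MS_SYNC);

    munmap(map, len);
    close(fd);
    return rc == 0 ? 0 : -1;
}
```

Without the msync() call, the memcpy() above may sit in dirty pages for an arbitrary time before the kernel decides to write them.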

Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
