On Thu, 1 Apr 2010, Edward Ned Harvey wrote:
> If I'm wrong about this, please explain.
> I am envisioning a database, which issues a small sync write, followed by a
> larger async write. Since the sync write is small, the OS would prefer to
> defer the write and aggregate into a larger block. So the possibility of
> the later async write being committed to disk before the older sync write is
> a real risk. The end result would be inconsistency in my database file.

ZFS writes data in transaction groups (TXGs), and each batch of data
that gets written is bounded by a transaction group. Whatever state the
data is in when a TXG starts is the state of the data on disk once that
TXG completes. If the system spontaneously reboots, it restarts at the
last completed TXG, so any residual writes that occurred while a TXG
write was still in progress are discarded.

Based on this, I think that your ordering concerns (sync writes
getting to disk "faster" than async writes) are unfounded for normal
file I/O.
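
To make the TXG argument concrete, here is a toy model in Python. It is a hypothetical illustration, not ZFS code: the class ToyPool and its methods are invented for this sketch. Writes accumulate in the open TXG, a TXG becomes durable all at once, and a reboot discards an unfinished TXG. It deliberately leaves out the ZFS Intent Log (which is how ZFS actually services sync writes) and the multi-stage TXG pipeline, so it only illustrates the ordering property: after a reboot, a later write is never on disk while an earlier one has been lost.

```python
from dataclasses import dataclass, field


@dataclass
class ToyPool:
    """Toy model of transaction-group (TXG) commit semantics.

    Hypothetical illustration only: real ZFS also has open/quiescing/syncing
    TXG stages and the ZFS Intent Log (ZIL); none of that is modeled here.
    """
    durable: dict = field(default_factory=dict)  # on-disk state as of last completed TXG
    pending: dict = field(default_factory=dict)  # writes gathered in the open TXG
    completed_txgs: int = 0

    def write(self, offset: int, data: bytes) -> None:
        # Every write, small or large, is assigned to the currently open TXG.
        self.pending[offset] = data

    def complete_txg(self) -> None:
        # A TXG reaches disk as a unit: all of its writes or none of them.
        self.durable.update(self.pending)
        self.pending = {}
        self.completed_txgs += 1

    def reboot(self) -> dict:
        # After a spontaneous reboot the pool resumes at the last completed TXG;
        # writes from a TXG that was still in progress are discarded.
        self.pending = {}
        return dict(self.durable)


# Case 1: small write, then a larger write, crash before the TXG completes.
pool = ToyPool()
pool.write(0, b"small-record")
pool.write(8192, b"large-block")
print(pool.reboot())  # {} : neither write survives, so no reordering is visible

# Case 2: the small write's TXG completes, the larger write lands in the next TXG.
pool.write(0, b"small-record")
pool.complete_txg()
pool.write(8192, b"large-block")
print(pool.reboot())  # only the earlier write survives

# Case 3: both TXGs complete, so both writes are on disk.
pool.write(8192, b"large-block")
pool.complete_txg()
print(pool.reboot())  # both writes survive
```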

However, if file I/O is done via memory-mapped files, then changed
memory pages are not necessarily written to the file right away. The
changes will not be known to ZFS until the kernel decides that a dirty
page should be written, or until a conflicting traditional I/O updates
the same file data. Use of msync(3C) is necessary to ensure that file
data updated via mmap() will be seen by ZFS and committed to disk in an
orderly fashion.
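
A minimal sketch of the mmap()-then-msync() pattern, written in Python with the standard library's mmap module rather than in C. On Unix, mmap.flush() is implemented with msync(MS_SYNC) over the mapping. The helper name update_via_mmap and the temporary-file setup are hypothetical, for illustration only.

```python
import mmap
import os
import tempfile


def update_via_mmap(path: str, offset: int, data: bytes) -> None:
    """Modify a file through a shared memory mapping, then msync() it."""
    with open(path, "r+b") as f:
        # Length 0 maps the whole file; on Unix the default is MAP_SHARED.
        with mmap.mmap(f.fileno(), 0) as mm:
            # This store only dirties pages in memory; the filesystem need
            # not see it until the kernel decides to write the page back.
            mm[offset:offset + len(data)] = data
            # On Unix, flush() calls msync(MS_SYNC), so the dirty pages are
            # written back to the file before it returns.
            mm.flush()


fd, path = tempfile.mkstemp()
os.write(fd, b"\0" * mmap.PAGESIZE)  # a mapping cannot extend past end of file
os.close(fd)
update_via_mmap(path, 0, b"record-1")
with open(path, "rb") as f:
    print(f.read(8))  # b'record-1'
os.remove(path)
```

In C the corresponding call is msync(addr, len, MS_SYNC) on the mapped range after the stores.
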
Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss