On 03/06/09 08:10, Jim Dunham wrote:
Andrew,
Jim Dunham wrote:
ZFS the filesystem is always consistent on disk, and ZFS does
maintain filesystem consistency through coordination between the ZPL
(ZFS POSIX Layer) and the ZIL (ZFS Intent Log). Unfortunately for
SNDR, ZFS caches a lot of an application's filesystem data in the ZIL;
that data is in memory, not written to disk, so SNDR does not know
it exists. ZIL flushes to disk can be seconds behind the application
writes completing, and if SNDR is running asynchronously, the
replicated writes to the SNDR secondary can be additional seconds
behind the application writes.
Unlike UFS, where lockfs -f or lockfs -w can be used, there is no
'supported' way to get ZFS to empty the ZIL to disk on demand.
I'm wondering if you really meant ZIL here, or ARC?
It is my understanding that the ZFS intent log (ZIL) satisfies POSIX
requirements for synchronous transactions,
True.
thus filesystem consistency.
No. The filesystems in the pool are always consistent with or without
the ZIL. The ZIL is not the same as a journal (or the log in UFS).
The ZFS adaptive replacement cache (ARC) is where uncommitted filesystem
data is cached. So although unwritten filesystem data, allocated from
the DMU, is retained in the ARC, it is the ZIL that influences
filesystem metadata and data consistency on disk.
No. It just ensures that synchronous requests (O_DSYNC, fsync(), etc.)
are on stable storage in case a crash or power failure occurs before
the dirty data in the ARC is written when the transaction group (txg)
commits.
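For readers less familiar with the distinction, here is a minimal Python sketch of the two kinds of synchronous request mentioned above, assuming a POSIX system: an ordinary write followed by fsync(), and a file opened with O_DSYNC so that each write() returns only once the data is on stable storage. On ZFS it is requests like these that are satisfied via the ZIL; an ordinary write without them is simply held as dirty data until the txg commits. The function names and paths are hypothetical.

```python
import os

def _write_all(fd: int, data: bytes) -> None:
    """os.write() may write fewer bytes than asked, so loop until done."""
    view = memoryview(data)
    while view:
        written = os.write(fd, view)
        view = view[written:]

def write_then_fsync(path: str, data: bytes) -> None:
    """Ordinary write, then an explicit fsync() request."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        _write_all(fd, data)  # may still be only in memory at this point
        os.fsync(fd)          # returns once the data is on stable storage
    finally:
        os.close(fd)

def write_dsync(path: str, data: bytes) -> None:
    """Every write() on an O_DSYNC descriptor is itself synchronous."""
    # O_DSYNC is POSIX but not exposed everywhere; fall back to O_SYNC.
    dsync = getattr(os, "O_DSYNC", os.O_SYNC)
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC | dsync, 0o644)
    try:
        _write_all(fd, data)
    finally:
        os.close(fd)
```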
In either case, creating a snapshot should get both flushed to disk, I
think?
No. A ZFS snapshot is a control path operation, versus a data path
operation, and (to the best of my understanding and testing) has no
influence over POSIX filesystem consistency. See the discussion here:
http://www.opensolaris.org/jive/click.jspa?searchID=1695691&messageID=124809
Taking a ZFS snapshot will ensure that the snapshot itself is consistent
on the replicated disk, but not that all actively open files are.
A simple test I performed to verify this was to append to a ZFS file
(with no synchronous filesystem options set) a series of blocks
containing a block-order pattern. At some random point in this
process, I took a ZFS snapshot and immediately dropped SNDR into
logging mode. When importing the ZFS storage pool on the SNDR remote
host, I could see the ZFS snapshot just taken, but neither the
snapshot version of the file nor the file itself contained all of the
data previously written to it.
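The append test above can be sketched roughly as follows. This is a minimal Python reconstruction, not the program actually used: the block size and block layout (an 8-byte big-endian sequence number followed by a repeating fill byte) are hypothetical. The writer uses ordinary buffered writes, matching the case with no synchronous options set; the checker is what one might run against the file on the SNDR secondary after importing the pool, to count how many blocks are present, complete and in order, from the start of the file.

```python
import struct

BLOCK_SIZE = 8192  # hypothetical block size

def make_block(seq: int) -> bytes:
    """Block `seq`: 8-byte big-endian sequence number, then a fill pattern."""
    header = struct.pack(">Q", seq)
    return header + bytes([seq % 256]) * (BLOCK_SIZE - len(header))

def append_blocks(path: str, start: int, count: int) -> None:
    """Append blocks start..start+count-1 with ordinary (non-synchronous) writes."""
    with open(path, "ab") as f:
        for seq in range(start, start + count):
            f.write(make_block(seq))

def count_in_order_blocks(path: str) -> int:
    """Return how many leading blocks are complete and correctly numbered."""
    good = 0
    with open(path, "rb") as f:
        while True:
            block = f.read(BLOCK_SIZE)
            if len(block) < BLOCK_SIZE or block != make_block(good):
                break
            good += 1
    return good
```

Comparing the count from the secondary with the number of blocks the writer had appended before the snapshot shows how much of the file failed to arrive.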
That seems like a bug in ZFS to me. A snapshot ought to contain all data
that has been written (whether synchronous or asynchronous) prior to the
snapshot.
I then retested, this time opening the file with O_DSYNC; following
the same test steps as above, both the snapshot version of the file
and the file itself contained all of the data previously written to it.
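The O_DSYNC retest differs from the first run only in how the file is opened. A minimal Python sketch, assuming a hypothetical block layout (an 8-byte big-endian sequence number followed by a fill byte) and a hypothetical block size; it is not the program used in the test. Because each write() on the O_DSYNC descriptor returns only once the data is on stable storage, every block appended before the snapshot has already reached disk, and so could be replicated by SNDR.

```python
import os
import struct

BLOCK_SIZE = 8192  # hypothetical block size

def append_blocks_dsync(path: str, start: int, count: int) -> None:
    """Append numbered blocks; each write() returns only once data is stable."""
    # Fall back to O_SYNC where the platform does not expose O_DSYNC.
    dsync = getattr(os, "O_DSYNC", os.O_SYNC)
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND | dsync, 0o644)
    try:
        for seq in range(start, start + count):
            header = struct.pack(">Q", seq)
            view = memoryview(header + bytes([seq % 256]) * (BLOCK_SIZE - len(header)))
            while view:  # os.write() may write fewer bytes than requested
                written = os.write(fd, view)
                view = view[written:]
    finally:
        os.close(fd)
```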
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss