On 03/06/09 08:10, Jim Dunham wrote:
Andrew,

Jim Dunham wrote:
ZFS the filesystem is always on-disk consistent, and ZFS does maintain filesystem consistency through coordination between the ZPL (ZFS POSIX Layer) and the ZIL (ZFS Intent Log). Unfortunately for SNDR, ZFS caches a lot of an application's filesystem data in the ZIL, therefore the data is in memory, not written to disk, so SNDR does not know this data exists. ZIL flushes to disk can be seconds behind the actual application writes completing, and if SNDR is running asynchronously, these replicated writes to the SNDR secondary can be additional seconds behind the actual application writes.

Unlike UFS filesystems and lockfs -f, or lockfs -w, there is no 'supported' way to get ZFS to empty the ZIL to disk on demand.

I'm wondering if you really meant ZIL here, or ARC?

It is my understanding that the ZFS intent log (ZIL) satisfies POSIX requirements for synchronous transactions,

True.

thus filesystem consistency.

No. The filesystems in the pool are always consistent with or without
the ZIL.  The ZIL is not the same as a journal (or the log in UFS).

The ZFS adaptive replacement cache (ARC) is where uncommitted filesystem data is being cached. So although unwritten filesystem data is allocated from the DMU and retained in the ARC, it is the ZIL which influences filesystem metadata and data consistency on disk.

No. It just ensures that synchronous requests (O_DSYNC, fsync(), etc.)
are on stable storage in case a crash or power failure occurs before
the dirty ARC data is written when the txg commits.
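
For illustration, here is a minimal sketch of the two kinds of synchronous request mentioned above: opening with O_DSYNC, or calling fsync() after an ordinary write. The helper name write_durably and its parameters are hypothetical, and the fallback for platforms without O_DSYNC is an assumption of this sketch, not anything ZFS-specific.

```python
import os


def write_durably(path: str, payload: bytes, use_dsync: bool = False) -> int:
    """Append payload and return only after a synchronous request completed.

    Hypothetical helper: uses O_DSYNC when requested and available,
    otherwise falls back to an explicit fsync() after the write.
    """
    # O_DSYNC is not defined on every platform; 0 means "not available".
    dsync = getattr(os, "O_DSYNC", 0) if use_dsync else 0
    flags = os.O_WRONLY | os.O_CREAT | os.O_APPEND | dsync
    fd = os.open(path, flags, 0o644)
    try:
        written = 0
        while written < len(payload):
            # os.write may write fewer bytes than asked; loop until done.
            written += os.write(fd, payload[written:])
        if not dsync:
            # Without O_DSYNC, ask explicitly for the data to reach stable storage.
            os.fsync(fd)
        return written
    finally:
        os.close(fd)
```

A write made without either mechanism simply returns once the data is cached in memory, which is the asynchronous case discussed in this thread.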


In either case, creating a snapshot should get both flushed to disk, I think?

No. A ZFS snapshot is a control path, versus a data path, operation and (to the best of my understanding, and testing) has no influence over POSIX filesystem consistency. See the discussion here: http://www.opensolaris.org/jive/click.jspa?searchID=1695691&messageID=124809

Invoking a ZFS snapshot will assure that the ZFS snapshot is consistent on the replicated disk, but it does not assure the same for all actively open files.

A simple test I performed to verify this was to append to a ZFS file (with no synchronous filesystem options set) a series of blocks, each containing a block-order pattern. At some random point in this process, I took a ZFS snapshot and immediately dropped SNDR into logging mode. When importing the ZFS storage pool on the SNDR remote host, I could see the ZFS snapshot just taken, but neither the snapshot version of the file nor the file itself contained all of the data previously written to it.
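
A writer along the lines of the test described above might look roughly like the sketch below. The actual test program was not posted, so the block size, the pattern layout (each block filled with its own 8-byte sequence number), and the names make_block and append_blocks are all assumptions made for illustration.

```python
import os
import struct

BLOCK_SIZE = 8192  # assumed; the original test does not state a block size


def make_block(seq: int, size: int = BLOCK_SIZE) -> bytes:
    """Build one block filled with its own 8-byte big-endian sequence number."""
    return struct.pack(">Q", seq) * (size // 8)


def append_blocks(path: str, count: int, start: int = 0, extra_flags: int = 0) -> None:
    """Append `count` sequenced blocks to `path`.

    With extra_flags=0 the writes are ordinary asynchronous writes;
    passing os.O_DSYNC (where defined) makes each write synchronous.
    """
    flags = os.O_WRONLY | os.O_CREAT | os.O_APPEND | extra_flags
    fd = os.open(path, flags, 0o644)
    try:
        for seq in range(start, start + count):
            block = make_block(seq)
            offset = 0
            while offset < len(block):
                # Loop because os.write may perform a short write.
                offset += os.write(fd, block[offset:])
    finally:
        os.close(fd)
```

Because every block carries its position in the sequence, any missing or stale block is easy to spot when the file is later read back on the secondary.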

That seems like a bug in ZFS to me. A snapshot ought to contain all data
that has been written (whether synchronous or asynchronous) prior to the 
snapshot.


I then retested, but opened the file with O_DSYNC, and when following the same test steps above, both the snapshot version of the file, and the file itself contained all of the data previously written to it.
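
To compare the file and its snapshot copy on the remote host, a check like the following sketch could be used. It assumes a hypothetical pattern in which block N is filled with the 8-byte big-endian value N; the function name count_intact_blocks and the default block size are illustrative only, since the verification step actually used was not posted.

```python
import struct


def count_intact_blocks(path: str, block_size: int = 8192) -> int:
    """Return how many leading blocks of `path` carry the expected pattern.

    Assumes block N is filled with the 8-byte big-endian value N.
    Counting stops at the first short, missing, or mismatched block.
    """
    expected = 0
    with open(path, "rb") as f:
        while True:
            block = f.read(block_size)
            if len(block) < block_size:
                break  # end of file or a truncated final block
            if block != struct.pack(">Q", expected) * (block_size // 8):
                break  # pattern broken: stale or missing data
            expected += 1
    return expected
```

Running this against both the live file and the copy under the snapshot directory, and comparing the results with the number of blocks the writer reported, would show how much of the written data reached the secondary.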
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
