I'd like to correct a few misconceptions about the ZIL here.
On 03/06/09 06:01, Jim Dunham wrote:
> ZFS the filesystem is always on disk consistent, and ZFS does maintain
> filesystem consistency through coordination between the ZPL (ZFS POSIX
> Layer) and the ZIL (ZFS Intent Log).
Pool and file system consistency is more a function of the DMU & SPA.
> Unfortunately for SNDR, ZFS caches
> a lot of an application's filesystem data in the ZIL, therefore the data
> is in memory, not written to disk,
ZFS data is actually cached in the ARC. The ZIL code keeps in-memory records
of system call transactions in case a fsync() occurs.
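
To make the fsync() case concrete, here is a minimal Python sketch of an
application that wants one write to be durable. The helper name
durable_append and the file name are made up for illustration; os.write and
os.fsync are the standard wrappers around write(2) and fsync(2). On ZFS, the
fsync() call is the point at which the in-memory intent log records for that
file would be pushed to stable storage.

```python
import os
import tempfile


def durable_append(path, payload):
    """Append payload to path and return only once fsync() has completed."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        written = os.write(fd, payload)  # data may still be only in memory here
        os.fsync(fd)                     # ask the file system to make it stable
        return written
    finally:
        os.close(fd)


if __name__ == "__main__":
    # Hypothetical scratch file, used only for this demonstration.
    demo = os.path.join(tempfile.mkdtemp(), "zil_demo.log")
    print(durable_append(demo, b"record 1\n"))
```

Without the fsync() call, the same write would simply wait for the next
transaction group commit.
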
> so SNDR does not know this data
> exists. ZIL flushes to disk can be seconds behind the actual application
> writes completing,
It's the DMU/SPA that handles the transaction group commits (not the ZIL).
Currently these occur every 30 seconds, or more frequently on a loaded system.
> and if SNDR is running asynchronously, these
> replicated writes to the SNDR secondary can be additional seconds behind
> the actual application writes.
> Unlike UFS filesystems and lockfs -f, or lockfs -w, there is no
> 'supported' way to get ZFS to empty the ZIL to disk on demand.
The sync(2) system call is implemented differently in ZFS.
For UFS it initiates a flush of cached data to disk, but does
not wait for completion. This satisfies the POSIX requirement but
never seemed right. For ZFS we wait for all transactions
to complete and commit to stable storage (including flushing any
disk write caches) before returning. So any asynchronous data
in the ARC is written.
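
As a small illustration, the sketch below writes a file without fsync() and
then calls sync(2) through Python's os.sync(). The function name
write_then_sync is hypothetical. On ZFS, per the description above, the call
should not return until outstanding transactions are committed; on UFS it
only starts the flush, so the same code gives a weaker guarantee there.

```python
import os
import tempfile


def write_then_sync(path, payload):
    """Write payload without fsync(), then call sync(2) for the whole system."""
    with open(path, "wb") as f:
        f.write(payload)
        f.flush()   # push Python's userspace buffer into the kernel
    os.sync()       # sync(2): ZFS waits for the commit, UFS only schedules it
    return os.path.getsize(path)


if __name__ == "__main__":
    # Hypothetical scratch file, used only for this demonstration.
    demo = os.path.join(tempfile.mkdtemp(), "sync_demo.dat")
    print(write_then_sync(demo, b"asynchronous data\n"))
```
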
Alternatively, a lockfs will flush just one file system to stable storage,
but in this case only the intent log is written. (Later, when
the txg commits, those intent log records are discarded.)
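
The relationship between cached data, intent log records and txg commits can
be sketched with a toy model. This is only an illustration of the behaviour
described above, not ZFS code: the class ToyPool and all of its method and
attribute names are invented for the example.

```python
class ToyPool:
    """Toy model only: names and structure are illustrative, not ZFS internals."""

    def __init__(self):
        self.dirty = {}        # stands in for asynchronous data cached in the ARC
        self.itx = []          # in-memory intent log records, one per write
        self.on_disk_log = []  # intent log records already on stable storage
        self.committed = {}    # pool state as of the last txg commit

    def write(self, name, data):
        self.dirty[name] = data
        self.itx.append((name, data))

    def commit_log(self):
        # fsync()/lockfs-style flush: only the intent log is written.
        self.on_disk_log.extend(self.itx)
        self.itx.clear()

    def txg_commit(self):
        # The DMU/SPA step: dirty data reaches the pool, log records are discarded.
        self.committed.update(self.dirty)
        self.dirty.clear()
        self.itx.clear()
        self.on_disk_log.clear()

    def state_after_crash(self):
        # Last committed state plus replay of whatever log reached the disk.
        state = dict(self.committed)
        for name, data in self.on_disk_log:
            state[name] = data
        return state


if __name__ == "__main__":
    pool = ToyPool()
    pool.write("a", b"1")
    print(pool.state_after_crash())  # write is only in memory so far
    pool.commit_log()
    print(pool.state_after_crash())  # write is recoverable from the intent log
    pool.txg_commit()
    print(len(pool.on_disk_log))     # log records are gone after the commit
```

In this model a replicator that only sees what is on disk would see nothing
for a write until either commit_log() or txg_commit() has run.
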
For some basic info on the ZIL see:
http://blogs.sun.com/perrin/entry/the_lumberjack
Neil.