Kyusun Chang wrote:
Does ZFS recover all file system transactions that it has returned
with success since the last TxG commit? That would imply that the ZIL
must flush log records for each successful file system transaction
before returning to the caller, so that it can replay those
file system transactions.
Only synchronous transactions (those forced by
O_DSYNC or fsync()) are
written to the intent log.
Could you help me clarify what "writing 'synchronous' transactions
to the log" means?
Assume a scenario where new subdirectories D1 and D2 (a child of D1)
have been created, then new files F1 in D1 and F2 in D2 have been
created, and, after some writes to F1 and F2, fsync(F1) is issued.
Also, assume that a file F3 in another part of the file system
is being modified.
To recover F1, the creation of D1 and D2 must also be recovered.
It would be painful to find and log the relevant information
at the time of fsync() in order to recover them.
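To make the scenario concrete, here is a minimal Python sketch of the system-call sequence described above, run in a temporary directory. The layout (F3 placed directly under the root) and the file contents are assumptions for illustration. The sketch only issues the calls; what the file system commits to its intent log at the fsync() is not observable from user space.

```python
import os
import tempfile


def run_scenario(root: str) -> None:
    """Issue the D1/D2/F1/F2/F3 sequence from the question under `root`."""
    d1 = os.path.join(root, "D1")
    d2 = os.path.join(d1, "D2")
    os.mkdir(d1)  # create D1
    os.mkdir(d2)  # create D2 as a child of D1

    # Create F1 in D1, F2 in D2, and F3 elsewhere (assumed: under root),
    # then write some data to each.
    fds = {}
    for name, path in (("F1", os.path.join(d1, "F1")),
                       ("F2", os.path.join(d2, "F2")),
                       ("F3", os.path.join(root, "F3"))):
        fds[name] = os.open(path, os.O_CREAT | os.O_WRONLY, 0o644)
        os.write(fds[name], f"data for {name}\n".encode())

    # Only F1 is fsync()'d: this is the synchronous request that makes the
    # file system commit its intent log before the call returns.
    os.fsync(fds["F1"])

    for fd in fds.values():
        os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        run_scenario(root)
        print(sorted(os.listdir(root)))  # prints ['D1', 'F3']
```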
The ZIL will write log records to stable storage for all the directory
creations and the data for F1, but not the data for F2 or F3. See the
code in zil_commit_writer() for the exact details:
http://cvs.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/fs/zfs/zil.c#938
So does that mean:
1) ZFS needs to log EVERY (not just "synchronous") file system
transaction in order to replay it (i.e., redo it onto the on-disk
state of the last TxG commit), since one cannot predict when fsync()
will be requested for which file? That is, ZFS logs them all in
memory but flushes the log only at a synchronous transaction?
Does it also mean that ZFS logs user data for every write()?
2) If the log records accumulated since the last fsync() are flushed
to disk at fsync(F1) for replay during a subsequent recovery, does ZFS
recover the consistent file system state as of the latest fsync()
before the crash, including all successful file system transactions
up to that point that have nothing to do with F1, e.g., those on F3?
Or am I missing something?
So the actual code logs everything except writes, setattrs, ACLs, and
truncates for other files. This has changed over time and may continue
to change.
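A toy model may help with both numbered questions. The Python sketch below keeps an in-memory record for every transaction and, on fsync(), pushes everything to "stable storage" except writes, setattrs, ACLs, and truncates that belong to other files; skipped records stay in memory. It is a hypothetical simplification of the rule described above, not the zil.c data structures: the class, record names, and paths are invented for illustration, and it ignores TxG commits, records written with O_DSYNC, and log-block layout.

```python
from dataclasses import dataclass, field

# Types the reply says are NOT pushed when they belong to a file other
# than the one being fsync()'d. Names are illustrative, not ZFS's.
PER_FILE_TYPES = {"write", "setattr", "acl", "truncate"}


@dataclass(frozen=True)
class Record:
    op: str      # e.g. "mkdir", "create", "write"
    target: str  # path the record applies to


@dataclass
class ToyIntentLog:
    """Hypothetical model of an intent log; not zil.c's data structures."""
    pending: list = field(default_factory=list)  # in memory only
    stable: list = field(default_factory=list)   # on stable storage

    def log(self, op: str, target: str) -> None:
        # Every transaction gets an in-memory record when it happens.
        self.pending.append(Record(op, target))

    def commit(self, fsynced: str) -> None:
        # Push everything except per-file types for other files;
        # skipped records remain in memory.
        def pushed(r: Record) -> bool:
            return r.op not in PER_FILE_TYPES or r.target == fsynced
        self.stable.extend(r for r in self.pending if pushed(r))
        self.pending = [r for r in self.pending if not pushed(r)]

    def crash(self) -> list:
        # A crash loses in-memory records; only stable ones can be replayed.
        self.pending = []
        return list(self.stable)


def demo() -> list:
    zil = ToyIntentLog()
    for op, target in [("mkdir", "D1"), ("mkdir", "D1/D2"),
                       ("create", "D1/F1"), ("create", "D1/D2/F2"),
                       ("create", "F3"), ("write", "D1/F1"),
                       ("write", "D1/D2/F2"), ("write", "F3")]:
        zil.log(op, target)
    zil.commit(fsynced="D1/F1")
    return zil.crash()


if __name__ == "__main__":
    for rec in demo():
        print(rec.op, rec.target)
```

Under this model, a crash after fsync(D1/F1) replays the creation of D1, D2, F1, F2, and F3 plus the data for F1, while the writes to F2 and F3 are lost unless a TxG commit had already covered them.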
I presume the log is also flushed on every write() to a file opened
with O_DSYNC. Otherwise, it should be the same as the fsync() case.
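For the O_DSYNC case, a minimal Python sketch: each os.write() on a descriptor opened with os.O_DSYNC returns only once its data is on stable storage, which is why, as confirmed in the reply, every such write forces a log flush rather than waiting for an explicit fsync(). The function name and file path are hypothetical.

```python
import os
import tempfile


def write_records_dsync(path: str, chunks: list) -> int:
    """Write each chunk through an O_DSYNC descriptor (hypothetical helper).

    Each write() returns only after its data, and the metadata needed to
    read it back, reach stable storage: like write() plus fdatasync().
    """
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC | os.O_DSYNC
    fd = os.open(path, flags, 0o644)
    total = 0
    try:
        for chunk in chunks:
            total += os.write(fd, chunk)  # synchronous on every call
    finally:
        os.close(fd)
    return total


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(write_records_dsync(os.path.join(d, "f"), [b"abc", b"de"]))
```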
Correct
Are there any other synchronization requests that force a flush of
the in-memory log?
There are others: O_RSYNC, O_SYNC, sync(1M)
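The other requests can be issued from Python as well. This sketch, with a hypothetical function and file name, opens a file with O_SYNC (combined with O_RSYNC where the platform defines it, since O_RSYNC is not available everywhere) and then calls os.sync(), the library counterpart of sync(1M). It only shows how each request is made; it does not observe what ZFS logs.

```python
import os
import tempfile

# O_RSYNC is not defined on every platform; fall back to no flag.
O_RSYNC = getattr(os, "O_RSYNC", 0)


def sync_requests_demo(directory: str) -> bytes:
    path = os.path.join(directory, "log.dat")  # hypothetical file name
    # O_SYNC: each write() waits for the data and all file metadata to
    # reach stable storage. O_RSYNC with O_SYNC: reads complete at the
    # same integrity level as the O_SYNC writes.
    fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_SYNC | O_RSYNC, 0o644)
    try:
        os.write(fd, b"record\n")
        os.lseek(fd, 0, os.SEEK_SET)
        data = os.read(fd, 64)
    finally:
        os.close(fd)
    # Counterpart of sync(1M): ask the system to flush all file systems.
    os.sync()
    return data


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(sync_requests_demo(d))
```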
As a side question, does ZFS log atime updates (and does a snapshot
copy-on-write for them)?
I don't think atime updates are logged as transactions.
Not sure about snapshots COW-ing.
Again, thank you for your time.
So what are your concerns here? Correctness or performance?
This message posted from opensolaris.org
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss