On May 6, 2010, at 8:34 AM, Edward Ned Harvey <solar...@nedharvey.com> wrote:
From: Pasi Kärkkäinen [mailto:pa...@iki.fi]
In neither case do you have data or filesystem corruption.
ZFS probably is still OK, since it's designed to handle this (?), but
the data can't be OK if you lose 30 secs of writes.. 30 secs of writes
that have been ack'd being done to the servers/applications..
What I meant was: Yes there's data loss. But no corruption. In other
filesystems, if you have an ungraceful shutdown while the filesystem is
writing, since filesystems such as EXT3 perform file-based (or
inode-based) block write operations, then you can have files whose
contents have been corrupted... Some sectors of the file still in their
"old" state, and some sectors of the file in their "new" state.
Likewise, in something like EXT3, you could have some file fully
written, while another one hasn't been written yet, but should have
been. (AKA, some files written out of order.)
In the case of EXT3, since it is a journaled filesystem, the journal
only keeps the *filesystem* consistent after a crash. It's still
possible to have corrupted data in the middle of a file.
I believe ext3 has an option to journal data as well as metadata; it
just defaults to journaling metadata only.
I don't believe out-of-order writes are so much an issue any more
since Linux gained write barrier support (and most file systems and
block devices now support it).
These things don't happen in ZFS. ZFS takes journaling to a whole new
level. Instead of just keeping your filesystem consistent, it also keeps
your data consistent. Yes, data loss is possible when a system crashes,
but the filesystem will never have any corruption. These are separate
things now, and never were before.
ZFS does NOT have a journal; it has an intent log, which is completely
different. A journal logs operations that are to be performed later
(the journal is read, then the operation is performed). An intent log
logs operations that are being performed now; when the disk flushes,
the intent entry is marked complete.
ZFS is consistent by the nature of COW, which means a partial write
will not become part of the file system (the old block pointer isn't
updated until the new block completes the write).
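
To make the COW point concrete, here is a toy Python sketch. It is not
ZFS code: the CowStore class, its single block pointer, and the
crash_after parameter are all made up for illustration. It only shows
the ordering that matters: the new block is written first, and the
pointer is switched afterwards, so a crash in between leaves the old
data reachable.

```python
from typing import Optional


class CowStore:
    """Toy copy-on-write store: one 'file' reached through one block pointer."""

    def __init__(self, initial: bytes) -> None:
        self.blocks = {0: initial}  # block address -> contents
        self.next_addr = 1          # next free block address
        self.root = 0               # the live block pointer

    def read(self) -> bytes:
        """Return whatever the live pointer currently references."""
        return self.blocks[self.root]

    def write(self, data: bytes, crash_after: Optional[int] = None) -> None:
        """Write data to a fresh block, then switch the pointer to it.

        crash_after (hypothetical knob) simulates power loss after that
        many bytes reached the new block, before the pointer update.
        """
        addr = self.next_addr
        self.next_addr += 1
        if crash_after is not None:
            # Torn write: the block is incomplete, but nothing points at it.
            self.blocks[addr] = data[:crash_after]
            raise RuntimeError("simulated crash")
        self.blocks[addr] = data
        self.root = addr  # the single pointer update commits the write


store = CowStore(b"old contents")
try:
    store.write(b"new contents", crash_after=4)
except RuntimeError:
    pass
after_crash = store.read()   # the torn block was never linked in

store.write(b"new contents")
after_commit = store.read()  # a completed write becomes visible
```

After the simulated crash the store still reads back the old block, and
the half-written block is just unreferenced garbage. The write is lost,
but nothing is corrupted.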
In ZFS, losing n-seconds of writes leading up to the crash will never
result in files partially written, or written out of order. Every
atomic write to the filesystem results in a filesystem-consistent and
data-consistent view of *some* valid form of all the filesystem and data
within it.
A ZFS file system will always be consistent, but if an application
doesn't flush its data, then it can definitely have partially written
data.
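
For what it's worth, here is roughly what "flushing its data" can look
like from the application side. This is a minimal sketch assuming a
POSIX system, not anything ZFS-specific; durable_write and the demo
file name are hypothetical. The idea is to write a temporary file,
fsync it, and rename it over the original, so a crash leaves either
the old file or the new one.

```python
import os
import tempfile


def durable_write(path: str, data: bytes) -> None:
    """Replace path with data so a crash leaves the old or the new file."""
    directory = os.path.dirname(os.path.abspath(path))
    # Write the new contents to a temporary file in the same directory,
    # so the final rename stays within one file system.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()             # push Python's buffer to the OS
            os.fsync(tmp.fileno())  # ask the OS to push it to stable storage
        os.replace(tmp_path, path)  # atomic rename on POSIX file systems
    except BaseException:
        # Clean up the temporary file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    # Flush the directory as well, so the rename itself survives a crash.
    # (Opening a directory like this is a POSIX assumption.)
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)


demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "settings.conf")
durable_write(demo_path, b"version=1\n")
durable_write(demo_path, b"version=2\n")
with open(demo_path, "rb") as f:
    final_contents = f.read()
```

Without the fsync calls the data may still be sitting in memory when
the write returns, which is exactly the window where an application can
end up with partially written data after a crash.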
-Ross
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss