On May 6, 2010, at 8:34 AM, Edward Ned Harvey <solar...@nedharvey.com> wrote:

From: Pasi Kärkkäinen [mailto:pa...@iki.fi]

In neither case do you have data or filesystem corruption.


ZFS probably is still OK, since it's designed to handle this (?),
but the data can't be OK if you lose 30 secs of writes.. 30 secs of
writes that have been ack'd as done to the servers/applications..

What I meant was: yes, there's data loss, but no corruption. In other filesystems, if you have an ungraceful shutdown while the filesystem is writing, you can end up with files whose contents have been corrupted, because filesystems such as EXT3 perform file-based (or inode-based) block write operations: some sectors of the file are still in their "old" state, and some sectors are in their "new" state. Likewise, in something like EXT3, you could have one file fully written while another one hasn't been written yet, but should have been. (AKA, some files written out of order.)

In the case of EXT3, since it is a journaled filesystem, the journal only keeps the *filesystem* consistent after a crash. It's still possible to
have corrupted data in the middle of a file.

I believe ext3 has an option to journal data as well as metadata; it just defaults to journaling metadata only.

I don't believe out-of-order writes are so much an issue any more since Linux gained write barrier support (and most file systems and block devices now support it).

These things don't happen in ZFS.  ZFS takes journaling to a whole new
level. Instead of just keeping your filesystem consistent, it also keeps your data consistent. Yes, data loss is possible when a system crashes, but the filesystem will never have any corruption. These are separate things
now, and never were before.

ZFS does NOT have a journal; it has an intent log, which is completely different. A journal logs operations that are to be performed later (the journal is read, then the operation is performed). An intent log logs operations that are being performed now; when the disk flushes, the intent entry is marked complete.

ZFS is consistent by the nature of COW (copy-on-write), which means a partial write will not become part of the file system: the old block pointer isn't updated until the new block completes the write.
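
To make the pointer-switch idea concrete, here is a toy sketch in Python. It is only an illustration of the copy-on-write principle described above, not ZFS's actual on-disk format or code; the class ToyCowStore and the crash_before_commit flag are hypothetical names invented for this example.

```python
# Toy model only: a dict of blocks plus a single root pointer.
# Not ZFS's real data structures; every name here is hypothetical.

class ToyCowStore:
    def __init__(self, initial: bytes):
        self.blocks = {0: initial}  # block id -> contents, never overwritten in place
        self.root = 0               # the "block pointer" that readers follow
        self._next_id = 1

    def read(self) -> bytes:
        return self.blocks[self.root]

    def write(self, data: bytes, crash_before_commit: bool = False) -> None:
        new_id = self._next_id
        self._next_id += 1
        self.blocks[new_id] = data  # step 1: write the new block somewhere else
        if crash_before_commit:
            return                  # simulated crash: the pointer was never switched
        self.root = new_id          # step 2: one pointer update commits the write


store = ToyCowStore(b"old contents")
store.write(b"half-finished update", crash_before_commit=True)
after_crash = store.read()   # readers still see the old block
store.write(b"new contents")
after_commit = store.read()  # readers see the new block only after the commit
```

In this model a crash between step 1 and step 2 leaves an orphaned block behind, but a reader following the root pointer only ever sees the old contents or the new contents, never a mix.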

In ZFS, losing n-seconds of writes leading up to the crash will never result in files partially written, or written out of order. Every atomic write to the filesystem results in a filesystem-consistent and data-consistent view
of *some* valid form of all the filesystem and data within it.

The ZFS file system will always be consistent, but if an application doesn't flush its data, then it can definitely have partially written data.
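
As a rough sketch of what "flushing its data" can look like on the application side, the Python below writes to a temporary file, flushes and fsyncs it, then renames it over the target. The helper name durable_write is made up for this example, and this is one common pattern under the assumption of a POSIX-like system, not something specific to ZFS.

```python
import os
import tempfile


def durable_write(path: str, data: bytes) -> None:
    """Replace `path` with `data`, asking the OS to flush it before the rename."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)  # temp file on the same filesystem
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()             # drain Python's userspace buffer
            os.fsync(f.fileno())  # ask the kernel to flush the file to the device
        os.replace(tmp_path, path)  # rename over the old file in one step
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

A call such as durable_write("state.txt", b"v2") leaves either the previous file or the complete new one at that name. The sketch does not fsync the containing directory, which some setups also need before the rename itself is guaranteed to survive a crash.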

-Ross
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
