Re: [zfs-discuss] Another user looses his pool (10TB) in this case and 40 days work

Frank Middleton Sun, 26 Jul 2009 08:09:32 -0700

On 07/25/09 04:30 PM, Carson Gaspar wrote:

No. You'll lose unwritten data, but won't corrupt the pool, because
the on-disk state will be sane, as long as your iSCSI stack doesn't
lie about data commits or ignore cache flush commands. Why is this so
difficult for people to understand? Let me create a simple example
for you.


Are you sure about this example? AFAIK metadata refers to things like
the file's name, atime, ACLs, etc., etc. Your example seems to be more
about how a journal works, which has little to do with metadata other
than to manage it.

Now if you were too lazy to bother to follow the instructions properly,
we could end up with bizarre things. This is what happens when storage
lies and re-orders writes across boundaries.


On 07/25/09 07:34 PM, Toby Thain wrote:

The problem is assumed *ordering*. In this respect VB ignoring flushes
and real hardware are not going to behave the same.


Why? An ignored flush is ignored. It may be more likely in VB, but it
can always happen. It mystifies me that VB would in some way alter
the ordering. I wonder if the OP could tell us what actual disks and
controller he used to see if the hardware might actually have done
out-of-order writes despite the fact that ZFS already does write
optimization. Maybe the disk didn't like the physical location of
the log relative to the data so it wrote the data first? Even then
it isn't obvious why this would cause the pool to be lost.
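
One way to picture the ordering concern is a toy model of a disk with a
volatile write cache. This is only a sketch under stated assumptions: the
CachedDisk class, its honor_flush switch, and the "data"/"commit" block
names are all made up for illustration and say nothing about how
VirtualBox or any real controller behaves.

```python
class CachedDisk:
    """Hypothetical disk with a volatile write cache (illustration only)."""

    def __init__(self, honor_flush=True):
        self.honor_flush = honor_flush
        self.media = {}   # what is durably on the platter
        self.cache = []   # pending (block, value) writes, in issue order

    def write(self, block, value):
        self.cache.append((block, value))

    def flush(self):
        # An honest disk drains its cache; a lying one silently does nothing.
        if self.honor_flush:
            for block, value in self.cache:
                self.media[block] = value
            self.cache.clear()

    def crash(self, survivors):
        """Power loss: only cached writes whose index is in `survivors`
        happen to reach the media; the rest are lost."""
        for i, (block, value) in enumerate(self.cache):
            if i in survivors:
                self.media[block] = value
        self.cache.clear()
        return self.media


def commit(disk):
    disk.write("data", "new")
    disk.flush()                      # barrier: data must land first
    disk.write("commit", "points-to-new")


def consistent(media):
    # A commit record is only valid if the data it points at is on disk.
    return "commit" not in media or media.get("data") == "new"
```

With an honest disk, no choice of surviving writes can put the commit
record on the media without the data. With honor_flush=False both writes
sit in the cache together, so a crash that happens to persist only the
second one leaves a commit record pointing at data that never arrived.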

A traditional journalling file system should survive the loss of a flush.
Either the log entry was written or it wasn't. Even if the disk, for
some bizarre reason, writes some of the actual data before writing the
log, the repair process should undo that.

If written properly, it will use the information in the most current
complete journal entry to repair the file system. Syncing is
devastating to performance, so usually there's an option to disable
it, at the known risk of losing a lot more data. I've been using
SPARCs and Solaris from the beginning. Ever since UFS supported
journalling, I've never lost a file unless the disk went totally bad,
and none since mirroring. Didn't miss fsck either :-)
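
To make "use the most current complete journal entry" concrete, here is
a minimal sketch of journal replay. It is not how UFS logging or any
real file system is implemented; the entry format (JSON plus a CRC32
trailer) and the function names are assumptions chosen to keep the
example short.

```python
import json
import zlib


def make_entry(seq, updates):
    """Serialize one journal entry with a trailing CRC32 so that a torn
    (partially written) entry can be detected on replay."""
    payload = json.dumps({"seq": seq, "updates": updates},
                         sort_keys=True).encode()
    return payload + b"|" + str(zlib.crc32(payload)).encode()


def parse_entry(raw):
    """Return the decoded entry, or None if it is incomplete or corrupt."""
    payload, sep, crc = raw.rpartition(b"|")
    if not sep or not crc.isdigit() or zlib.crc32(payload) != int(crc):
        return None
    return json.loads(payload)


def recover(state, journal):
    """Replay complete entries in order and stop at the first bad one,
    so the state reflects the last entry that was fully written."""
    for raw in journal:
        entry = parse_entry(raw)
        if entry is None:
            break
        state.update(entry["updates"])
    return state
```

Either an entry is complete and gets replayed, or it is torn and is
ignored along with everything after it. That is the "written or it
wasn't" property, at least as long as the entries themselves reach the
disk in order.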

Doesn't the ZIL effectively make ZFS into a journalled file system? (In
another thread, Bob Friesenhahn says it isn't, but I would submit
that the general opinion is correct that it is; "log" and "journal"
have similar semantics.) The evil tuning guide is pretty emphatic
about not disabling it!

My intuition (and this is entirely speculative) is that the ZFS ZIL
either doesn't contain everything needed to restore the superstructure,
or that if it does, the recovery process is ignoring it. I think I read
that the ZIL is per-file system, but one hopes it doesn't rely on the
superstructure recursively, or this would be impossible to fix (maybe
there's a ZIL for the ZILs :) ).

On 07/21/09 11:53 AM, George Wilson wrote:

We are working on the pool rollback mechanism and hope to have that
soon. The ZFS team recognizes that not all hardware is created equal and
thus the need for this mechanism. We are using the following CR as the
tracker for this work:

6667683 need a way to rollback to an uberblock from a previous txg


so maybe this discussion is moot :-)
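
For what it's worth, the idea behind rolling back to an uberblock from
a previous txg can be sketched roughly as below. This is a guess at the
general shape, not the actual ZFS code or on-disk format: the Uberblock
fields, the SHA-256 checksum, and the tree_ok callback are all
hypothetical stand-ins.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Uberblock:
    """Hypothetical, simplified stand-in for a pool root record."""
    txg: int          # transaction group that wrote this record
    root: bytes       # pointer to the root of the block tree
    checksum: str


def _digest(txg, root):
    return hashlib.sha256(root + str(txg).encode()).hexdigest()


def seal(txg, root):
    return Uberblock(txg, root, _digest(txg, root))


def is_valid(ub):
    return ub.checksum == _digest(ub.txg, ub.root)


def pick_uberblock(candidates, tree_ok, max_txg=None):
    """Pick the newest candidate that checksums correctly and whose tree
    can actually be read; fall back to older txgs otherwise.
    `tree_ok` is a caller-supplied check (an assumption of this sketch)."""
    usable = [ub for ub in candidates
              if is_valid(ub) and (max_txg is None or ub.txg <= max_txg)]
    for ub in sorted(usable, key=lambda u: u.txg, reverse=True):
        if tree_ok(ub):
            return ub
    return None
```

If the newest record points at blocks that never made it to disk, the
pool would come up a few transaction groups in the past instead of not
at all.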

-- Frank
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
