On 07/25/09 04:30 PM, Carson Gaspar wrote:
No. You'll lose unwritten data, but won't corrupt the pool, because the on-disk state will be sane, as long as your iSCSI stack doesn't lie about data commits or ignore cache flush commands. Why is this so difficult for people to understand? Let me create a simple example for you.
Are you sure about this example? AFAIK metadata refers to things like the file's name, atime, ACLs, etc., etc. Your example seems to be more about how a journal works, which has little to do with metadata other than to manage it.
Now if you were too lazy to bother to follow the instructions properly, we could end up with bizarre things. This is what happens when storage lies and re-orders writes across boundaries.
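To make the ordering point concrete, here is a toy sketch (not from Carson's post, and not how ZFS or any real iSCSI stack is implemented). It assumes a made-up device model in which any subset of cached writes may reach the media when power fails, and a two-step commit: write the data, flush, then write a root pointer that names the data. The names commit_then_crash, crash_outcomes and is_consistent are invented for illustration.

```python
from itertools import combinations


def crash_outcomes(durable, pending):
    """Yield every media state reachable if power fails with `pending` writes cached.

    Assumption of this toy model: any subset of the cached writes may have
    reached the media, i.e. the device is free to reorder within its cache.
    """
    for k in range(len(pending) + 1):
        for subset in combinations(pending, k):
            media = dict(durable)
            media.update(subset)
            yield media


def commit_then_crash(honors_flush):
    """Start a commit, lose power before the final flush, return all outcomes."""
    durable = {"root": 0, ("data", 0): "old"}  # consistent starting state
    pending = []                               # volatile write cache

    def write(block, value):
        pending.append((block, value))

    def flush():
        if honors_flush:
            durable.update(pending)            # honest device drains its cache
            pending.clear()
        # a lying device acknowledges the flush and does nothing

    write(("data", 1), "new")
    flush()                                    # barrier: data before pointer
    write("root", 1)
    # power fails here, before the second flush
    return list(crash_outcomes(durable, pending))


def is_consistent(media):
    """The root pointer must name a data block that is actually on the media."""
    return ("data", media["root"]) in media


if __name__ == "__main__":
    for honest in (True, False):
        outcomes = commit_then_crash(honest)
        bad = sum(not is_consistent(m) for m in outcomes)
        print(f"honors_flush={honest}: {bad} of {len(outcomes)} outcomes inconsistent")
```

With the flush honored, the only thing that can be lost is the new root pointer, so you lose unwritten data and nothing else. With the flush ignored, one of the reachable states has the root pointing at data that never made it to disk, which is the "bizarre things" case.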
On 07/25/09 07:34 PM, Toby Thain wrote:
The problem is assumed *ordering*. In this respect VB ignoring flushes and real hardware are not going to behave the same.
Why? An ignored flush is ignored. It may be more likely in VB, but it can always happen. It mystifies me that VB would in some way alter the ordering. I wonder if the OP could tell us what actual disks and controller he used, to see if the hardware might actually have done out-of-order writes despite the fact that ZFS already does write optimization. Maybe the disk didn't like the physical location of the log relative to the data, so it wrote the data first? Even then it isn't obvious why this would cause the pool to be lost.

A traditional journalling file system should survive the loss of a flush. Either the log entry was written or it wasn't. Even if the disk, for some bizarre reason, writes some of the actual data before writing the log, the repair process should undo that. If written properly, it will use the information in the most current complete journal entry to repair the file system. Doing syncs is devastating to performance, so usually there's an option to disable them, at the known risk of losing a lot more data.

I've been using SPARCs and Solaris from the beginning. Ever since UFS supported journalling, I've never lost a file unless the disk went totally bad, and none since mirroring. Didn't miss fsck either :-)

Doesn't the ZIL effectively make ZFS into a journalled file system? (In another thread, Bob Friesenhahn says it isn't, but I would submit that the general opinion is correct that it is; "log" and "journal" have similar semantics.) The evil tuning guide is pretty emphatic about not disabling it!

My intuition (and this is entirely speculative) is that the ZFS ZIL either doesn't contain everything needed to restore the superstructure, or that if it does, the recovery process is ignoring it. I think I read that the ZIL is per-file system, but one hopes it doesn't rely on the superstructure recursively, or this would be impossible to fix (maybe there's a ZIL for the ZILs :) ).

On 07/21/09 11:53 AM, George Wilson wrote:
We are working on the pool rollback mechanism and hope to have that soon. The ZFS team recognizes that not all hardware is created equal and thus the need for this mechanism. We are using the following CR as the tracker for this work:

6667683 need a way to rollback to an uberblock from a previous txg
so maybe this discussion is moot :-)

-- Frank
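P.S. A toy sketch of what I mean by "either the log entry was written or it wasn't". This is not how UFS logging or the ZIL is actually implemented; the record format (sequence number, payload, CRC32) and the names make_entry, parse_entry and latest_complete are invented for illustration. The idea is only that recovery ignores any record that is torn or fails its checksum and works from the newest one that is complete.

```python
import zlib


def make_entry(seq, payload):
    """Serialize one journal record as b"<seq>:<payload>|<crc32 of the body>"."""
    body = f"{seq}:{payload}".encode()
    return body + b"|" + str(zlib.crc32(body)).encode()


def parse_entry(raw):
    """Return (seq, payload) if the record is complete and its checksum matches."""
    body, sep, crc = raw.rpartition(b"|")
    if not sep or not crc.isdigit() or zlib.crc32(body) != int(crc):
        return None  # torn or corrupt record: treat it as never written
    seq, _, payload = body.decode().partition(":")
    return int(seq), payload


def latest_complete(journal):
    """Pick the highest-numbered record that parses cleanly, or None."""
    valid = [e for e in map(parse_entry, journal) if e is not None]
    return max(valid, default=None)


if __name__ == "__main__":
    journal = [
        make_entry(1, "mkdir a"),
        make_entry(2, "write a/f"),
        make_entry(3, "rename a/f a/g")[:-4],  # simulate a record cut short by a crash
    ]
    print(latest_complete(journal))
```

Here the third record is torn, so recovery falls back to record 2. Picking an older record on purpose is, as far as I can tell, the same general idea as rolling back to an earlier uberblock, though the real mechanism will certainly differ in detail.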