>>>>> "nw" == Nicolas Williams <[EMAIL PROTECTED]> writes:
>>>>> "wm" == Will Murnane <[EMAIL PROTECTED]> writes:

    nw> ZFS has very strong error detection built-in,

    nw> ZFS can also store multiple copies of data and metadata even
    nw> in non-mirrored/non-RAID-Z pools.

    nw> Whoever is making those objections is misinformed.

The objection, to review, is that people are losing entire ZFS pools
on SANs more often than they lose UFS filesystems on the same SANs.
That is experience, not theory.  One might start trying to infer the
reason from the manual recovery workaround that has worked: rolling
back to an older ueberblock.
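
To make that concrete, here is a rough sketch (mine, nothing official,
and only illustrative) of what ``rolling back to an older ueberblock''
actually involves: walking the uberblock ring in a vdev label and
looking at each slot's txg and timestamp, much like the label and
uberblock dumps zdb produces.  The layout constants are assumptions
taken from my reading of the on-disk spec -- label 0 at the start of
the vdev, uberblock array 128 KiB into the label, 128 slots of 1 KiB
each -- so treat them as assumptions, not gospel.

    #!/usr/bin/env python
    # Read-only walk of the uberblock ring in vdev label 0.
    # Layout constants are assumptions based on the on-disk spec.
    import struct, sys, time

    UB_MAGIC = 0x00bab10c          # 'oobabloc', stored in the writer's byte order
    LABEL_UB_OFFSET = 128 * 1024   # uberblock array offset within label 0
    UB_SLOT_SIZE = 1024            # assumed slot size (ashift=9 pools)
    UB_SLOTS = 128

    def scan(path):
        with open(path, 'rb') as dev:
            dev.seek(LABEL_UB_OFFSET)
            ring = dev.read(UB_SLOT_SIZE * UB_SLOTS)
        for i in range(UB_SLOTS):
            slot = ring[i * UB_SLOT_SIZE:(i + 1) * UB_SLOT_SIZE]
            for endian in ('<', '>'):  # labels may be in either byte order
                magic, ver, txg, guid_sum, ts = struct.unpack(endian + '5Q', slot[:40])
                if magic == UB_MAGIC:
                    print("slot %3d  txg %-12d  %s" % (i, txg, time.ctime(ts)))
                    break

    if __name__ == '__main__':
        scan(sys.argv[1])          # the slice the vdev lives on, read-only

Manual recovery, as I understand the war stories, amounts to picking a
slot with an older txg whose root block pointer still checks out and
convincing the pool to import from it.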

    wm> Turning off checksumming on the ZFS side may ``solve'' the
    wm> problem.

That wasn't the successful answer for the people who lost pools and
then recovered them.  Based on my limited understanding, I don't think
it would help a pool that was recovered by rolling back to an older
ueberblock.

Also, to pick a nit: AIUI certain metadata checksums can't be disabled
at all, because they're used in place of write-barriered commit
sectors.  I might be wrong, though.
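
In case that nit is opaque, the idea (as I understand it, boiled down
to a toy that is in no way ZFS code) is that a commit record carrying
its own checksum doesn't need a write barrier in front of it: a torn
or reordered write simply fails validation, and the reader falls back
to the previous valid record.  A little sketch of the principle:

    import hashlib, struct

    def make_record(txg, payload):
        # commit record = txg + payload + its own SHA-256
        body = struct.pack('<Q', txg) + payload
        return body + hashlib.sha256(body).digest()

    def latest_valid(records):
        best = None
        for rec in records:
            body, digest = rec[:-32], rec[-32:]
            if hashlib.sha256(body).digest() != digest:
                continue              # torn write: slot ignored, not trusted
            txg = struct.unpack('<Q', body[:8])[0]
            if best is None or txg > best[0]:
                best = (txg, body[8:])
        return best                   # highest txg that checks out

Disable that checksum and a half-written record could look just as
good as a complete one, which is presumably why it isn't optional.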

    nw> ZFS always leaves the filesystem in a consistent state,
    nw> provided the drives aren't lying.

ZFS needs to deliver reliability comparable to competing filesystems
while running on the drives and SANs that actually exist today.

Alternatively, if you want to draw a line in the sand on the ``blame
the device'' position, then the problems causing lost pools have to be
actually tracked down and definitively blamed on misimplemented
devices, and we need a procedure to identify and disqualify those
misimplemented devices.  Once we follow that qualification procedure
before loading data into the pool, you're no longer allowed to blame
devices with hindsight after the pool is lost by pointing at
self-exonerating error messages or telling stories about theoretical
capabilities of the on-disk format.  We also need a list of broken
devices so we can avoid buying them, and it must not be a secret list
rumored to contain ``drives from major vendors'' for fear of vendors
retaliating by repealing discounts or whatever.  I kind of prefer this
approach, but the sloppier approach of working around the problem
(``working around'' meaning automatically, safely, somewhat quickly,
and hopefully not silently recovering from often-seen kinds of
corruption without rigorously identifying their root causes, just as
fsck does on other filesystems) is probably easier to implement.
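
For what it's worth, here is the sort of crude smell test I have in
mind as one small piece of such a qualification procedure -- my own
sketch, not anything blessed, inconclusive on its own, and assuming
the platform exposes O_DSYNC through Python's os module: time small
synchronous writes and ask whether the ``completed'' rate is
physically plausible for the device.  A real procedure still needs
pull-the-plug testing along the lines of diskchecker.pl.

    import os, sys, time

    def sync_write_rate(path, count=200):
        # O_DSYNC: each write should not return until it's on stable storage
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_DSYNC, 0o600)
        buf = b'\0' * 512
        try:
            start = time.time()
            for _ in range(count):
                os.write(fd, buf)
            elapsed = time.time() - start
        finally:
            os.close(fd)
        return count / elapsed

    if __name__ == '__main__':
        # argv[1]: a scratch file on the device under suspicion
        print("%.0f synchronous 512-byte writes/sec" % sync_write_rate(sys.argv[1]))
        # A lone 7200rpm spindle honoring the semantics manages maybe
        # 100-300/sec; tens of thousands means either NVRAM or a cache
        # that acknowledges writes it hasn't made durable.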

Other filesystems like ext3 and XFS on Linux have gone through the
same process of figuring out why corruption was happening and working
around it: changing the way they write, sending drives STOP_UNIT
commands before ACPI powerdown, using the rumored ``about to lose
power'' interrupt on SGI hardware that lets Irix cancel in-flight DMA,
and, mostly, adding special cases to fsck.

I think the obstructionist recitations of on-disk-format feature lists
explaining why this ``shouldn't be happening'' reduce confidence in
ZFS.  They don't improve it.
