>>>>> "as" == Andras Spitzer <wsen...@gmail.com> writes:
as> So, you telling me that even if the SAN provides redundancy
as> (HW RAID5 or RAID1), people still configure ZFS with either
as> raidz or mirror?

There's some experience that, in the case where the storage device or
the FC mesh glitches or reboots while the ZFS host stays up across the
reboot, you are less likely to lose the whole pool to ``ZFS-8000-72 The
pool metadata is corrupted and cannot be opened. Destroy the pool and
restore from backup.'' if you have ZFS-level redundancy than if you
don't.

Note that this ``corrupt and cannot be opened'' failure is a different
problem from ``not being able to self-heal.'' When you need
self-healing and don't have it, you usually shouldn't lose the whole
pool. You should get a message in 'zpool status' telling you the name
of a file that has unrecoverable errors. Any attempt to read the file
returns an I/O error (not the marginal data). Then you have to go
delete that file to clear the error, but otherwise the pool keeps
working. In this self-heal case, if you'd had the ZFS-layer redundancy
you'd get a count in the checksum column of one device and wouldn't
have to delete the file; in fact you wouldn't even know the name of
the file that got healed.

Some people have been trying to blame the ``corrupt and cannot be
opened'' failure on bit flips supposedly happening inside the storage
or the FC cloud, the same kind of bit flip that causes the other,
self-healable problem, but I don't buy it. I think it's probably
cache sync / write barrier problems that are killing the unredundant
pools on SANs.
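
For anyone following along, here's roughly what the two cases look
like at the command line. The pool name 'tank' and the file path
below are made up for illustration; the commands themselves are
standard zpool usage.

    # unredundant pool: find the damaged file, delete it, clear the error
    zpool status -v tank            # -v lists files with unrecoverable errors
    rm /tank/home/user/damaged.file # reading it only returns EIO anyway
    zpool clear tank                # reset the error counts

    # pool with ZFS-level redundancy: the bad copy is rewritten from the
    # good one, and the only evidence is a nonzero CKSUM count on one device
    zpool status tank
    zpool clear tank                # clear the counters once you've noted them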