On Tue, Nov 28, 2006 at 03:02:59PM -0500, Elizabeth Schwartz wrote:
> So I rebuilt my production mail server as Solaris 10 06/06 with zfs, it ran
> for three months, and it's had no hardware errors. But my zfs file system
> seems to have died a quiet death. Sun engineering response was to point to
> the FMRI, which says to throw out the zfs partition and start over. I'm real
> reluctant to do that, since it'll take hours to do a tape restore, and we
> don't know what's wrong.  I'm seriously wondering if I should just toss zfs.
> Again, this is Solaris 10 06/06, not some beta version. It's an older
> server, a 280R with an older SCSI RaidKing

So you have a one device pool and that device is a RAID device of some
sort, and ZFS is getting errors from that device.  From ZFS' point of
view this is disastrous.  From your point of view this shouldn't happen
because your RAID device ought to save your bacon from single disk
failures (depending on how it's configured).

RAID devices aren't magical -- they can't detect certain kinds of errors
that ZFS can.  But ZFS can recover from those errors only if it itself
is in charge of the RAIDing and there are enough remaining good devices
to reconstruct the correct data.  (Then there are ditto blocks, which
would let you add some degree of redundancy without having to add
devices, but I don't recall their status.)
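
To make the detect-versus-recover distinction concrete, here is a toy
sketch in Python.  It is not ZFS code: the names (checksum, read_block,
CorruptionError) are made up for illustration, and SHA-256 merely stands
in for whatever checksum the pool actually uses.  The point is only that
a stored checksum lets you notice a bad copy, while repairing it needs a
second, good copy.

```python
import hashlib


class CorruptionError(Exception):
    """Raised when no copy of a block matches its stored checksum."""


def checksum(data: bytes) -> bytes:
    # Stand-in checksum (assumption): any strong hash works for the sketch.
    return hashlib.sha256(data).digest()


def read_block(copies: list, expected: bytes) -> bytes:
    """Return a copy that matches `expected` and overwrite the bad copies.

    `copies` models the same block as stored on each redundant device.
    """
    good = None
    for data in copies:
        if checksum(data) == expected:
            good = data
            break
    if good is None:
        # Detection without redundancy: we know the data is bad,
        # but there is nothing to rebuild it from.
        raise CorruptionError("checksum mismatch on every copy")
    for i, data in enumerate(copies):
        if data != good:
            copies[i] = good  # self-heal the damaged copy
    return good


block = b"mail spool data"
expected = checksum(block)

# Two copies, one silently corrupted: the read succeeds and repairs it.
mirror = [b"mail spool dat\x00", block]
recovered = read_block(mirror, expected)

# One copy, corrupted: the error is detected but cannot be repaired.
single = [b"mail spool dat\x00"]
try:
    read_block(single, expected)
    single_result = "ok"
except CorruptionError:
    single_result = "unrecoverable"
```

With a single device underneath, the second case is the only one
available, no matter what the RAID box does internally.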

The main reason for wanting to use hardware RAID with ZFS is to
significantly reduce the amount of I/O that ZFS has to do (for a 5 disk
RAID-5 we're talking about 5 times less I/O for the host to do).  But
this means that ZFS can't do combinatorial reconstruction of bad disk
data, so you then want to add mirroring (which also adds I/Os) so that
ZFS can cope with bad data from the RAID device.
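
The trade-off can be put as back-of-the-envelope arithmetic.  The sketch
below is a deliberately crude model, not a measurement: it assumes one
full-stripe write, counts only the writes the host itself issues, and
the layout names and the function host_writes_per_stripe are invented
for the example.

```python
def host_writes_per_stripe(layout: str, disks: int = 5) -> int:
    """Count host-issued device writes for one full-stripe write.

    Crude model (assumption): the RAID controller's own fan-out to its
    disks is invisible to the host and therefore not counted.
    """
    if layout == "zfs-raidz":
        # ZFS does the RAIDing, so the host writes to every disk.
        return disks
    if layout == "hw-raid5":
        # One write to the RAID device; the controller does the rest.
        return 1
    if layout == "zfs-mirror-of-hw-raid5":
        # ZFS mirrors across two RAID devices: one write to each.
        return 2
    raise ValueError(f"unknown layout: {layout}")


raidz = host_writes_per_stripe("zfs-raidz")
hw = host_writes_per_stripe("hw-raid5")
mirrored = host_writes_per_stripe("zfs-mirror-of-hw-raid5")
```

Under these assumptions the mirrored setup doubles the host's I/O
relative to a single RAID device, but still does less than ZFS driving
all five disks directly, and it gives ZFS a second copy to repair from.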

How is your RAID device configured?  Does it have any diagnostics?

Or is your RAID device silently corrupting data?  If so ZFS saved you by
detecting that, but because you did not have enough redundancy (from
ZFS' point of view) ZFS can't actually reconstruct the correct data, and
so you lose.

Nico
-- 
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
