On Tue, Nov 28, 2006 at 03:02:59PM -0500, Elizabeth Schwartz wrote:
> So I rebuilt my production mail server as Solaris 10 06/06 with zfs, it ran
> for three months, and it's had no hardware errors. But my zfs file system
> seems to have died a quiet death. Sun engineering response was to point to
> the FMRI, which says to throw out the zfs partition and start over. I'm real
> reluctant to do that, since it'll take hours to do a tape restore, and we
> don't know what's wrong. I'm seriously wondering if I should just toss zfs.
> Again, this is Solaris 10 06/06, not some beta version. It's an older
> server, a 280R with an older SCSI RaidKing

So you have a one-device pool, that device is a RAID device of some sort,
and ZFS is getting errors from it. From ZFS' point of view this is
disastrous. From your point of view this shouldn't happen, because your
RAID device ought to save your bacon from single-disk failures (depending
on how it's configured).

RAID devices aren't magical -- they can't detect certain kinds of errors
that ZFS can. But ZFS can only recover from those errors if it is itself
in charge of the RAIDing and there are enough remaining good devices to
reconstruct the correct data. (Then there's ditto blocks, which would let
you add some degree of redundancy without having to add devices, but I
don't recall the status of that.)

The main reason for wanting to use hardware RAID with ZFS is to
significantly reduce the amount of I/O that the host has to do (for a
5-disk RAID-5 we're talking about 5 times less I/O for the host). But
because this means that ZFS can't do combinatorial reconstruction of bad
disk data, you then want to add mirroring (which also adds I/Os) so that
ZFS can cope with bad data from the RAID device.

How is your RAID device configured? Does it have any diagnostics? Or is
your RAID device silently corrupting data? If so, ZFS saved you by
detecting that, but because you did not have enough redundancy (from ZFS'
point of view) ZFS can't actually reconstruct the correct data, and so you
lose.

Nico
--
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
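
To make the detect-versus-recover distinction concrete, here is a toy
sketch in Python. It is not ZFS code: the names checksum and read_block
are made up for illustration, and SHA-256 merely stands in for whatever
checksum a pool actually uses. The idea it models is only this: a block's
checksum is kept apart from the block, so a bad read can be detected, but
a good result can be returned only if some other copy still matches.

```python
import hashlib


def checksum(data: bytes) -> bytes:
    # Stand-in checksum; an assumption made for this sketch only.
    return hashlib.sha256(data).digest()


def read_block(copies: list[bytes], expected: bytes) -> bytes:
    """Return the first copy whose checksum matches `expected`.

    Raises IOError when every copy is corrupt: the damage is detected,
    but there is nothing left to reconstruct the data from.
    """
    for data in copies:
        if checksum(data) == expected:
            return data
    raise IOError("checksum mismatch on every copy; cannot reconstruct")


good = b"mail spool block"
bad = b"mail spool blocK"  # what a silently corrupting device hands back
expected = checksum(good)

# Two copies under the filesystem's control (a mirror): the bad copy is
# skipped and the good one is returned.
recovered = read_block([bad, good], expected)

# One copy (a single-device pool on top of hardware RAID): the corruption
# is still detected, but the read fails.
try:
    read_block([bad], expected)
    single_device_ok = True
except IOError:
    single_device_ok = False
```

With a mirror the read comes back clean; with a single device the same
corruption can only be reported, which matches the situation described
above.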