> On Sat, 14 Jun 2008, zfsmonk wrote:
>
> > Mentioned on
> > http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide
> > is the following: "ZFS works well with storage based protected LUNs
> > (RAID-5 or mirrored LUNs from intelligent storage arrays). However,
> > ZFS cannot heal corrupted blocks that are detected by ZFS checksums."
>
> This basically means that the checksum itself is not sufficient to
> accomplish correction. However, if ZFS-level RAID is used, the correct
> block can be obtained from a redundant copy.
>
> > Based upon that, if we have LUNs already in RAID5 being served from
> > intelligent storage arrays, is there any benefit to creating the zpool as
> > a mirror if ZFS can't heal any corrupted blocks? Or would we just be
> > wasting disk space?
>
> This is a matter of opinion. If ZFS does not have access to
> redundancy, then it cannot correct any problems that it encounters,
> and could even panic the system, or the entire pool could be lost.
> However, if the storage array and all associated drivers, adaptors,
> memory, and links are working correctly, then this risk may be
> acceptable (to you).
>
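> (As an aside, and only as a sketch assuming a reasonably current ZFS
> release and example dataset names: the "copies" property is one way to
> give ZFS redundant copies of blocks even on a single RAID-5 LUN, at the
> cost of the extra space:
>
>   zfs set copies=2 tank/data
>
> It only applies to data written after it is set, and it is not a
> substitute for a mirrored or raidz pool, but it does give ZFS something
> to repair from.)
>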
> ZFS experts at Sun say that even the best storage arrays may not
> detect and correct some problems and that complex systems can produce
> errors even though all of their components seem to be working
> correctly. This is in spite of Sun also making a living by selling
> such products. The storage array is only able to correct errors it
> detects due to the hardware reporting an unrecoverable error condition
> or by double-checking using data on a different drive. Since storage
> arrays want to be fast they are likely to engage additional validity
> checks/correction only after a problem has already been reported (or
> during a scrub/resilver) rather than as a matter of course.
>
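> One practical consequence: a periodic "zpool scrub" asks ZFS itself to
> read back and verify every block against its checksum, rather than
> waiting for the array to notice a problem, e.g. (pool name is just an
> example):
>
>   zpool scrub tank
>   zpool status tank    # reports scrub progress and any checksum errors found
>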
> A problem which may occur is that your storage array may say that the
> data is good while ZFS says that there is bad data. Under these
> conditions there might not be a reasonable way to correct the problem
> other than to lose the data. If the zfs pool requires the failed data
> in order to operate, then the entire pool could be lost.
>
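> If you do land in that situation, "zpool status -v" will report the
> errors as permanent and list the affected files (again, the pool name is
> only an example):
>
>   zpool status -v tank
>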
A couple of questions on this topic -

What fraction of the data in a zpool, if it took one of these bit-corruption
errors, would actually cause the zpool to fail? Is that fraction higher or
lower than what it would take to fatally and irrevocably corrupt UFS or VxFS
to the point where a restore is required?

Given that today's storage arrays catch a good percentage of errors and correct
them (for the intelligent arrays I have in mind, anyway), are we talking about
the nasty, silent corruption I've been reading about that occurs in huge
datasets, where the RAID thinks the data is good but it's actually garbage?
From what I remember reading, that has a low occurrence rate and only became
noticeable because we're dealing in such large amounts of data these days. Am I
wrong here?

So, looking at making operational decisions in the short term, I have to ask
specifically: is it more or less likely that a zpool will die and have to be
restored than UFS or VxFS filesystems on a VxVM volume?

My opinions and questions are my own, and do not necessarily represent those of
my employer. (or my coworkers, or anyone else)
cheers,
Brian
> Bob
> ======================================
> Bob Friesenhahn
> [EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
> GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
>
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss