On Tue, 18 Jul 2006, Al Hopper wrote:

On Tue, 18 Jul 2006, Daniel Rock wrote:

Richard Elling wrote:
Jeff Bonwick wrote:
For 6 disks, 3x2-way RAID-1+0 offers better resiliency than RAID-Z
or RAID-Z2.

Maybe I'm missing something, but it ought to be the other way around.
With 6 disks, RAID-Z2 can tolerate any two disk failures, whereas
for 3x2-way mirroring, of the (6 choose 2) = 6*5/2 = 15 possible
two-disk failure scenarios, three of them are fatal.

For the 6-disk case, with RAID-1+0 you get 27/64 surviving states
versus 22/64 for RAID-Z2.  This accounts for the cases where you could
lose 3 disks and survive with RAID-1+0.
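The counts quoted above can be checked by brute force. The following is a minimal sketch, not anything posted in the thread: it enumerates all 2^6 failed/working combinations for six disks and counts the ones each layout survives. The pairing of disks into mirrors (0-1, 2-3, 4-5) and the function names are assumptions made for illustration.

```python
from itertools import product

N_DISKS = 6
# Assumed layout for 3x2-way RAID-1+0: disks (0,1), (2,3), (4,5) are mirrors.
MIRROR_PAIRS = [(0, 1), (2, 3), (4, 5)]


def raid10_survives(failed):
    """RAID-1+0 survives as long as no mirror pair has lost both disks."""
    return all(not (failed[a] and failed[b]) for a, b in MIRROR_PAIRS)


def raidz2_survives(failed):
    """RAID-Z2 survives any combination of at most two failed disks."""
    return sum(failed) <= 2


# Every state is a tuple of six booleans; True means that disk has failed.
states = list(product([False, True], repeat=N_DISKS))

raid10_ok = sum(raid10_survives(s) for s in states)
raidz2_ok = sum(raidz2_survives(s) for s in states)

# Restrict to exactly two failed disks: (6 choose 2) = 15 scenarios.
two_disk = [s for s in states if sum(s) == 2]
raid10_fatal_two = sum(not raid10_survives(s) for s in two_disk)

# Exactly three failed disks that RAID-1+0 still survives (one per pair).
raid10_ok_three = sum(raid10_survives(s) for s in states if sum(s) == 3)

print(f"RAID-1+0 surviving states: {raid10_ok}/{len(states)}")   # 27/64
print(f"RAID-Z2  surviving states: {raidz2_ok}/{len(states)}")   # 22/64
print(f"two-disk failures fatal to RAID-1+0: "
      f"{raid10_fatal_two}/{len(two_disk)}")                      # 3/15
print(f"three-disk failures RAID-1+0 survives: {raid10_ok_three}")  # 8
```

Note that this treats every one of the 64 states as equally likely, which is the assumption the rest of the thread goes on to question.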

I think this type of calculation is flawed. Disk failures are rare and
multiple disk failures at the same time are even more rare.
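One way to make this objection concrete is to weight each state by its probability instead of counting states. The sketch below is an illustration only: it assumes every disk fails independently with the same probability p over some window, and both that independence assumption and the sample value of p are mine, not the thread's. Counting states with equal weight corresponds to p = 0.5, which reproduces the 27/64 and 22/64 figures.

```python
from math import comb


def raid10_survival(p, pairs=3):
    """Probability that no 2-way mirror loses both disks.

    Assumes each disk fails independently with probability p.
    """
    return (1 - p ** 2) ** pairs


def raidz2_survival(p, n=6, parity=2):
    """Probability that at most `parity` of the n disks fail.

    Assumes each disk fails independently with probability p.
    """
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k)
               for k in range(parity + 1))


# p = 0.5 weights all 64 states equally (27/64 and 22/64).
# p = 0.01 is a hypothetical "failures are rare" value.
for p in (0.5, 0.01):
    print(f"p={p}: RAID-1+0 {raid10_survival(p):.6f}, "
          f"RAID-Z2 {raidz2_survival(p):.6f}")
```

Under these assumptions RAID-1+0 only comes out ahead when failures are very likely; for small p the states with three or more failures carry almost no weight and RAID-Z2 has the higher survival probability. Correlated failures would change the numbers.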

Stop right here! :)  If you have a large number of identical disks which
operate in the same environment[1], and possibly the same enclosure, it's
quite likely that you'll see 2 or more disks die within the same,
relatively short, timeframe.

Also, with today's higher-density disk enclosures, a fan failure that
goes unnoticed for a period of time is likely to affect more than one
drive, again leading to multiple disks failing in the same general
timeframe.

This is also why I advocate having cold spares available - so that the
probability of the spare failing within the same timeframe is greatly
diminished.

A good SMART implementation combined with a decent sensor framework can also be useful for dealing with these conditions. Smartmontools is currently able to send e-mail when the ambient temperature of a disk drive goes beyond the recommended thresholds. I am hopeful the Solaris SMART implementation will take temperature into account, since modern disk drives run hot and fan failures aren't all that uncommon.

- Ryan
--
UNIX Administrator
http://prefetch.net
