On Wed, 31 Jan 2007 [EMAIL PROTECTED] wrote:

>
> >I understand all the math involved with RAID 5/6 and failure rates,
> >but it's wise to remember that even if the probabilities are small
> >they aren't zero. :)

Agreed.  Another thing I've seen is that if you have an A/C (air
conditioning) "event" in the data center or lab, you will usually see a
cluster of failures over the next 2 to 3 weeks.  Effectively, all your
disk drives have been thermally stressed and are likely to exhibit a spike
in their failure rates in the near term.
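
To put rough numbers on "small but not zero" (these are my own illustrative
figures, not measurements from any particular array): here's a quick sketch
of the chance of losing a second drive while a RAID5 rebuild is in flight,
assuming independent failures and a constant per-drive annual failure rate
(AFR).  The interesting part is how fast that chance climbs if a thermal
event pushes the AFR up.

  # Back-of-envelope only: assumes independent failures and a constant AFR;
  # the AFR and rebuild-time numbers below are illustrative assumptions.
  def p_second_failure(n_drives, afr, rebuild_hours):
      """P(at least one surviving drive fails during the rebuild window)."""
      # Per-drive probability of failing within the rebuild window.
      p_one = 1.0 - (1.0 - afr) ** (rebuild_hours / (24.0 * 365.0))
      # Probability that at least one of the n-1 surviving drives fails.
      return 1.0 - (1.0 - p_one) ** (n_drives - 1)

  # 11-drive array, 24-hour rebuild:
  print(p_second_failure(11, 0.03, 24))  # ~0.0008 at a "healthy" 3% AFR
  print(p_second_failure(11, 0.30, 24))  # ~0.01 if stress pushes AFR to 30%

Small either way, but an order of magnitude apart -- and the independence
assumption is exactly what goes out the window after an A/C event.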

Often, in a larger environment, the facilities personnel don't understand
the correlation between an A/C event and disk drive failure rates.  And
major A/C upgrade work is often scheduled over a (long) weekend when most
of the technical talent won't be present.  After the work is completed,
everyone is told that it "went very well" because the organization does
not "do bad news" -- and then you lose two drives in a RAID5 array ....

> And after 3-5 years of continuous operation, you better decommission the
> whole thing or you will have many disk failures.

Agreed.  We took an 11 disk FC hardware RAID box offline recently because
all the drives were 5 years old.  It's tough to hit those power off
switches and scrap working disk drives, but much better than the business
disruption and professional embarrassment caused by data loss.  And much
better to be in control of, and experience, *scheduled* downtime.  BTW:
don't forget that if you plan to continue using the disk enclosure
hardware, you need to replace _all_ the fans first.

Regards,

Al Hopper  Logical Approach Inc, Plano, TX.  [EMAIL PROTECTED]
           Voice: 972.379.2133 Fax: 972.379.2134  Timezone: US CDT
OpenSolaris.Org Community Advisory Board (CAB) Member - Apr 2005
             OpenSolaris Governing Board (OGB) Member - Feb 2006