On Wed, 31 Jan 2007 [EMAIL PROTECTED] wrote:

> > I understand all the math involved with RAID 5/6 and failure rates,
> > but it's wise to remember that even if the probabilities are small
> > they aren't zero. :)
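To put a rough number on "small but not zero", here's a quick back-of-the-envelope sketch in Python. The annual failure rate, rebuild window, and drive count are assumptions picked purely for illustration, not measured figures:

# Back-of-the-envelope: chance of a second drive failing during a RAID-5
# rebuild, assuming independent failures. All numbers are illustrative.
annual_failure_rate = 0.03     # assumed ~3% AFR per drive
rebuild_hours = 24             # assumed 24-hour rebuild window
surviving_drives = 10          # e.g. an 11-drive RAID-5 minus the failed one

hours_per_year = 24 * 365
p_one_drive = 1 - (1 - annual_failure_rate) ** (rebuild_hours / hours_per_year)
p_second_failure = 1 - (1 - p_one_drive) ** surviving_drives

print(f"Per-drive failure chance during the rebuild: {p_one_drive:.5%}")
print(f"Chance of losing a second drive (array loss): {p_second_failure:.4%}")

Under those assumptions it works out to roughly a 1-in-1200 chance of array loss per rebuild. Small, but certainly not zero, and that calculation assumes the failures are independent, which in practice they often aren't.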
Agreed. Another thing I've seen is that if you have an A/C (air conditioning) "event" in the data center or lab, you will usually see a cluster of failures over the next 2 to 3 weeks. Effectively, all your disk drives have been thermally stressed and are likely to exhibit a spike in failure rates in the near term. Often, in a larger environment, the facilities personnel don't understand the correlation between an A/C event and disk drive failure rates. And major A/C upgrade work is often scheduled over a (long) weekend when most of the technical talent won't be present. After the work is completed, everyone is told that it "went very well" because the organization does not "do bad news", and then you lose two drives in a RAID5 array ....

> And after 3-5 years of continuous operation, you better decommission the
> whole thing or you will have many disk failures.

Agreed. We took an 11-disk FC hardware RAID box offline recently because all the drives were 5 years old. It's tough to hit those power-off switches and scrap working disk drives, but that's much better than the business disruption and professional embarrassment caused by data loss. And much better to be in control of, and experience, *scheduled* downtime. (A quick way to spot drives approaching that age is sketched in the P.S. below.)

BTW: don't forget that if you plan to continue to use the disk enclosure hardware, you need to replace _all_ the fans first.

Regards,

Al Hopper  Logical Approach Inc, Plano, TX.  [EMAIL PROTECTED]
           Voice: 972.379.2133  Fax: 972.379.2134  Timezone: US CDT
OpenSolaris.Org Community Advisory Board (CAB) Member - Apr 2005
OpenSolaris Governing Board (OGB) Member - Feb 2006
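P.S. For anyone wanting to keep an eye on drive age across a few boxes, here is a rough sketch of the sort of check I mean. It assumes smartmontools is installed, the script has enough privilege to read SMART data, and the drives report the Power_On_Hours attribute; the device names and the threshold are illustrative only.

import re
import subprocess

# Flag drives whose SMART Power_On_Hours value suggests they are nearing
# the 3-5 year mark. The device list and threshold are examples; adjust
# for your environment (smartctl usually needs root privileges).
DEVICES = ["/dev/sda", "/dev/sdb"]
THRESHOLD_HOURS = 4 * 365 * 24     # roughly 4 years of continuous operation

for dev in DEVICES:
    out = subprocess.run(["smartctl", "-A", dev],
                         capture_output=True, text=True).stdout
    hours = None
    for line in out.splitlines():
        if "Power_On_Hours" in line:
            fields = line.split()
            if len(fields) > 9:
                match = re.match(r"\d+", fields[9])   # RAW_VALUE column
                if match:
                    hours = int(match.group(0))
            break
    if hours is None:
        print(f"{dev}: Power_On_Hours not reported")
    elif hours > THRESHOLD_HOURS:
        print(f"{dev}: {hours} power-on hours - plan replacement")
    else:
        print(f"{dev}: {hours} power-on hours")

Power-on hours won't tell you anything about the thermal stress from an A/C event, of course, but it's a cheap way to catch an array drifting past that 3-5 year mark on a schedule you control, rather than one the drives pick for you.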