Re: [zfs-discuss] sympathetic (or just multiple) drive failures

2010-03-20 Thread Bill Sommerfeld
On 03/19/10 19:07, zfs ml wrote: > What are people's experiences with multiple drive failures? 1985-1986. DEC RA81 disks. Bad glue that degraded at the disk's operating temperature. Head crashes. No more need be said. - Bill

Re: [zfs-discuss] sympathetic (or just multiple) drive failures

2010-03-20 Thread Svein Skogen
On 21.03.2010 00:14, Erik Trimble wrote: > Richard Elling wrote: >> I see this on occasion. However, the cause is rarely attributed to a bad batch of drives. More common is power supplies, HBA firmware, cables, Pepsi syndrome, or similar. >> -- richard > Mmmm. Pepsi Syndrome. I take it this is similar to

Re: [zfs-discuss] sympathetic (or just multiple) drive failures

2010-03-20 Thread Erik Trimble
Richard Elling wrote: > I see this on occasion. However, the cause is rarely attributed to a bad batch of drives. More common is power supplies, HBA firmware, cables, Pepsi syndrome, or similar. > -- richard Mmmm. Pepsi Syndrome. I take it this is similar to the Coke addiction many of my keyboa

Re: [zfs-discuss] sympathetic (or just multiple) drive failures

2010-03-20 Thread Bob Friesenhahn
On Fri, 19 Mar 2010, zfs ml wrote: > same enclosure, same rack, etc. for a given RAID 5/6/z1/z2/z3 system, should we be paying more attention to harmonics, vibration/isolation and non-intuitive system level statistics that might be inducing close proximity drive failures rather than just throwing

Re: [zfs-discuss] sympathetic (or just multiple) drive failures

2010-03-20 Thread Richard Elling
On Mar 19, 2010, at 7:07 PM, zfs ml wrote: > Most discussions I have seen about RAID 5/6 and why it stops "working" seem > to base their conclusions solely on single drive characteristics and > statistics. > It seems to me there is a missing component in the discussion of drive > failures in the

[zfs-discuss] sympathetic (or just multiple) drive failures

2010-03-19 Thread zfs ml
Most discussions I have seen about RAID 5/6 and why it stops "working" seem to base their conclusions solely on single drive characteristics and statistics. It seems to me there is a missing component in the discussion of drive failures in the real world context of a system that lives in an envir