I know I'm a bit late to contribute to this thread, but I'd still like to
add my $0.02.  My "gut feel" is that we (generally) don't yet understand the
subtleties of disk drive failure modes as they relate to 1.5 or 2TB+ drives.
Why?  Because those large drives have not been widely available until
relatively recently.

There's a tendency to extrapolate from one's existing understanding of
how and why drives fail (or degrade), basing the expected outcome on some
"extension" of that knowledge base.  In the case of the current generation
of high-capacity drives, that may or may not be appropriate.  We simply
don't know!  Mainly because the hard drive manufacturers, those engineering
gods and providers of ever-increasing storage density, don't communicate
their acquired and evolving knowledge as it relates to disk reliability (or
failure) mechanisms.

In this case I feel, as a user, it's best to take a very conservative
approach and err on the side of safety by using raidz3 when high-capacity
drives are being deployed.  Over time, a consensus-based understanding of
the failure modes will emerge, and then, from a user perspective, we'll have
a clearer picture of the risk of data loss and how it relates to different
ZFS pool configurations.
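
For what it's worth, a minimal sketch of what that conservative layout might
look like - the pool name and the c#t#d# device names below are made-up
placeholders, so substitute your own:

    # Hypothetical example: a triple-parity vdev built from seven 2TB drives
    # (4 data + 3 parity).  "tank" and the device names are placeholders.
    zpool create tank raidz3 c0t0d0 c0t1d0 c0t2d0 c0t3d0 c0t4d0 c0t5d0 c0t6d0

    # Confirm the layout and redundancy level.
    zpool status tank

The trade-off is capacity for the ability to survive three simultaneous
drive failures (or two failures plus read errors during resilver).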

Personally, I was surprised at how easily I was able to "take out" a 1TB WD
Caviar Black drive by moving a 1U server with the drives spinning.  Earlier
drive generations (500GB or smaller) tolerated this abuse with no signs of
degradation.  So I know that high-capacity drives are a lot more sensitive
to mechanical "abuse".  I can only assume that 2TB drives are probably even
more sensitive, and that shock mounting, to reduce the vibration induced by
a bunch of similar drives operating in the same "box", is probably a smart
move.

Likewise, in my previous experience I've seen how a certain percentage of
disk drives would fail in the 2- or 3-week period following a temperature
"excursion" in a data center environment.  Sometimes everyone knows about
that event, and sometimes the folks doing A/C work over a holiday weekend
will "forget" to publish the details of what went wrong! :)   Again, the
same doubts continue to nag me: are the current 1.5TB+ drives more likely to
suffer degradation due to a temperature excursion over a relatively short
period?  If the drive firmware does its job and remaps damaged sectors
or tracks transparently, we, as the users, won't know - until it happens one
time too many!!
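
One partial mitigation - just a sketch, and the device path and pool name
are placeholders - is to watch the drive's own SMART remapping counters and
to scrub the pool regularly, so latent damage surfaces while the redundancy
is still intact rather than "one time too many" later:

    # Drive-level view: a climbing Reallocated_Sector_Ct is an early warning
    # even while I/O still "works".  Requires smartmontools to be installed;
    # the device path below is a placeholder.
    smartctl -A /dev/rdsk/c0t0d0

    # ZFS-level view: force every block to be read and checksummed so silent
    # damage is found (and repaired) while the parity is still available.
    zpool scrub tank
    zpool status -v tank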

Regards,

-- 
Al Hopper  Logical Approach Inc,Plano,TX a...@logical-approach.com
                  Voice: 972.379.2133 Timezone: US CDT
OpenSolaris Governing Board (OGB) Member - Apr 2005 to Mar 2007
http://www.opensolaris.org/os/community/ogb/ogb_2005-2007/
