Re: [zfs-discuss] Dealing with Single Bit Flips - WAS: Cause for data corruption?

Richard Elling Mon, 03 Mar 2008 10:50:39 -0800

Bob Friesenhahn wrote:
> On Mon, 3 Mar 2008, Darren J Moffat wrote:
>
>>> I'm not convinced that single bit flips are the common
>>> failure mode for disks.  Most enterprise class disks already
>>> have enough ECC to correct at least 8 bytes per block.
>>>
>> And for consumer rather than enterprise class disks?
>>
>
> You are assuming that the ECC used for "consumer" disks is 
> substantially different than that used for "enterprise" disks.  That 
> is likely not the case since ECC is provided by a chip which costs a 
> few dollars.  The only reason to use a lesser grade algorithm would be 
> to save a small bit of storage space.
>
> Consumer disks use essentially the same media as enterprise disks.
>
> Consumer disks store a higher bit density on similar media.
>
> Consumer disks have less precise/consistent head controllers than 
> enterprise disks.
>
> Consumer disks are less well-specified than enterprise disks.
>
> Due to the higher bit density we can expect more wrong bits to be read 
> since we are pushing the media harder.  Due to less consistent head 
> controllers we can expect more incidences of reading or writing the 
> wrong track or writing something which can't be read.  Consumer disks 
> are often used in an environment where they may be physically 
> disturbed while they are writing or reading the data.  Enterprise 
> disks are usually used in very stable environments.
>
> The upshot of this is that we can expect more unrecoverable errors, 
> but it seems unlikely that there will be more "single bit" errors 
> recoverable at the ZFS level.
>


I agree, and I am waiting to get the proceedings from FAST '08,
which has some interesting papers on its list.

A while back I blogged about an Adaptec online seminar
which addressed this topic.  Rather than repeating what they
said, I left a pointer and a recommendation.
http://blogs.sun.com/relling/entry/adaptec_webinar_on_disks_and

Also, note that the published reliability data from disk vendors
is constantly changing.  For laptop drives, we're seeing fewer
MTBF or UER specs and more head-landing specs.  It seems that
an important failure mode for laptop disks is wear-out at the
landing site.  This is due to power management powering down or
spinning down the disk.  We don't tend to see this failure
mode in servers or RAID arrays.
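
As a rough illustration of why a head-landing spec matters more for
laptops than for servers, here is a minimal back-of-the-envelope
sketch.  The rated cycle count and the spin-down frequencies are
hypothetical numbers chosen for illustration, not figures from any
vendor datasheet, and the function name is made up for this example.

```python
def years_until_rated_cycles(rated_cycles, cycles_per_hour, powered_hours_per_day):
    """Years of use before the head-landing count reaches the rated limit."""
    cycles_per_year = cycles_per_hour * powered_hours_per_day * 365
    return rated_cycles / cycles_per_year


# Hypothetical rating, for illustration only.
RATED_CYCLES = 600_000

# Assumed laptop: aggressive power management parks the heads about
# 30 times per hour, with the machine powered on 8 hours per day.
laptop = years_until_rated_cycles(RATED_CYCLES, cycles_per_hour=30,
                                  powered_hours_per_day=8)

# Assumed server: the disk spins continuously and lands its heads
# roughly once per day.
server = years_until_rated_cycles(RATED_CYCLES, cycles_per_hour=1 / 24,
                                  powered_hours_per_day=24)

print(f"laptop: {laptop:.1f} years, server: {server:.1f} years")
```

Under these assumptions the laptop disk reaches its rated landing
count within a normal service life, while the server disk never
comes close, which would be consistent with not seeing this failure
mode in servers or RAID arrays.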
 -- richard

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
