Hi Carsten,

Carsten Aulbert wrote:
> Hi Marc,
>
> Marc Bevand wrote:
>> Carsten Aulbert <carsten.aulbert <at> aei.mpg.de> writes:
>>> In RAID6 you have redundant parity, thus the controller can find out
>>> if the parity was correct or not. At least I think that to be true
>>> for Areca controllers :)
>> Are you sure about that ? The latest research I know of [1] says that
>> although an algorithm does exist to theoretically recover from
>> single-disk corruption in the case of RAID-6, it is *not* possible to
>> detect dual-disk corruption with 100% certainty. And blindly running
>> the said algorithm in such a case would even introduce corruption on a
>> third disk.
>>
> Well, I probably need to wade through the paper (and recall Galois field
> theory) before answering this. We did a few tests in a 16 disk RAID6
> where we wrote data to the RAID, powered the system down, pulled out one
> disk, inserted it into another computer and changed the sector checksum
> of a few sectors (using hdparm's utility makebadsector).
> ...
You need not wade through the paper... ECC theory tells us that you need
a minimum distance of 3 to correct one error at an *unknown* position in
a codeword. RAID-5 only has distance 2: it can detect a single silent
corruption in a stripe, but it cannot tell *which* block is wrong.
RAID-6 reaches distance 3, which is exactly why the algorithm in the
paper can in theory locate and correct a single corrupted block, but it
cannot reliably detect dual-disk corruption. RAID-2 (which nobody uses
today) attacked this problem with a Hamming code. RAID controllers today
instead take advantage of the fact that they know which disk is
returning the bad block, because that disk reports a read error: once
the position of the error is known, even simple parity is enough to
reconstruct the data (see the little XOR sketch at the end of this
mail).

ZFS is even able to correct data when an error exists but no disk
reports a read error, because ZFS ensures integrity from the root block
down to the data blocks with a long checksum that accompanies each block
pointer (a toy model of this follows the XOR sketch below).

A disk can deliver bad data without returning a read error through:

 - a misdirected read (bad positioning of the disk head before reading),
 - a previously misdirected write (when this sector was written),
 - an unfortunate sector error (the data is wrong, but its checksum
   is OK).

These events do happen and are documented on the disk vendors' web
pages:

 a) A bad head positioning is estimated at one per 10^8 to 10^9 head
    moves.
    => that is more than once in 8 weeks on a fully loaded disk.

 b) An unrecoverable data error (bad data on disk) is around one sector
    per 10^16 bytes read.
    => about one unrecoverable error per 10 PByte read.

OK, these numbers seem pretty good, but when you have 1000 disks in your
datacenter, you will see at least one of these errors each day (the
rough calculation at the end of this mail shows why)...

Therefore: use ZFS in a redundant configuration!

Regards,

    Ulrich

-- 
| Ulrich Graef, Senior System Engineer, OS Ambassador \
| Operating Systems, Performance \ Platform Technology \
| Mail: ulrich.gr...@sun.com \ Global Systems Engineering \
| Phone: +49 6103 752 359 \ Sun Microsystems Inc \
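PS: Since the whole point above is the difference between an erasure
(the disk tells you which block is bad) and a silent error (it does
not), here is a little Python sketch of RAID-5 XOR parity. This is
purely illustrative toy code, not anything a real controller runs:

    def xor_blocks(*blocks):
        out = bytearray(len(blocks[0]))
        for block in blocks:
            for i, byte in enumerate(block):
                out[i] ^= byte
        return bytes(out)

    # A stripe with three data blocks and one XOR parity block.
    d0, d1, d2 = b"AAAA", b"BBBB", b"CCCC"
    parity = xor_blocks(d0, d1, d2)

    # Erasure: disk 1 returns a read error, so we KNOW which block is
    # lost; XOR of the parity with the survivors reconstructs it.
    assert xor_blocks(parity, d0, d2) == d1

    # Silent error: disk 1 returns bad data without any error. The
    # stripe no longer XORs to zero, so corruption is DETECTED...
    bad1 = b"BxBB"
    assert xor_blocks(parity, d0, bad1, d2) != bytes(len(parity))
    # ...but nothing says which of the four blocks is the wrong one:
    # detection without location, the distance-2 limit of single parity.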
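And here is a toy model of the ZFS idea that the *parent* block pointer
carries the child's checksum. The class and function names are invented
for this sketch; this is not the real ZFS code path:

    import hashlib

    def cksum(data):
        return hashlib.sha256(data).digest()

    class Disk:
        """Stand-in for one side of a mirror."""
        def __init__(self, data):
            self.data = data
        def read(self):
            return self.data
        def write(self, data):
            self.data = data

    class BlockPtr:
        """The parent stores the child's checksum, so verification
        never depends on the disk admitting an error."""
        def __init__(self, data):
            self.cksum = cksum(data)

    def read_with_healing(bp, mirrors):
        good = None
        for disk in mirrors:
            if cksum(disk.read()) == bp.cksum:
                good = disk.read()
                break
        if good is None:
            raise IOError("all copies fail the checksum: unrecoverable")
        for disk in mirrors:           # self-heal silently bad copies
            if cksum(disk.read()) != bp.cksum:
                disk.write(good)
        return good

    # One side of the mirror is silently corrupted -- no read error:
    block = b"important data"
    bp = BlockPtr(block)
    a, b = Disk(b"imposter data!"), Disk(block)
    assert read_with_healing(bp, [a, b]) == block
    assert a.read() == block           # the bad copy was repaired

This is why ZFS catches misdirected reads and writes that a controller
relying on the disk's own error reporting never sees.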
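Finally, the back-of-the-envelope numbers behind "at least one error a
day with 1000 disks". The IOPS and throughput figures are my rough
assumptions, not vendor data:

    SECONDS_PER_DAY = 86400
    disks = 1000

    # a) misdirected I/O: ~1 per 10^8..10^9 head moves,
    #    a busy disk does on the order of 200 head moves per second
    moves_per_day = disks * 200 * SECONDS_PER_DAY      # ~1.7e10
    print(moves_per_day / 1e9, "-", moves_per_day / 1e8,
          "misdirected I/Os per day")                  # ~17 - 173

    # b) unrecoverable sector errors: ~1 per 10^16 bytes read,
    #    assuming ~50 MByte/s sustained per disk
    bytes_per_day = disks * 50e6 * SECONDS_PER_DAY     # ~4.3e15
    print(bytes_per_day / 1e16,
          "unrecoverable errors per day")              # ~0.4

Even with conservative assumptions, case a) alone gives you more than
one silent error per day across the datacenter.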