Hi Marc,

Marc Bevand wrote:
> Carsten Aulbert <carsten.aulbert <at> aei.mpg.de> writes:
>> In RAID6 you have redundant parity, thus the controller can find out
>> if the parity was correct or not. At least I think that to be true
>> for Areca controllers :)
> 
> Are you sure about that ? The latest research I know of [1] says that 
> although an algorithm does exist to theoretically recover from
> single-disk corruption in the case of RAID-6, it is *not* possible to
> detect dual-disk corruption with 100% certainty. And blindly running
> the said algorithm in such a case would even introduce corruption on a
> third disk.
>

Well, I probably need to wade through the paper (and recall Galois field
theory) before answering this. We did a few tests on a 16-disk RAID6:
we wrote data to the RAID, powered the system down, pulled out one
disk, inserted it into another computer and changed the sector checksum
of a few sectors (using hdparm's --make-bad-sector option). Then we
reinserted the disk into the original box, powered it up and ran a
volume check. The controller did indeed find the corrupted sectors and
repaired the right ones without destroying data on another disk (as far
as we know and tested).
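
For readers who want the single-disk case spelled out, here is a minimal
Python sketch of the locate-and-repair idea described in [1], reduced to
one byte per disk. It is an illustration under assumptions, not the Areca
firmware's method (which is not known here). The names `syndromes` and
`locate_and_repair` are made up for this example; the field GF(2^8) with
polynomial 0x11d and generator 2 follows [1].

```python
# Log/antilog tables for GF(2^8), polynomial 0x11d, generator 2 (as in [1]).
GF_EXP = [0] * 512
GF_LOG = [0] * 256
_x = 1
for _i in range(255):
    GF_EXP[_i] = _x
    GF_LOG[_x] = _i
    _x <<= 1
    if _x & 0x100:
        _x ^= 0x11D
for _i in range(255, 512):
    GF_EXP[_i] = GF_EXP[_i - 255]  # doubled table avoids a modulo in gf_mul


def gf_mul(a, b):
    """Multiply two elements of GF(2^8)."""
    if a == 0 or b == 0:
        return 0
    return GF_EXP[GF_LOG[a] + GF_LOG[b]]


def syndromes(data):
    """Return (P, Q) for one byte per data disk: P = XOR, Q = sum of g^i * D_i."""
    p = q = 0
    for i, d in enumerate(data):
        p ^= d
        q ^= gf_mul(GF_EXP[i], d)
    return p, q


def locate_and_repair(data, p, q):
    """Assume AT MOST ONE block in the stripe is wrong; find and fix it.

    Returns (kind, index): kind is "clean", "P", "Q", "data" or "multi".
    Only a bad data block is repaired in place (hypothetical helper).
    """
    p_now, q_now = syndromes(data)
    dp, dq = p ^ p_now, q ^ q_now
    if dp == 0 and dq == 0:
        return ("clean", None)
    if dq == 0:
        return ("P", None)   # only P disagrees: the P block itself is suspect
    if dp == 0:
        return ("Q", None)   # only Q disagrees: the Q block itself is suspect
    # Both disagree: for a single bad data disk z, dq = g^z * dp.
    z = (GF_LOG[dq] - GF_LOG[dp]) % 255
    if z >= len(data):
        return ("multi", None)  # no such disk: more than one block is wrong
    data[z] ^= dp               # dp is exactly the error on disk z
    return ("data", z)


stripe = list(range(10, 24))       # 14 data disks, as in a 16-disk RAID6
p, q = syndromes(stripe)
stripe[5] ^= 0x5A                  # silently corrupt data disk 5
print(locate_and_repair(stripe, p, q))   # ('data', 5)
```

The repair is only trustworthy if at most one block in the stripe is
wrong; the check cannot prove that on its own.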

For the other point: dual-disk corruption can (to my understanding)
never be healed by the controller, since there is no redundant
information left to check against. I don't recall whether we ran any
tests on that as well, but maybe we should, to learn how the controller
behaves. At that point it should just start crying out loud and tell me
that it cannot recover from that. But the chance of this happening
should be relatively small, unless the backplane/controller had a bad
hiccup when writing that stripe.
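
Whether a controller can always notice the dual-disk case is a separate
question. As a hedged illustration (one byte per disk, with hypothetical
helper names `mul2`, `syndromes` and `naive_verdict`, and not a model of
any real controller), the sketch below flips the same bits on two data
disks. The P difference cancels out, so a check based only on P and Q
sees a pattern that looks exactly like a bad Q block.

```python
def mul2(x):
    """Multiply by the generator 2 in GF(2^8), polynomial 0x11d."""
    x <<= 1
    if x & 0x100:
        x ^= 0x11D
    return x


def syndromes(data):
    """Return (P, Q) with Q = sum of 2^i * D_i, evaluated by Horner's rule."""
    p = q = 0
    for d in reversed(data):
        p ^= d
        q = mul2(q) ^ d
    return p, q


def naive_verdict(data, p, q):
    """Classify a stripe using only the stored P and Q (assumed logic)."""
    p_now, q_now = syndromes(data)
    dp, dq = p ^ p_now, q ^ q_now
    if dp == 0 and dq == 0:
        return "clean"
    if dp == 0:
        return "Q"       # looks as if only the Q block were wrong
    if dq == 0:
        return "P"       # looks as if only the P block were wrong
    return "data or multi"


stripe = list(range(1, 15))        # 14 data disks
p, q = syndromes(stripe)
stripe[2] ^= 0x0F                  # corrupt data disk 2 ...
stripe[9] ^= 0x0F                  # ... and data disk 9 with the same bit flips
print(naive_verdict(stripe, p, q)) # Q
```

A checker that trusted this verdict would rewrite Q from the damaged
data, leaving both bad blocks in place and a third block changed.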

> This is the reason why, AFAIK, no RAID-6 implementation actually
> attempts to recover from single-disk corruption (someone correct me if
> I am wrong).
> 

As I said, I know that our Areca 1261ML does detect and correct those
errors, provided they are single-disk corruptions.

> The exception is ZFS of course, but it accomplishes single and
> dual-disk corruption self-healing by using its own checksum, which is
> one layer above RAID-6 (therefore unrelated to it).

Yes, very helpful and definitely desirable to have :)
> 
> [1] http://kernel.org/pub/linux/kernel/people/hpa/raid6.pdf

Thanks for the pointer.

Cheers

Carsten
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss