Hi Marc,

Marc Bevand wrote:
> Carsten Aulbert <carsten.aulbert <at> aei.mpg.de> writes:
>> In RAID6 you have redundant parity, thus the controller can find out
>> if the parity was correct or not. At least I think that to be true
>> for Areca controllers :)
>
> Are you sure about that ? The latest research I know of [1] says that
> although an algorithm does exist to theoretically recover from
> single-disk corruption in the case of RAID-6, it is *not* possible to
> detect dual-disk corruption with 100% certainty. And blindly running
> the said algorithm in such a case would even introduce corruption on a
> third disk.
>

Well, I probably need to wade through the paper (and recall Galois field
theory) before answering this. We did a few tests on a 16-disk RAID6:
we wrote data to the RAID, powered the system down, pulled out one disk,
inserted it into another computer and changed the sector checksum of a
few sectors (using hdparm's --make-bad-sector option). Then we reinserted
the disk into the original box, powered it up and ran a volume check. The
controller did indeed find the corrupted sectors and repaired the correct
ones without destroying data on another disk (as far as we know and
tested).

For the other point: dual-disk corruption can (to my understanding)
never be healed by the controller, since there is no redundant
information available to check against. I don't recall whether we
performed any tests on that part as well, but maybe we should, to learn
how the controller will behave. As a matter of fact, at that point it
should just start crying out loud and tell me that it cannot recover
from that. But the chance of this happening should be relatively small
unless the backplane/controller had a bad hiccup when writing that
stripe.

> This is the reason why, AFAIK, no RAID-6 implementation actually
> attempts to recover from single-disk corruption (someone correct me if
> I am wrong).
>

As I said, I know that our Areca 1261ML does detect and correct those
errors - if they are single-disk corruptions.

> The exception is ZFS of course, but it accomplishes single and
> dual-disk corruption self-healing by using its own checksum, which is
> one layer above RAID-6 (therefore unrelated to it).

Yes, very helpful and definitely desirable to have :)

>
> [1] http://kernel.org/pub/linux/kernel/people/hpa/raid6.pdf

Thanks for the pointer.

Cheers

Carsten
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
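
For reference, the single-disk location idea discussed above can be
sketched in a few lines of Python. This is only an illustration under
stated assumptions, not what the Areca firmware (or any particular
implementation) does: it works on one byte position of a stripe, assumes
the usual RAID-6 field GF(2^8) with polynomial 0x11d and generator 2, and
all function names (gf_mul, syndromes, locate_single_corruption) are made
up for this example.

```python
# Sketch only: per-byte RAID-6 P/Q check in GF(2^8), assuming polynomial
# 0x11d and generator g = 2. All names here are hypothetical.

GF_EXP = [0] * 510   # GF_EXP[i] = g**i (doubled so log sums need no modulo)
GF_LOG = [0] * 256   # GF_LOG[g**i] = i (GF_LOG[0] is unused)

_x = 1
for _i in range(255):
    GF_EXP[_i] = _x
    GF_LOG[_x] = _i
    _x <<= 1
    if _x & 0x100:
        _x ^= 0x11D
for _i in range(255, 510):
    GF_EXP[_i] = GF_EXP[_i - 255]


def gf_mul(a, b):
    """Multiply two field elements using the log/exp tables."""
    if a == 0 or b == 0:
        return 0
    return GF_EXP[GF_LOG[a] + GF_LOG[b]]


def syndromes(data):
    """Return (P, Q) for one byte from each data disk.

    P is the plain XOR parity, Q is the sum of g**i * D_i.
    """
    p = q = 0
    for i, d in enumerate(data):
        p ^= d
        q ^= gf_mul(GF_EXP[i], d)
    return p, q


def locate_single_corruption(data, p_stored, q_stored):
    """Guess which disk is wrong, ASSUMING at most one disk is corrupt.

    Returns ("ok", None), ("p", None), ("q", None), ("data", index)
    or ("multiple", None) when no single data disk explains the error.
    """
    p_calc, q_calc = syndromes(data)
    dp = p_stored ^ p_calc
    dq = q_stored ^ q_calc
    if dp == 0 and dq == 0:
        return ("ok", None)
    if dq == 0:
        return ("p", None)        # only P disagrees: P itself is suspect
    if dp == 0:
        return ("q", None)        # only Q disagrees: Q itself is suspect
    # For a single bad data disk z: dq = g**z * dp, so z = log(dq / dp).
    z = (GF_LOG[dq] - GF_LOG[dp]) % 255
    if z < len(data):
        return ("data", z)        # repair would be: data[z] ^= dp
    return ("multiple", None)
```

With four data bytes such as [0x11, 0x22, 0x33, 0x44], flipping bits on
disk 2 alone makes the function point at disk 2. Flipping disk 0 with
XOR 6 and disk 1 with XOR 5 at the same time, however, also makes it
point at disk 2, which is exactly the "corruption on a third disk" risk
mentioned in the quoted text: the check cannot tell the two cases apart.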