Re: [zfs-discuss] integrated failure recovery thoughts

2008-08-14 Thread paul
I apologize for in effect suggesting that which was previously suggested in an earlier thread: http://mail.opensolaris.org/pipermail/zfs-discuss/2008-March/046234.html And discovering that the feature to attempt worst case single bit recovery had apparently already been present in some form in
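For readers unfamiliar with the idea, worst-case single-bit recovery can be sketched as a brute-force search: flip each bit of the damaged block in turn and re-test the checksum. The sketch below is only an illustration and is not taken from ZFS. The names `checksum` and `try_single_bit_repair` are hypothetical, and SHA-256 from the Python standard library stands in for whatever checksum the block actually carries.

```python
import hashlib
from typing import Optional


def checksum(block: bytes) -> bytes:
    # Stand-in checksum for this sketch (assumption): SHA-256 from the stdlib.
    return hashlib.sha256(block).digest()


def try_single_bit_repair(block: bytes, expected: bytes) -> Optional[bytes]:
    """Return a block matching `expected` if at most one bit is wrong, else None."""
    if checksum(block) == expected:
        return block  # nothing to repair
    candidate = bytearray(block)
    for byte_index in range(len(candidate)):
        for bit in range(8):
            candidate[byte_index] ^= 1 << bit      # flip one bit
            if checksum(bytes(candidate)) == expected:
                return bytes(candidate)            # the flip restored consistency
            candidate[byte_index] ^= 1 << bit      # undo the flip and keep searching
    return None  # damage is larger than a single bit


original = b"zfs-discuss example block"
expected = checksum(original)

damaged = bytearray(original)
damaged[5] ^= 0x10  # simulate a single-bit error
repaired = try_single_bit_repair(bytes(damaged), expected)
```

Each candidate costs one full checksum pass, so the search needs up to one checksum computation per bit in the block. That is why it only makes sense as a last resort when no redundant copy is available.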

Re: [zfs-discuss] integrated failure recovery thoughts (single-bit

2008-08-14 Thread Richard Elling
paul wrote:
> bob wrote:
>> On Wed, 13 Aug 2008, paul wrote:
>>> Shy extremely noisy hardware and/or literal hard failure, most errors will most likely always be expressed as 1 bit out of some very large N number of bits.
>> This claim ignores the fact that mos

Re: [zfs-discuss] integrated failure recovery thoughts (single-bit

2008-08-14 Thread paul
Yes, Thank you.

Re: [zfs-discuss] integrated failure recovery thoughts (single-bit

2008-08-14 Thread paul
bob wrote:
> On Wed, 13 Aug 2008, paul wrote:
>> Shy extremely noisy hardware and/or literal hard failure, most errors will most likely always be expressed as 1 bit out of some very large N number of bits.
>
> This claim ignores the fact that most computers today are still based on

Re: [zfs-discuss] integrated failure recovery thoughts (single-bit

2008-08-13 Thread Richard Elling
paul wrote:
> Bob wrote:
>> ... Given the many hardware safeguards against single (and several) bit errors, the most common data error will be large. For example, the disk drive may return data from the wrong sector.
>
> - actually data integrity check bits as may exist with

Re: [zfs-discuss] integrated failure recovery thoughts (single-bit

2008-08-13 Thread Bob Friesenhahn
On Wed, 13 Aug 2008, paul wrote:
> Shy extremely noisy hardware and/or literal hard failure, most errors will most likely always be expressed as 1 bit out of some very large N number of bits.

This claim ignores the fact that most computers today are still based on synchronously clocked pa

Re: [zfs-discuss] integrated failure recovery thoughts (single-bit

2008-08-13 Thread paul
Bob wrote:
> ... Given the many hardware safeguards against single (and several) bit errors, the most common data error will be large. For example, the disk drive may return data from the wrong sector.

- actually data integrity check bits as may exist within memory systems and/or communi

Re: [zfs-discuss] integrated failure recovery thoughts (single-bit

2008-08-13 Thread Bob Friesenhahn
On Wed, 13 Aug 2008, paul wrote:
> Given that the checksum algorithms utilized in zfs are already fairly CPU intensive, I can't help but wonder if it's verified that a majority of checksum inconsistency failures appear to be single bit; if it may be advantageous to utilize some comput

Re: [zfs-discuss] integrated failure recovery thoughts (single-bit

2008-08-13 Thread paul
Given that the checksum algorithms utilized in zfs are already fairly CPU intensive, I can't help but wonder if it's verified that a majority of checksum inconsistency failures appear to be single bit; if it may be advantageous to utilize some computationally simpler hybrid form of a checksum/ha

Re: [zfs-discuss] integrated failure recovery thoughts (single-bit

2008-08-12 Thread Anton B. Rang
Reed-Solomon could correct multiple-bit errors, but an effective Reed-Solomon code for 128K blocks of data would be very slow if implemented in software (and, for that matter, take a lot of hardware to implement). A multi-bit Hamming code would be simpler, but I suspect that undetected multi-bit

Re: [zfs-discuss] integrated failure recovery thoughts (single-bit

2008-08-12 Thread paul
Although I don't know for sure that most such errors are in fact single bit in nature, I can only surmise they most likely statistically are absent detection otherwise; as with the exception of error corrected memory systems and/or check-summed communication channels, each transition of data betw

Re: [zfs-discuss] integrated failure recovery thoughts (single-bit correction)

2008-08-12 Thread Richard Elling
Anton B. Rang wrote:
> That brings up another interesting idea.
>
> ZFS currently uses a 128-bit checksum for blocks of up to 1048576 bits.
>
> If 20-odd bits of that were a Hamming code, you'd have something slightly stronger than SECDED, and ZFS could correct any single-bit errors encountered.

Re: [zfs-discuss] integrated failure recovery thoughts (single-bit correction)

2008-08-12 Thread Mario Goebbels (iPhone)
I suppose an error correcting code like 256bit Hamming or Reed-Solomon can't substitute as a reliable checksum on the level of default Fletcher2/4? If it can, it could be offered as alternative algorithm where necessary and let ZFS react accordingly, or not? Regards, -mg On 12-Aug-08, at 08:

Re: [zfs-discuss] integrated failure recovery thoughts (single-bit correction)

2008-08-11 Thread Anton B. Rang
That brings up another interesting idea. ZFS currently uses a 128-bit checksum for blocks of up to 1048576 bits. If 20-odd bits of that were a Hamming code, you'd have something slightly stronger than SECDED, and ZFS could correct any single-bit errors encountered. This could be done without ch
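The "20-odd bits" figure can be checked with the standard Hamming bound: r check bits can locate a single error among m data bits only if 2^r >= m + r + 1, and SECDED adds one overall parity bit on top of that. The short calculation below is a sketch of that arithmetic only; the function names are hypothetical and nothing here reflects how ZFS lays out its checksum field.

```python
def hamming_check_bits(data_bits: int) -> int:
    """Smallest r with 2**r >= data_bits + r + 1 (single-error correction)."""
    r = 0
    while (1 << r) < data_bits + r + 1:
        r += 1
    return r


def secded_check_bits(data_bits: int) -> int:
    """Hamming check bits plus one overall parity bit (double-error detection)."""
    return hamming_check_bits(data_bits) + 1


block_bits = 1048576  # block size in bits quoted in the post (128K bytes)
sec_bits = hamming_check_bits(block_bits)
secded_bits = secded_check_bits(block_bits)
print(sec_bits, secded_bits)
```

For a block of 1048576 bits this gives 21 check bits for single-error correction and 22 for SECDED, which is consistent with the "20-odd bits" estimate in the post.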

[zfs-discuss] integrated failure recovery thoughts

2008-08-11 Thread paul
As most of the zfs recovery problems seem to stem from zfs’s own strict insistence that data be ideally consistent with its corresponding checksum, which of course is good when correspondingly consistent data may be recovered from somewhere, but catastrophic otherwise; it seems clear that zfs must s