Re: [zfs-discuss] integrated failure recovery thoughts (single-bit

2008-08-14 Thread Richard Elling
paul wrote: > bob wrote: > >> On Wed, 13 Aug 2008, paul wrote: >> >> >>> Shy extremely noisy hardware and/or literal hard failure, most >>> errors will most likely always be expressed as 1 bit out of some >>> very large N number of bits. >>> >> This claim ignores the fact that mos

Re: [zfs-discuss] integrated failure recovery thoughts (single-bit

2008-08-14 Thread paul
Yes, Thank you.

Re: [zfs-discuss] integrated failure recovery thoughts (single-bit

2008-08-14 Thread paul
bob wrote: > On Wed, 13 Aug 2008, paul wrote: > >> Shy extremely noisy hardware and/or literal hard failure, most >> errors will most likely always be expressed as 1 bit out of some >> very large N number of bits. > > This claim ignores the fact that most computers today are still based > on

Re: [zfs-discuss] integrated failure recovery thoughts (single-bit

2008-08-13 Thread Richard Elling
paul wrote: > Bob wrote: > >> ... Given the many hardware safeguards against single (and several) bit >> errors, >> the most common data error will be large. For example, the disk drive may >> return data from the wrong sector. >> > > - actually data integrity check bits as may exist with

Re: [zfs-discuss] integrated failure recovery thoughts (single-bit

2008-08-13 Thread Bob Friesenhahn
On Wed, 13 Aug 2008, paul wrote: > Shy extremely noisy hardware and/or literal hard failure, most > errors will most likely always be expressed as 1 bit out of some > very large N number of bits. This claim ignores the fact that most computers today are still based on synchronously clocked pa

Re: [zfs-discuss] integrated failure recovery thoughts (single-bit

2008-08-13 Thread paul
Bob wrote: > ... Given the many hardware safeguards against single (and several) bit > errors, > the most common data error will be large. For example, the disk drive may > return data from the wrong sector. - actually data integrity check bits as may exist within memory systems and/or communi

Re: [zfs-discuss] integrated failure recovery thoughts (single-bit

2008-08-13 Thread Bob Friesenhahn
On Wed, 13 Aug 2008, paul wrote: > Given that the checksum algorithms utilized in zfs are already fairly CPU > intensive, I > can't help but wonder if it's verified that a majority of checksum > inconsistency failures > appear to be single bit; if it may be advantageous to utilize some > comput

Re: [zfs-discuss] integrated failure recovery thoughts (single-bit

2008-08-13 Thread paul
Given that the checksum algorithms utilized in zfs are already fairly CPU intensive, I can't help but wonder if it's verified that a majority of checksum inconsistency failures appear to be single bit; if it may be advantageous to utilize some computationally simpler hybrid form of a checksum/ha

Re: [zfs-discuss] integrated failure recovery thoughts (single-bit

2008-08-12 Thread Anton B. Rang
Reed-Solomon could correct multiple-bit errors, but an effective Reed-Solomon code for 128K blocks of data would be very slow if implemented in software (and, for that matter, take a lot of hardware to implement). A multi-bit Hamming code would be simpler, but I suspect that undetected multi-bit

Re: [zfs-discuss] integrated failure recovery thoughts (single-bit

2008-08-12 Thread paul
Although I don't know for sure that most such errors are in fact single bit in nature, I can only surmise they most likely statistically are absent detection otherwise; as with the exception of error corrected memory systems and/or check-summed communication channels, each transition of data betw

Re: [zfs-discuss] integrated failure recovery thoughts (single-bit correction)

2008-08-12 Thread Richard Elling
Anton B. Rang wrote: > That brings up another interesting idea. > > ZFS currently uses a 128-bit checksum for blocks of up to 1048576 bits. > > If 20-odd bits of that were a Hamming code, you'd have something slightly > stronger than SECDED, and ZFS could correct any single-bit errors encountered.
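A minimal sketch of how a Hamming-style position syndrome can locate and repair one flipped bit, in the spirit of the idea quoted above. This is not ZFS code: the function names are hypothetical, the syndrome is assumed to be stored intact alongside the block, and a multi-bit error would be miscorrected, so the block's full checksum would still need to be re-verified afterward.

```python
def position_syndrome(block: bytes) -> int:
    """XOR together the 1-based positions of every set bit in the block."""
    syndrome = 0
    for byte_index, byte in enumerate(block):
        for bit in range(8):
            if byte & (1 << bit):
                syndrome ^= byte_index * 8 + bit + 1
    return syndrome


def correct_single_bit(block: bytes, stored_syndrome: int) -> bytes:
    """Return the block with one flipped bit repaired, if the syndromes differ.

    Assumes at most one data bit changed and the stored syndrome is intact.
    """
    diff = position_syndrome(block) ^ stored_syndrome
    if diff == 0:
        return block  # syndromes agree: nothing to correct
    position = diff - 1  # back to a 0-based bit index
    if position >= len(block) * 8:
        raise ValueError("syndrome points outside the block; not a single-bit error")
    fixed = bytearray(block)
    fixed[position // 8] ^= 1 << (position % 8)
    return bytes(fixed)


# Usage: corrupt one bit of a small block, then repair it.
original = bytes(range(256))
saved = position_syndrome(original)
damaged = bytearray(original)
damaged[100] ^= 0x10
repaired = correct_single_bit(bytes(damaged), saved)
```

For a 1048576-bit block the positions run from 1 to 2^20, so the syndrome fits in 21 bits, consistent with the "20-odd bits" mentioned in the quoted message.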

Re: [zfs-discuss] integrated failure recovery thoughts (single-bit correction)

2008-08-12 Thread Mario Goebbels (iPhone)
I suppose an error correcting code like 256bit Hamming or Reed-Solomon can't substitute as reliable checksum on the level of default Fletcher2/4? If it can, it could be offered as alternative algorithm where necessary and let ZFS react accordingly, or not? Regards, -mg On 12-août-08, at 08:

Re: [zfs-discuss] integrated failure recovery thoughts (single-bit correction)

2008-08-11 Thread Anton B. Rang
That brings up another interesting idea. ZFS currently uses a 128-bit checksum for blocks of up to 1048576 bits. If 20-odd bits of that were a Hamming code, you'd have something slightly stronger than SECDED, and ZFS could correct any single-bit errors encountered. This could be done without ch
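As a quick check on the "20-odd bits" figure, the standard Hamming bound can be worked out directly: the smallest number of check bits r satisfying 2^r >= m + r + 1 for m data bits. The helper name below is hypothetical, and this is only an arithmetic sketch, not anything taken from ZFS.

```python
def hamming_parity_bits(data_bits: int) -> int:
    """Smallest r with 2**r >= data_bits + r + 1 (single-error correction)."""
    r = 0
    while (1 << r) < data_bits + r + 1:
        r += 1
    return r


block_bits = 1048576                        # largest block size cited above, in bits
sec_bits = hamming_parity_bits(block_bits)  # single-error correction
secded_bits = sec_bits + 1                  # one extra overall parity bit for SECDED

print(sec_bits, secded_bits)
```

Under these assumptions a 1048576-bit block needs 21 check bits for single-error correction and 22 for SECDED, which leaves most of a 128-bit field free for an ordinary checksum.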