On Mon, 3 Mar 2008, Nathan Kroenert wrote:
> Speaking of expensive, but interesting things we could do -
>
> From the little I know of ZFS's checksum, it's NOT like the ECC
> checksum we use in memory in that it's not something we can use to
> determine which bit flipped in the event that there was a single bit
> flip in the data. (I could be completely wrong here... but...)

It seems that the emphasis on single-bit errors may be misplaced.  Is 
there evidence which suggests that single-bit errors are much more 
common than multiple bit errors?

> What is the chance we could put a little more resilience into ZFS such
> that if we do get a checksum error, we systematically flip each bit in
> sequence and check the checksum to see if we could in fact proceed
> (including writing the data back correctly.).

It is easier to retry the disk read another 100 times or store the 
data in multiple places.
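
To put a rough cost on the flip-and-recheck idea, here is a minimal sketch.
It is not anything ZFS does. It assumes SHA-256 (via Python's hashlib) as the
block checksum, and the names checksum() and recover_single_bit_flip() are
hypothetical. Every candidate costs a full re-checksum of the block, so an
n-byte block needs up to 8*n checksum passes, and the search only helps if
exactly one bit is wrong.

```python
import hashlib
from typing import Optional


def checksum(block: bytes) -> bytes:
    """Stand-in block checksum (assumption: SHA-256)."""
    return hashlib.sha256(block).digest()


def recover_single_bit_flip(block: bytes, expected: bytes) -> Optional[bytes]:
    """Try every single-bit flip; return the repaired block or None."""
    if checksum(block) == expected:
        return block  # nothing to repair

    candidate = bytearray(block)
    for byte_index in range(len(candidate)):
        for bit in range(8):
            mask = 1 << bit
            candidate[byte_index] ^= mask          # flip one bit
            if checksum(bytes(candidate)) == expected:
                return bytes(candidate)            # single-bit error found
            candidate[byte_index] ^= mask          # undo and keep searching
    return None  # more than one bit is wrong (or the checksum itself is bad)
```

A block with two or more flipped bits falls through to None, which is why
the question of how common single-bit errors really are matters here.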

> Or build into the checksum something analogous to ECC so we can choose
> to use NON-ZFS protected disks and paths, but still have single bit flip
> protection...

Disk drives commonly use an algorithm like Reed-Solomon 
(http://en.wikipedia.org/wiki/Reed-Solomon_error_correction) which 
provides forward-error correction.  This is done in hardware.  Doing 
the same in software is likely to be very slow.

> What do others on the list think? Do we have enough folks using ZFS on
> HDS / EMC / other hardware RAID(X) environments that might find this useful?

It seems that since ZFS is intended to support extremely large storage 
pools, available energy should be spent ensuring that the storage pool 
remains healthy or can be repaired.  Loss of individual file blocks is 
annoying, but loss of entire storage pools is devastating.

Since raw disk is cheap (and backups are expensive), it makes sense to 
write more redundant data rather than to minimize loss through exotic 
algorithms.  Even if RAID is not used, redundant copies may be stored on 
the same disk to help protect against block read errors.
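
By comparison, recovery from redundant copies is a short loop.  This is only
a sketch under stated assumptions: SHA-256 stands in for the block checksum,
the copies are handed in as plain byte strings, and read_with_redundancy() is
a hypothetical name, not a ZFS interface.

```python
import hashlib
from typing import Iterable, Optional


def read_with_redundancy(copies: Iterable[bytes], expected: bytes) -> Optional[bytes]:
    """Return the first stored copy whose checksum matches, else None."""
    for data in copies:
        if hashlib.sha256(data).digest() == expected:
            return data  # a good copy; it could also be used to rewrite the bad ones
    return None  # every copy is damaged
```

The cost is one checksum pass per copy, regardless of how many bits are
damaged in the bad copies.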

Bob
======================================
Bob Friesenhahn
[EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
