Given that the checksum algorithms used in ZFS are already fairly CPU intensive, I can't help but wonder: if it's verified that the majority of checksum failures are in fact single-bit, might it be advantageous to use a computationally simpler hybrid checksum/Hamming code (as you've suggested)? Such a hybrid would not detect as high a percentage of all possible failures, but it could correct the theoretically most common ones while still detecting the large majority of the remaining (and correspondingly rarer) error patterns. Ideally it would consume no more overhead than the existing checksum algorithm, while improving the apparent resilience of even non-redundantly configured storage devices.
(I confess I haven't done such an analysis yet, but I suspect someone more intimately familiar with the implementation trade-offs of error detection/correction codes may have some interesting suggestions. As it stands, having a strong detection capability without the ability to recover errors that may otherwise be easily recoverable, with potentially catastrophic data loss as the result, does not seem reasonable.)
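
To make the idea concrete, here is a minimal sketch (plain C, not ZFS code; all function and structure names are hypothetical) of a Hamming-style scheme: a "position XOR" syndrome plus an overall parity bit computed over an arbitrary buffer. Storing those two values alongside a block lets a reader correct any single flipped bit in place and detect, though not correct, any double-bit error, in one (bit-serial, purely illustrative) pass over the data.

/*
 * Toy sketch only: Hamming-style syndrome + overall parity over a buffer.
 * The syndrome is the XOR of the 1-based positions of all set bits; a single
 * flipped bit changes the syndrome by exactly its own position and flips the
 * overall parity, so it can be located and repaired.
 */
#include <stdio.h>
#include <stdint.h>
#include <stddef.h>

struct eccsum {
	uint32_t syndrome;	/* XOR of 1-based positions of all set bits */
	uint8_t  parity;	/* parity of the total number of set bits */
};

static struct eccsum
ecc_compute(const uint8_t *buf, size_t len)
{
	struct eccsum e = { 0, 0 };
	for (size_t off = 0; off < len; off++) {
		for (int bit = 0; bit < 8; bit++) {
			if (buf[off] & (1u << bit)) {
				e.syndrome ^= (uint32_t)(off * 8 + bit) + 1;
				e.parity ^= 1;
			}
		}
	}
	return (e);
}

/*
 * Verify buf against a stored eccsum.  Returns 0 if clean, 1 if a single-bit
 * error was corrected in place, -1 if an uncorrectable error was detected.
 */
static int
ecc_verify_and_repair(uint8_t *buf, size_t len, const struct eccsum *stored)
{
	struct eccsum now = ecc_compute(buf, len);
	uint32_t diff = now.syndrome ^ stored->syndrome;
	uint8_t pflip = now.parity ^ stored->parity;

	if (diff == 0 && pflip == 0)
		return (0);			/* no error */
	if (pflip == 1 && diff >= 1 && diff <= len * 8) {
		uint32_t pos = diff - 1;	/* back to a 0-based bit index */
		buf[pos / 8] ^= (uint8_t)(1u << (pos % 8));
		return (1);			/* single-bit error corrected */
	}
	return (-1);				/* detected, not correctable */
}

int
main(void)
{
	uint8_t block[16] = "zfs block data.";
	struct eccsum sum = ecc_compute(block, sizeof (block));

	block[5] ^= 0x10;	/* simulate a single-bit flip on the media */

	int rc = ecc_verify_and_repair(block, sizeof (block), &sum);
	printf("repair status %d, data: %.15s\n", rc, (const char *)block);
	return (0);
}

The usual SEC-DED caveat applies: three or more flipped bits may be silently miscorrected rather than merely detected, which is exactly the detection capability being traded away. A real hybrid would presumably have to weigh that against how often multi-bit corruption is actually observed, and could sit alongside (rather than replace) the existing fletcher/SHA-256 checksums.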