> On Dec 14, 2007 1:12 AM, can you guess? <[EMAIL PROTECTED]> wrote:
> > > yes. far rarer and yet home users still see them.
> >
> > I'd need to see evidence of that for current hardware.
>
> What would constitute "evidence"? Do anecdotal tales from home users
> qualify? I have two disks (and one controller!) that generate several
> checksum errors per day each.

I assume that you're referring to ZFS checksum errors rather than to transfer errors caught by the CRC resulting in retries. If so, then the next obvious question is, what is causing the ZFS checksum errors? And (possibly of some help in answering that question) is the disk seeing CRC transfer errors (which show up in its SMART data)?

If the disk is not seeing CRC errors, then the likelihood that data is being 'silently' corrupted as it crosses the wire is negligible (1 in 65,536 if you're using ATA disks, given your correction below, else 1 in 4.3 billion for SATA).

Controller or disk firmware bugs have been known to cause otherwise undetected errors (though I'm not familiar with any recent examples in normal desktop environments - e.g., the CERN study discussed earlier found a disk firmware bug that seemed only activated by the unusual demands placed on the disk by a RAID controller, and exacerbated by that controller's propensity just to ignore disk time-outs). So, for that matter, have buggy file systems. Flaky RAM can result in ZFS checksum errors (the CERN study found correlations there when it used its own checksum mechanisms).

> I've also seen intermittent checksum fails that go away once all the
> cables are wiggled.

Once again, a significant question is whether the checksum errors are accompanied by a lot of CRC transfer errors. If not, that would strongly suggest that they're not coming from bad transfers (and while they could conceivably be the result of commands corrupted on the wire, so much more data is transferred compared to command bandwidth that you'd really expect to see data CRC errors too if commands were getting mangled). When you wiggle the cables, other things wiggle as well (I assume you've checked that your RAM is solidly seated).

On the other hand, if you're getting a whole bunch of CRC errors, then with only a 16-bit CRC it's entirely conceivable that a few are sneaking by unnoticed.
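
A rough sketch of the arithmetic behind the 1-in-65,536 and 1-in-4.3-billion figures, and of why a large count of detected CRC errors implies that a few may slip through a 16-bit CRC. It rests on a simplifying assumption that is not from the discussion itself: each corrupted transfer is treated as random, so it matches its CRC with probability 2^-n for an n-bit CRC. Real CRCs do better than that on short error bursts, so treat the numbers as an order-of-magnitude guide only. The function names are made up for illustration.

```python
def undetected_fraction(crc_bits: int) -> float:
    """Chance that a randomly corrupted transfer still passes its CRC.

    Assumption (illustrative only): corruption is effectively random,
    so a damaged frame matches its CRC with probability 2**-crc_bits.
    """
    return 2.0 ** -crc_bits


def expected_escapes(detected_errors: int, crc_bits: int) -> float:
    """Expected number of silently corrupted transfers, estimated from
    the count of CRC errors that were detected (e.g. from SMART data)."""
    p = undetected_fraction(crc_bits)
    # detected ~= corrupted * (1 - p), so corrupted ~= detected / (1 - p)
    corrupted = detected_errors / (1.0 - p)
    return corrupted * p


if __name__ == "__main__":
    for bits in (16, 32):
        print(f"{bits}-bit CRC: 1 in {round(1 / undetected_fraction(bits)):,}")
        for detected in (0, 100, 100_000):
            escapes = expected_escapes(detected, bits)
            print(f"  {detected:>7,} detected -> ~{escapes:.3g} undetected")
```

Under this model, zero detected CRC errors means essentially zero silent ones at either width, while tens of thousands of detected errors on a 16-bit CRC put the expected number of escapes near one.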

> > Unlikely, since transfers over those connections have been protected
> > by 32-bit CRCs since ATA busses went to 33 or 66 MB/sec. (SATA has
> > even stronger protection)
>
> The ATA/7 spec specifies a 32-bit CRC (older ones used a 16-bit CRC)
> [1].

Yup - my error: the CRC was indeed introduced in ATA-4 (33 MB/sec. version), but was only 16 bits wide back then.

> The serial ata protocol also specifies 32-bit CRCs beneath 8/10b
> coding (1.0a p. 159)[2]. That's not much stronger at all.

The extra strength comes more from its additional coverage (commands as well as data).

- bill

This message posted from opensolaris.org
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss