> From: tech-boun...@lists.lopsa.org [mailto:tech-boun...@lists.lopsa.org]
> On Behalf Of 'Luke S. Crawford'
>
> On Mon, Sep 19, 2011 at 07:25:42AM -0400, Edward Ned Harvey wrote:
> > Bear in mind, the magnetic surface of a disk platter doesn't do ECC either.
> > But in response to this, they use FEC chips on the circuit board of the hard
> > drive, and encode more bits onto the magnetic surface. Whenever a checksum
> > error occurs, the disk controller will silently retry (indicates a soft
> > error, a 1-rotation performance hit) but as long as there's no error on the
> > 2nd or 3rd or 4th attempt, the hardware silently hides this condition from
> > the OS. You might get SMART indicating failure predicted.
>
> I still don't trust a single drive. Mirror them.

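The drive-side behaviour in the quote (check each read, silently retry on a failure, surface only persistent failures) can be modelled roughly as below. This is a hypothetical sketch, not any drive's firmware: CRC32 from Python's `zlib` stands in for the drive's real error-correcting code, and `noisy_read`, `read_with_retry` and the four-attempt limit are illustrative assumptions.

```python
import random
import zlib

# Hypothetical model of a drive's read path. CRC32 stands in for the real
# on-drive ECC/FEC, and the retry limit is an assumption, not a spec value.


def noisy_read(data: bytes, flip_prob: float, rng: random.Random) -> bytes:
    """Return a copy of data with each bit flipped independently with probability flip_prob."""
    out = bytearray(data)
    for i in range(len(out)):
        for bit in range(8):
            if rng.random() < flip_prob:
                out[i] ^= 1 << bit
    return bytes(out)


def read_with_retry(data: bytes, stored_crc: int, flip_prob: float,
                    rng: random.Random, max_attempts: int = 4):
    """Re-read until the checksum matches; return (payload, soft_errors).

    Each failed attempt is a "soft error" the OS never sees. Raise OSError
    only if every attempt fails (an unrecoverable read error)."""
    soft_errors = 0
    for _ in range(max_attempts):
        payload = noisy_read(data, flip_prob, rng)
        if zlib.crc32(payload) == stored_crc:
            return payload, soft_errors
        soft_errors += 1  # firmware might count these, e.g. for SMART
    raise OSError(f"unrecoverable read error after {max_attempts} attempts")
```

For example, `read_with_retry(sector, zlib.crc32(sector), 1e-5, random.Random(42))` on a 512-byte sector usually succeeds on the first attempt with a soft-error count of 0; raising `flip_prob` makes retries, and eventually the unrecoverable error, more likely.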

I don't quite get where you're coming from here. There are two separate issues. One is mirroring vs. not mirroring the ZIL, which isn't mentioned above. The other is that somebody said the lack of ECC in the DRAM-based SATA devices ruled them out, which is what I'm discussing above.

As for mirroring the ZIL: distrust of a single drive has some truth to it. If your unmirrored ZIL device suffers a failure (including a data error) that coincides with an ungraceful system crash, then the data on that device is lost. The assumption, if you don't mirror your ZIL, is that the probability of these failures coinciding is small enough to be comparable to the probability of multiple disk failures coinciding.

> So you are suggesting that maybe the device does the sort of error
> correction that hard drives do on their platters on non-ECC ram?

Just suggesting a possibility. We know this is the case for HDDs and SSDs. Why not also DRAM-based drives?

> I suppose that is possible... but I find it fairly unlikely. this was
> not an 'Enterprise' product,

I mean all HDDs and SSDs, not just enterprise ones. So this DRAM device not being enterprise-level... maybe significant, maybe not.

> I mean, yeah, I suppose you could implement some sort of error correction
> outside of the dimm? but why would you? I think you'd have a difficult
> time doing it both safely and more efficiently than commodity ECC ram.

Take it for granted: as HDDs and SSDs show, it's definitely possible, and common, for error detection/correction to happen on-chip, outside of the storage media but very close to it. Now you raise an excellent question: in the DRAM SATA device, which design would be more attractive to the manufacturer? Use ECC RAM, or use FEC outside of the RAM, as is done for other types of devices (HDD/SSD)?

I can say this: ECC RAM stores 9 bits for every 8 bits of data (typically 72 bits per 64-bit word). The extra bits are not simple parity bits (parity is only useful for detecting errors, not correcting them); they form an error-correcting code computed over the whole word.
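To make the parity-versus-ECC distinction concrete, here is a minimal sketch of a Hamming SECDED (single-error-correcting, double-error-detecting) code, the classic scheme behind 72-bit ECC memory words. It is an illustration under stated assumptions, not any memory controller's actual implementation (real controllers compute this in hardware, often with optimized variants such as Hsiao codes), and the function names are hypothetical. For 64 data bits it adds 7 Hamming check bits plus one overall parity bit, which is where the 8/9 payload ratio comes from.

```python
# Hypothetical sketch of a Hamming SECDED code (single-error-correcting,
# double-error-detecting). Bits are lists of 0/1 ints for readability;
# real memory controllers do this in hardware.


def secded_encode(data_bits):
    """Return a codeword: index 0 is an overall parity bit, indexes that are
    powers of two (1, 2, 4, ...) are Hamming check bits, the rest carry data."""
    k = len(data_bits)
    r = 0
    while (1 << r) < k + r + 1:  # smallest r with 2**r >= k + r + 1
        r += 1
    n = k + r
    code = [0] * (n + 1)
    data_iter = iter(data_bits)
    for pos in range(1, n + 1):
        if pos & (pos - 1):  # not a power of two -> data position
            code[pos] = next(data_iter)
    for i in range(r):
        p = 1 << i
        # Check bit p makes the parity over every position with bit p set even.
        code[p] = sum(code[pos] for pos in range(1, n + 1) if pos & p) % 2
    code[0] = sum(code) % 2  # overall parity across the whole word
    return code


def secded_decode(code):
    """Return (data_bits, status); status is 'ok', 'corrected' or 'uncorrectable'."""
    code = list(code)  # don't modify the caller's list
    n = len(code) - 1
    syndrome = 0
    for pos in range(1, n + 1):
        if code[pos]:
            syndrome ^= pos  # a single flipped bit makes this equal its position
    overall = sum(code) % 2
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1 and syndrome <= n:
        code[syndrome] ^= 1  # single-bit error (syndrome 0 means bit 0 itself)
        status = "corrected"
    else:
        status = "uncorrectable"  # e.g. two flipped bits: detected, not fixable
    data = [code[pos] for pos in range(1, n + 1) if pos & (pos - 1)]
    return data, status
```

With 64 data bits, `secded_encode` returns a 72-bit word; flipping any one bit still decodes to the original data with status `'corrected'`, while flipping two yields `'uncorrectable'`. A single parity bit over the same word could only report that something flipped, not which bit.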

The payload is still 8/9, though. Also, the actual error detection happens off-chip (in the memory controller), not inside the DIMM. That's why your motherboard needs ECC support in order to use ECC RAM, and why ECC RAM is slightly slower than non-ECC. Also, non-ECC RAM sells in much higher volume, so it is significantly cheaper (not just by a ratio of 8:9). So take it for granted: non-ECC RAM is significantly cheaper, and even if you use ECC, the error detection happens outside the DIMM anyway.

For system memory, ECC operates on 32-bit or 64-bit words, depending on your architecture. But in your DRAM SATA device, the unit is a 512-byte or 4K-byte sector (4096 or 32768 bits), a word roughly 64 to 1000 times larger. This allows you to use a standard SATA FEC chip, which has a much better payload ratio than 8/9. Say, for example, the FEC uses LDPC, which operates at or near the theoretical limit of the channel; that means you're (a) operating at optimal speed, (b) operating at minimal cost, and (c) operating at maximum reliability.

So yes, there is motivation to do the error correction outside of ECC RAM, using FEC on non-ECC RAM in the DRAM SATA device. I cannot say, of course, whether or not they're actually doing any of this. I can only say that yes, it's reasonable, yes, it's common in other products, and yes, there is motivation to do so. Don't assume it isn't being done at all just because the device uses non-ECC RAM.

_______________________________________________
Tech mailing list
Tech@lists.lopsa.org
https://lists.lopsa.org/cgi-bin/mailman/listinfo/tech
This list provided by the League of Professional System Administrators
 http://lopsa.org/