I did some reading on DDRn RAM and memory controller chips and how they do ECC. Sorry, but I was moderately incorrect. Here's closer to what happens.
DDRn memory has no ECC logic on the DIMMs. What it has is an additional eight bits of memory for each 64-bit read/write operation. That is, for ECC DIMMs, the reads and writes are 72 bits wide, not 64. The extra 8 bits are read and written just like any other bits.

The actual error checking and correction happens in the memory controller (for the ones I looked at, at least). The memory controller chipset does the actual interaction with the DIMMs and (a) determines what, if anything, gets written to the 64 or 72 bits, and (b) looks at the data that comes back from a read to see whether it is acceptable.

- If the memory controller chipset tolerates only 64-bit-wide DIMMs and not 72-bit-wide ones, it cannot do ECC.
- If the memory controller tolerates both 64-bit and 72-bit-wide DIMMs, perhaps by ignoring the "extra" bits in a 64-bit-wide read/write, then either style of DIMM can be used, but if the memory controller doesn't compute, write, and then check the extra eight bits for errors, ECC never happens.
- If the controller computes the extra checking bits and sends them with a write, and also checks them on a read, it has the potential to do effective ECC in the controller itself, in hardware.
- For the couple of chipsets I looked at, if I read correctly, the controller is set up by the BIOS to do or not do ECC, and it may signal back to the software that an ECC event has happened.

I was incorrect: for DDRn, it's not a signalling line that indicates something is wrong.

Motherboards can force ECC not to happen in two ways. One is not carrying the extra bits to/from the DIMM sockets, in which case ECC will not work even if the memory controller supports it internally. This is one method for tolerating either kind of DIMM, I guess. The other is to program the chipset in the BIOS to not do ECC.

What I'm not clear on is what the OS does with this.
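As an aside on the controller side described above, here is a minimal software sketch of the "compute the check bits on a write, check them on a read" step for a 72-bit word. It assumes a textbook extended Hamming (SECDED) layout: 64 data bits, 7 check bits, and 1 overall parity bit. That layout is my assumption for illustration only; the chipsets do this in hardware and may well use a different check-bit arrangement. The names encode, decode and _syndrome are made up for this sketch.

```python
# Sketch of a (72,64) SECDED code using an extended Hamming layout.
# Illustrative only: real memory controllers do this in hardware and
# may use a different check-bit matrix.

# Bit 0 holds overall parity; power-of-two positions hold Hamming check bits.
PARITY_POSITIONS = [1, 2, 4, 8, 16, 32, 64]
DATA_POSITIONS = [p for p in range(1, 72) if p not in PARITY_POSITIONS]  # 64 slots


def _syndrome(word):
    """XOR together the positions (1..71) of all set bits."""
    s = 0
    for pos in range(1, 72):
        if (word >> pos) & 1:
            s ^= pos
    return s


def encode(data):
    """Build the 72-bit word that would be written for a 64-bit data word."""
    assert 0 <= data < (1 << 64)
    word = 0
    # Scatter the data bits into the non-power-of-two positions.
    for i, pos in enumerate(DATA_POSITIONS):
        if (data >> i) & 1:
            word |= 1 << pos
    # Set check bits so that the syndrome of the finished word is zero.
    s = _syndrome(word)
    for p in PARITY_POSITIONS:
        if s & p:
            word |= 1 << p
    # Overall parity bit makes the total number of set bits even.
    if bin(word).count("1") % 2:
        word |= 1
    return word


def decode(word):
    """Check a 72-bit word as read back. Returns (data, status)."""
    s = _syndrome(word)
    odd = bin(word).count("1") % 2 == 1
    if s == 0 and not odd:
        status = "ok"
    elif odd and s < 72:
        # Single-bit error: the syndrome is the position of the flipped bit
        # (0 means the overall parity bit itself was hit).
        word ^= 1 << s
        status = "corrected"
    else:
        # Even parity with a nonzero syndrome (two bits flipped), or a
        # syndrome that points outside the word: detect but do not correct.
        return None, "uncorrectable"
    data = 0
    for i, pos in enumerate(DATA_POSITIONS):
        if (word >> pos) & 1:
            data |= 1 << i
    return data, status
```

With this sketch, decode(encode(x)) gives back x with status "ok", flipping any one of the 72 bits still gives back x with status "corrected", and flipping two bits is reported as "uncorrectable". That last case is roughly the kind of event a controller would have to signal back to software.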
I'm not competent to delve through the OS and find where the connection to the memory controller ECC enable/setup happens and what the ramifications are. And I don't know what the link is between the hardware ECC write/read in the memory and a software scrub. Is the nature of the scrub that it walks through memory doing read/write/read and looking at the ECC reply in hardware? I came up with an all-software scrubbing technique, doing a software block check much like ZFS, but that seems very impractical.

--
This message posted from opensolaris.org
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss