I did some reading on DDRn RAM and controller chips and how they do ECC.
Sorry, but I was moderately incorrect. Here's closer to what happens. 

DDRn memory has no ECC logic on the DIMMs. What it has is an additional eight 
bits of memory for each 64 bit read/write operation. That is, for ECC DIMMs, 
the reads and writes are 72 bits wide, not 64. The extra 8 bits are 
read/written just like any other bits. 

The actual operation of error checking and correction happens in the memory 
controllers (for the ones I looked at, at least). These memory controller 
chipsets do the actual interaction with the DIMMs and (a) determine what bits, 
if any, get written across all 64 or 72 bits, and (b) look at the data coming 
back from a read to see whether what they get back is acceptable. 

- if the memory controller chipset tolerates only 64 bit wide DIMMs but not 72 
bit wide ones, it cannot do ECC.
- if the memory controller tolerates both 64 bit and 72 bit wide DIMMs, perhaps 
by ignoring the "extra" bits in a 64 bit wide read/write, then either style of 
DIMM can be used, but if the memory controller doesn't compute, write, and then 
check the extra eight bits for errors, ECC never happens.
- if the controller computes the extra checking bits and sends them with each 
write, and also checks them on a read, it has the potential to do effective ECC 
in the controller itself, in hardware. 
- for the couple of chipsets I looked at, if I read correctly, the controller 
is set up by the BIOS for doing or not doing ECC, and it may signal back to the 
software that an ECC event has happened. 
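
To make the compute-on-write / check-on-read idea concrete, here is a minimal 
sketch in Python of a textbook extended Hamming (72,64) code: 64 data bits plus 
8 check bits, able to correct any single flipped bit and detect any two. This is 
an assumption for illustration only. I don't know which code any particular 
controller actually uses, and the names (encode, decode, the status strings) are 
made up here, not taken from any chipset documentation. 

```python
# Textbook extended Hamming (72,64) SECDED sketch.
# Assumption: real memory controllers may use a different code and bit layout;
# this only illustrates "compute check bits on write, verify them on read".

# Bit 0 of the 72-bit word holds the overall parity bit. Bits 1..71 are Hamming
# positions: powers of two hold check bits, the other 64 hold data.
PARITY_POSITIONS = [1, 2, 4, 8, 16, 32, 64]
DATA_POSITIONS = [p for p in range(1, 72) if p & (p - 1)]  # not a power of two


def _syndrome(word):
    """XOR of the positions (1..71) of all set bits; 0 for a valid codeword."""
    s = 0
    for pos in range(1, 72):
        if (word >> pos) & 1:
            s ^= pos
    return s


def encode(data):
    """What the controller would do on a write: 64 data bits -> 72-bit word."""
    if not 0 <= data < (1 << 64):
        raise ValueError("data must fit in 64 bits")
    word = 0
    for i, pos in enumerate(DATA_POSITIONS):
        if (data >> i) & 1:
            word |= 1 << pos
    # Set the check bits so that the syndrome of the finished word is zero.
    s = _syndrome(word)
    for p in PARITY_POSITIONS:
        if s & p:
            word |= 1 << p
    # Overall parity bit makes the total number of set bits even.
    if bin(word).count("1") & 1:
        word |= 1
    return word


def decode(word):
    """What the controller would do on a read: returns (data, status)."""
    s = _syndrome(word)
    odd = bin(word).count("1") & 1
    if odd:
        # Odd overall parity: assume a single flipped bit at position s
        # (s == 0 means the overall parity bit itself was hit).
        if s > 71:
            return None, "uncorrectable"
        word ^= 1 << s
        status = "corrected"
    elif s:
        # Even overall parity but nonzero syndrome: two bits flipped.
        return None, "uncorrectable"
    else:
        status = "ok"
    data = 0
    for i, pos in enumerate(DATA_POSITIONS):
        if (word >> pos) & 1:
            data |= 1 << i
    return data, status
```

In this sketch the "corrected" and "uncorrectable" results stand in for 
whatever the controller reports back to software. A controller that stores 
the extra eight bits but never runs the decode step gets no protection from 
them at all. 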

I was incorrect - for DDRn, there is no signalling line that says something is 
wrong. Motherboards can force ECC not to happen by not carrying the extra bits 
to/from the DIMM sockets, in which case ECC will not work even if the memory 
controller supports it internally. This is one method for tolerating either 
kind of DIMM, I guess. Another is to program the chipset in BIOS to not do ECC. 

What I'm not clear on is what the OS does with this. I'm not competent to delve 
through the OS and find where the connection to the memory controller ECC 
enable/setup happens and what the ramifications are. And I don't know what the 
link is between hardware ECC write/read in the memory and a software scrub. 

Is the nature of the scrub that it walks through memory doing read/write/read 
and looking at the ECC reply in hardware? I came up with an all-software 
scrubbing technique, by doing a software block check much like zfs, but that 
seems very impractical.
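
For what it's worth, the all-software block check I had in mind would look 
roughly like the sketch below. It is only an illustration under my own 
assumptions: the 4096-byte block size, the use of CRC32, and the function names 
are all made up here, and this is not how ZFS or any OS actually scrubs memory. 
It can only detect a changed block, not correct it, and it only makes sense for 
memory whose contents are not supposed to change between passes. 

```python
import zlib

BLOCK_SIZE = 4096  # assumed block size, chosen only for illustration


def checksum_blocks(buf, block_size=BLOCK_SIZE):
    """Record one CRC32 per block of the buffer."""
    return [zlib.crc32(buf[off:off + block_size])
            for off in range(0, len(buf), block_size)]


def scrub(buf, expected, block_size=BLOCK_SIZE):
    """Re-read every block and return the indices whose CRC32 has changed."""
    bad = []
    for i, off in enumerate(range(0, len(buf), block_size)):
        if zlib.crc32(buf[off:off + block_size]) != expected[i]:
            bad.append(i)
    return bad
```

Every pass has to read and checksum all of the memory being protected, and the 
checksums themselves sit in the same unprotected memory, which is a large part 
of why this looks impractical to me. 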
-- 
This message posted from opensolaris.org
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
