> I think ZFS has no specific mechanisms in respect to
> RAM integrity. It will just count on a healthy and
> robust foundation for any component in the machine.
I'd really like to understand what the OS does with respect to ECC. Anyone who 
does understand the internal operation and can comment would be doing me a real 
favor by 'splaining this to me. 8-)

And yes, it's the OS, not ZFS, that would do the memory operations.

- I don't think there is a software mechanism for detecting and/or correcting 
memory errors. I'll go read up on memtest, but I suspect it is just that - a 
memory testing routine that writes to memory, reads it back, and then tries to 
discover whether what it read back is what it sent. This is a good way to 
discover hard, stuck faults in a memory array, but cannot cope well with soft 
and intermittent errors.
- ECC is great for dealing with soft, intermittent errors, because it keeps 
single, infrequent errors from causing "bit rot": without it, a flipped bit 
pollutes memory, which is then flushed back to disk (and from then on 
faithfully protected from rot on disk by ZFS).
- ECC can hide a rising soft error rate from a failing memory. This is good in 
that it holds off the day when things crash. It's bad unless the corrected-error 
data is bubbled up so the user can see it, because that data is exactly what 
you need to do preventive maintenance and replace the failing unit. It's also 
bad if it hides errors from a memory testing routine, as has been noted in 
this thread.
- You need to turn off hardware/chipset ECC to get a real result from a 
software write/read back memory test. Otherwise all you get back is 'yep, 
everything's all right'. 
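
To make the last two points concrete for myself, here is a toy sketch in 
Python. It is only a model, not how any OS or chipset actually does it: the 
"memory" is a list of Hamming(7,4) codewords (real ECC DIMMs use wider codes), 
and ToyMemory, stuck_at_one, corrected_errors and readback_test are all names 
I made up for illustration. One cell has a bit stuck at 1. With ECC on, the 
write/read-back test sees nothing wrong and only the correction counter shows 
the fault; with ECC off, the same test finds the bad cell.

```python
# Toy model only: real ECC memory uses wider codes than Hamming(7,4),
# and every name here is hypothetical.

def hamming_encode(nibble):
    """Encode 4 data bits into a 7-bit Hamming(7,4) codeword."""
    d = [(nibble >> i) & 1 for i in range(4)]
    p1 = d[0] ^ d[1] ^ d[3]          # covers positions 1, 3, 5, 7
    p2 = d[0] ^ d[2] ^ d[3]          # covers positions 2, 3, 6, 7
    p4 = d[1] ^ d[2] ^ d[3]          # covers positions 4, 5, 6, 7
    bits = [p1, p2, d[0], p4, d[1], d[2], d[3]]   # positions 1..7
    return sum(b << i for i, b in enumerate(bits))

def extract_data(word):
    """Pull the 4 data bits out of a codeword without any checking."""
    b = [(word >> i) & 1 for i in range(7)]
    return b[2] | (b[4] << 1) | (b[5] << 2) | (b[6] << 3)

def hamming_decode(word):
    """Return (data, corrected); fixes at most one flipped bit."""
    syndrome = 0
    for pos in range(1, 8):
        if (word >> (pos - 1)) & 1:
            syndrome ^= pos          # XOR of the positions of all set bits
    if syndrome:                     # non-zero syndrome names the bad position
        word ^= 1 << (syndrome - 1)
    return extract_data(word), syndrome != 0

class ToyMemory:
    def __init__(self, size, ecc_enabled=True, stuck_at_one=None):
        self.size = size
        self.cells = [0] * size
        self.ecc_enabled = ecc_enabled
        self.stuck_at_one = stuck_at_one or {}   # addr -> codeword bit index
        self.corrected_errors = 0                # what should be bubbled up

    def write(self, addr, nibble):
        word = hamming_encode(nibble & 0xF)
        if addr in self.stuck_at_one:            # simulate the hard fault
            word |= 1 << self.stuck_at_one[addr]
        self.cells[addr] = word

    def read(self, addr):
        if not self.ecc_enabled:
            return extract_data(self.cells[addr])
        data, corrected = hamming_decode(self.cells[addr])
        if corrected:
            self.corrected_errors += 1
        return data

def readback_test(mem, patterns=(0x5, 0xA)):
    """Write each pattern everywhere, read it back, return failing addresses."""
    bad = set()
    for pattern in patterns:
        for addr in range(mem.size):
            mem.write(addr, pattern)
        for addr in range(mem.size):
            if mem.read(addr) != pattern:
                bad.add(addr)
    return sorted(bad)

# Cell 3 has codeword bit 2 (the lowest data bit) stuck at 1.
ecc_on = ToyMemory(8, ecc_enabled=True, stuck_at_one={3: 2})
ecc_off = ToyMemory(8, ecc_enabled=False, stuck_at_one={3: 2})
result_on = readback_test(ecc_on)
result_off = readback_test(ecc_off)
print("ECC on :", result_on, "corrected:", ecc_on.corrected_errors)
print("ECC off:", result_off, "corrected:", ecc_off.corrected_errors)
```

In this model the only trace of the fault with ECC enabled is the 
corrected_errors counter, which is the number I'd want the OS to report.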

I think I need to get into the OS forum to understand this better.
-- 
This message posted from opensolaris.org
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
