> I think ZFS has no specific mechanisms in respect to
> RAM integrity. It will just count on a healthy and
> robust foundation for any component in the machine.

I'd really like to understand what the OS does with respect to ECC. Anyone who understands the internal operation and can comment would be doing me a real favor by 'splaining this to me. 8-)

And yes, it's the OS, not zfs, that would do the memory operations.

- I don't think there is a software mechanism for detecting and/or correcting memory errors. I'll go read up on memtest, but I suspect it is just that: a memory testing routine that writes to memory, reads it back, and then checks whether what it read back is what it wrote. This is a good way to discover hard, stuck faults in a memory array, but it cannot cope well with soft and intermittent errors.
- ECC is great for dealing with soft, intermittent errors, because it completely prevents single, infrequent errors from causing "bit rot" by polluting memory which is then flushed back to disk (and then protected from rot on disk by zfs).
- ECC can hide a rising soft error rate from a failing memory. This is good in that it holds off the day when things crash. It is bad in that the error data needed for preventive maintenance (replacing the failing unit) is in there, but only helps if it's bubbled up so the user can see it. It's also bad if it hides errors from a memory testing routine, as has been noted in this thread.
- You need to turn off hardware/chipset ECC to get a real result from a software write/read back memory test. Otherwise all you get back is 'yep, everything's all right'.

I think I need to get into the OS forum to understand this better.
-- 
This message posted from opensolaris.org
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
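
P.S. To make the write/read-back idea above concrete, here is a rough sketch of what I suspect such a test boils down to. It is not how memtest is actually implemented; I haven't read its source. The `FaultyMemory` class, the `stuck_high` parameter and the pattern list are all made up for illustration, with a Python bytearray standing in for RAM.

```python
# Illustrative only: a simulated write/read-back memory test.
# FaultyMemory is a hypothetical stand-in for RAM, not a real API.

PATTERNS = (0x00, 0xFF, 0x55, 0xAA)  # assumed test patterns


class FaultyMemory:
    """Toy RAM model: a bytearray with optional stuck-at-1 bits."""

    def __init__(self, size, stuck_high=None):
        self.cells = bytearray(size)
        # Maps address -> mask of bits that always read back as 1.
        self.stuck_high = dict(stuck_high or {})

    def write(self, addr, value):
        self.cells[addr] = value & 0xFF

    def read(self, addr):
        # A stuck bit shows up on every read, whatever was written.
        return self.cells[addr] | self.stuck_high.get(addr, 0)


def write_read_test(mem, size):
    """Write each pattern to every cell, read it back, collect mismatches."""
    failures = []
    for pattern in PATTERNS:
        for addr in range(size):
            mem.write(addr, pattern)
        for addr in range(size):
            got = mem.read(addr)
            if got != pattern:
                failures.append((addr, pattern, got))
    return failures
```

Calling `write_read_test(FaultyMemory(64, {10: 0x04}), 64)` reports address 10 for the patterns that have that bit clear, which is the hard, stuck-fault case. A soft error that happens once and never repeats would only be caught if it landed between a write and its read-back, and if ECC corrects the bit before the read returns, the comparison passes and the test sees nothing.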