On Mar 11, 2010, at 7:49 AM, R.G. Keen wrote:
>> I think ZFS has no specific mechanisms in respect to
>> RAM integrity. It will just count on a healthy and
>> robust foundation for any component in the machine.
> I'd really like to understand what OS does with respect to ECC. Anyone who
> does understand the internal operation and can comment would be doing me a
> real favor by 'splaining this to me. 8-)

There are multiple levels of ECC, error reporting, and scrubs at work.
The exact ones depend largely on the hardware and how it handles ECC.
The M9000-class machines, for instance, have sophisticated memory
scrubbing built into the memory controllers and include options for
memory mirroring. Some Xeon models support memory mirroring, and some
PC vendors even claimed to have hot-swappable DIMMs, mirrored of course.
I don't have the intestinal fortitude to hot swap a DIMM on a PC, so
I'll leave that to the glossies.

Some processors just go bonkers and abort when they see an uncorrectable
ECC error... not much Solaris can do when it isn't running.

So there are hardware scrubbers in many modern servers and software
scrubbers in Solaris. For an interesting read on how Solaris handles
memory faults, see the pointers at
http://blogs.sun.com/relling/entry/analysis_of_memory_page_retirement

On Mar 11, 2010, at 9:20 AM, Christo Kutrovsky wrote:
> Do you know how you can check the number of CORRECTED errors by ECC in
> OpenSolaris?

FMA logs the errors as seen by Solaris or as reported by hardware that
notifies Solaris. For some systems, these are also logged to the system
controller.

 -- richard

ZFS storage and performance consulting at http://www.RichardElling.com
ZFS training on deduplication, NexentaStor, and NAS performance
http://nexenta-atlanta.eventbrite.com (March 16-18, 2010)
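
On the question of counting corrected errors: the FMA error log can be
dumped with `fmdump -e`, and the events tallied per class. Below is a
minimal sketch in Python, not a definitive tool. It assumes that each
event line ends with the event class and that the header line starts
with "TIME"; the function names are made up for illustration. Which
classes correspond to corrected ECC errors depends on the platform, so
the script only counts whatever classes appear and leaves the
interpretation to you.

```python
import subprocess
from collections import Counter


def count_ereport_classes(fmdump_text):
    """Tally events per class from `fmdump -e` style text.

    Assumption: the event class is the last whitespace-separated
    field on each line, and the header line starts with "TIME".
    """
    counts = Counter()
    for line in fmdump_text.splitlines():
        fields = line.split()
        if not fields or fields[0] == "TIME":
            continue  # skip blank lines and the column header
        counts[fields[-1]] += 1
    return counts


def read_error_log():
    """Run `fmdump -e` and return its output (usually needs privileges)."""
    result = subprocess.run(
        ["fmdump", "-e"], capture_output=True, text=True, check=True
    )
    return result.stdout


if __name__ == "__main__":
    # Print the most frequent event classes first.
    for event_class, n in count_ereport_classes(read_error_log()).most_common():
        print(n, event_class)
```

Run it on the Solaris host itself; `fmdump -eV` prints the full detail
of each event if a class name alone is not enough to tell what happened.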