On Mar 11, 2010, at 7:49 AM, R.G. Keen wrote:
>> I think ZFS has no specific mechanisms in respect to
>> RAM integrity. It will just count on a healthy and
>> robust foundation for any component in the machine.
> I'd really like to understand what the OS does with respect to ECC.  Anyone who 
> does understand the internal operation and can comment would be doing me a 
> real favor by 'splaining this to me. 8-)

There are multiple levels of ECC, error reporting, and scrubs at work. The
exact ones depend largely on the hardware and how it handles ECC.  The
M9000-class machines, for instance, have sophisticated memory scrubbing
built into the memory controllers and include options for memory mirroring.
Some Xeon models support memory mirroring, and some PC vendors have even
claimed to offer hot-swappable DIMMs (mirrored, of course).  I don't have the
intestinal fortitude to hot swap a DIMM on a PC, so I'll leave that to the
glossies.
Some processors just go bonkers and abort when they see an uncorrectable
ECC error... not much Solaris can do when it isn't running.

So there are hardware scrubbers in many modern servers and software 
scrubbers in Solaris.  For an interesting read on how Solaris handles memory 
faults, see the pointers at
http://blogs.sun.com/relling/entry/analysis_of_memory_page_retirement

On Mar 11, 2010, at 9:20 AM, Christo Kutrovsky wrote:
> Do you know how you can check the number of CORRECTED errors by ECC in 
> OpenSolaris?

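FMA (the Solaris Fault Management Architecture) logs the errors as seen by
Solaris or as reported by hardware that notifies Solaris.  Where the platform
reports corrected errors, they show up as error reports in the FMA error log,
which "fmdump -e" displays.  For some systems, these are also logged to the
system controller.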
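
To get a count out of the FMA error log rather than reading it by eye, a
small tally script can help. The following is only a sketch: it assumes the
text looks like "fmdump -e" output with the ereport class in the last column,
the function name count_ereports is made up for illustration, and the class
names in the sample are hypothetical placeholders. Real class names for
corrected memory errors vary by platform.

```python
from collections import Counter


def count_ereports(fmdump_output: str) -> Counter:
    """Tally ereport classes in text shaped like `fmdump -e` output.

    Assumption: each event line ends with its class name, and class
    names start with "ereport.". Header and blank lines are skipped.
    """
    counts = Counter()
    for line in fmdump_output.splitlines():
        fields = line.split()
        if fields and fields[-1].startswith("ereport."):
            counts[fields[-1]] += 1
    return counts


# Hypothetical sample text; the class names are placeholders, not the
# names any real platform is known to emit.
SAMPLE = """\
TIME                 CLASS
Mar 11 07:49:01.1234 ereport.example.mem.ce
Mar 11 07:52:17.5678 ereport.example.mem.ce
Mar 11 09:20:44.9012 ereport.example.io.timeout
"""

if __name__ == "__main__":
    for cls, n in count_ereports(SAMPLE).most_common():
        print(f"{n:6d}  {cls}")
```

On a live system the input would come from the command itself, for example
via subprocess.run(["fmdump", "-e"], capture_output=True, text=True).stdout,
which typically needs sufficient privileges to read the error log.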
 -- richard

ZFS storage and performance consulting at http://www.RichardElling.com
ZFS training on deduplication, NexentaStor, and NAS performance
http://nexenta-atlanta.eventbrite.com (March 16-18, 2010)

