Dmitry Pryanishnikov wrote:
When you wrote "ECC is a way to mask broken hardware", you were plain
wrong.
If you're using hardware without ECC, it simply can't tell whether an
error is present or absent. So ECC _is_ the way to detect (not mask)
broken hardware.
OK, thanks. I think I understand the meaning of ECC now.
So, contrary to what my supplier claims, ECC is not supposed to help
against hardware failures.
But it is the way to detect them, right?
If you want the ECC corrector to raise an NMI on corrected errors (as
well as on uncorrectable ones), just set the appropriate bit in the
control register; every ECC-capable Intel chipset allows it. But if
we're speaking about a production environment, such behaviour (abnormal
termination on a _corrected_ error) is unacceptable.
"Abnormal termination" is not only acceptable for me, it is what I am
looking for.
Make the node crash completely, so one of the others can take over its
task(s).
Don't get me wrong, but tracking down bugs in FreeBSD is much more of an
effort than "just" acquiring a new box...
I don't see the connection between this sentence and ECC (which is a
hardware option).
What I wanted to say:
Looking for errors in the logs takes only a few seconds.
Finding out what caused them takes hours...
Acquiring a new box costs only $29.95 ;) - from the business side,
that's the equivalent of about 30 minutes of work. ... I would rather
rent 100 boxes to do the task of ten than employ 100 admins to find the
"real" problem.
Thanks, Dmitry. I think I know what to look for now...
M.
_______________________________________________
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable