Dmitry Pryanishnikov wrote:

When you wrote "ECC is a way to mask broken hardware", you were plainly wrong. If you're using hardware without ECC, it simply can't tell whether an error is present
or absent. So ECC _is_ the way to detect (not mask) broken hardware.

OK, thanks. I think I understand the meaning of ECC now.
So, contrary to what my supplier claims, ECC is not supposed to protect against hardware failures.
But it is the way to detect them, right?

If you want the ECC logic to raise an NMI on a corrected error (as well as on an uncorrectable one), just set the appropriate bit in the control register; every
ECC-capable Intel chipset allows it. But if we're speaking about a
production environment, such behaviour (abnormal termination on a _corrected_
error) is unacceptable.
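For illustration only, the "set the appropriate bit" step is an ordinary read-modify-write on the controller's register value. The bit position below (NMI_ON_CORRECTED_BIT) and the example value are hypothetical; the real register offset and bit are chipset-specific and have to come from the datasheet of the memory controller in question.

```python
# Hypothetical bit position: the real one is chipset-specific and must be
# taken from the memory controller's datasheet.
NMI_ON_CORRECTED_BIT = 3


def enable_nmi_on_corrected(ctrl: int) -> int:
    """Return ctrl with the (hypothetical) 'NMI on corrected ECC error' bit
    set, leaving every other bit of the register value untouched."""
    return ctrl | (1 << NMI_ON_CORRECTED_BIT)


before = 0x0001  # example value as read from the control register
after = enable_nmi_on_corrected(before)
print(f"{before:#06x} -> {after:#06x}")  # prints "0x0001 -> 0x0009"
```

Using OR rather than plain assignment preserves whatever other bits the firmware has already configured in that register.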

"Abnormal termination" is not only acceptable for me, it is exactly what I am looking for. Make the node crash completely, so that one of the others can take over its task(s).

Don't get me wrong, but tracking down bugs in FreeBSD takes much more effort than "just" acquiring a new box...

I don't see the connection between this sentence and ECC (which is a hardware option).

What I wanted to say:
Looking for errors in the logs takes only a few seconds.
Finding out what caused them takes hours...
Acquiring a new box costs only $29.95 ;), which is about 30 minutes of work if you look at it from the business side. ... I'd rather rent 100 boxes to do the work of ten than employ 100 admins to find the "real" problem.

Thanks, Dmitry. I think I know what to look for now...

M.
_______________________________________________
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "[EMAIL PROTECTED]"
