Re: FreeBSD 6.x CVSUP today crashes with zero load ...

M.Hirsch Mon, 26 Jun 2006 16:21:05 -0700

Dmitry Pryanishnikov wrote:

When you wrote "ECC is a way to mask broken hardware", you were plain wrong. If you're using hardware w/o ECC, it just can't tell whether an error is present
or absent. So ECC _is_ the way to detect (not mask) broken hardware.

Ok, thanks. I think I understand the meaning of ECC now.

So, contrary to what my supplier claims, ECC is not supposed to help against hardware failures.

But it is the way to detect them, right?

If you want the ECC corrector to raise an NMI on a corrected error (as well as on an uncorrectable one), just set the appropriate bit in the control register - every
Intel ECC-capable chipset allows it. But if we're speaking about a
production environment, such behaviour (abnormal termination on a _corrected_
error) is unacceptable.
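
To make the corrected vs. uncorrectable distinction above concrete, here is a minimal sketch of a single-error-correcting, double-error-detecting (SECDED) code over 4 data bits, using an extended Hamming(8,4) layout. This is an illustration only: it is not how any particular chipset implements ECC, real memory controllers apply wider codes to whole memory words, and the function names `encode` and `decode` are hypothetical.

```python
def encode(nibble):
    """Encode 4 data bits into an 8-bit extended Hamming codeword (list of bits)."""
    d = [(nibble >> i) & 1 for i in range(4)]
    code = [0] * 8                      # index 0: overall parity, 1..7: Hamming positions
    code[3], code[5], code[6], code[7] = d
    code[1] = code[3] ^ code[5] ^ code[7]   # covers positions with bit 0 set
    code[2] = code[3] ^ code[6] ^ code[7]   # covers positions with bit 1 set
    code[4] = code[5] ^ code[6] ^ code[7]   # covers positions with bit 2 set
    code[0] = sum(code[1:]) % 2             # makes the total parity even
    return code


def decode(code):
    """Return (status, nibble); status is 'ok', 'corrected' or 'uncorrectable'."""
    code = list(code)
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos             # XOR of the positions of all set bits
    parity_broken = sum(code) % 2 == 1
    if syndrome == 0 and not parity_broken:
        status = "ok"
    elif parity_broken:
        # Exactly one bit flipped; the syndrome is its position
        # (0 means the overall parity bit itself).
        code[syndrome] ^= 1
        status = "corrected"
    else:
        # Non-zero syndrome with intact overall parity: two bits flipped.
        return "uncorrectable", None
    nibble = code[3] | (code[5] << 1) | (code[6] << 2) | (code[7] << 3)
    return status, nibble


word = encode(0b1011)

one_flip = list(word)
one_flip[5] ^= 1
print(decode(one_flip))                 # single-bit error: detected and repaired

two_flips = list(word)
two_flips[5] ^= 1
two_flips[6] ^= 1
print(decode(two_flips))                # double-bit error: detected, not repairable
```

In both error cases the fault is detected; the only open question is policy, i.e. whether a "corrected" result is merely logged or treated as fatal, which is the choice the control-register bit mentioned above exposes.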

"Abnormal termination" is not only acceptable for me, it is what I am looking for. Make the node crash completely, so one of the others can take over its task(s).

Don't get me wrong, but tracking bugs in FreeBSD is quite a bit more of an effort than "just" acquiring a new box...
I don't see the connection between this sentence and ECC (which is a hardware option).


What I wanted to say:
Looking for errors in the logs takes only a few seconds.
Finding out what caused them takes hours...

Acquiring a new box is only $29.95 ;) - that's like 30 minutes, if you regard it from the business side. ... I would rather rent 100 boxes to do the task of ten than employ 100 admins to find the "real" problem.


Thanks, Dmitry. I think I know what to look for now...

M.
_______________________________________________
[email protected] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "[EMAIL PROTECTED]"
