Peter Wemm writes:
Thanks for your description of how ECC is reported on PCs. That was
very, very helpful.
> The Tyan Thunder 2510 BIOS even disables ECC -> NMI routing so you have to
> go to quite a bit of trouble to reprogram the ServerWorks chipset to
> actually generate NMIs so that you can find out if something got trashed.
Is that the HE-SL or the LE-3 chipset? Is that code available?
I have some LE-3 based boxes which I'd like to be certain DTRT.
Unlike my wife's Dual Athlon, these boxes have nothing in their
BIOS pertaining to ECC error reporting. (Supermicro 370-DLE)
> Our NMI / ECC handling really really sucks in FreeBSD. Consider:
> - i686_pagezero - reads before writing in order to minimize cache snooping
> traffic in SMP systems. However, if it gets an NMI while trying to check
> if the cache line is already zero, it will take the entire machine down
> instead of just zeroing the line.
> - NFS / VM / bio: when they get an NMI while trying to copy data that is
> clean and backed by storage, they take the machine down instead of trying
> to recover and re-read the page.
> - userland: if userland gets an NMI, the machine dies instead of killing
> the process (or rereading a text page, etc., if possible).
> - our NMI handlers are a festering pile of excrement. They don't have
> the code to 'ack' the NMI, so it isn't possible to return after recovery.
> - and so on.
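For anyone who hasn't looked at it, the read-before-write trick Peter describes for i686_pagezero boils down to something like the following. This is a simplified userland sketch, not the actual FreeBSD code: the name pagezero_sketch is made up, and it assumes 4096-byte pages and 32-byte cache lines as on P6-class CPUs. The read of each line is exactly where an uncorrectable ECC error would surface as an NMI, even though the line is about to be zeroed anyway.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE  4096
#define LINE_BYTES 32                       /* assumed P6-style cache line */
#define LINE_WORDS (LINE_BYTES / sizeof(uint64_t))

/*
 * Hypothetical sketch: zero a page, but first read each cache line and
 * skip the write if the line is already all zeroes.  Skipping the write
 * avoids pulling an already-clean line into this CPU's cache in the
 * modified state, which is what cuts snoop traffic on SMP.
 * Returns the number of cache lines that actually had to be written.
 */
static size_t
pagezero_sketch(void *page)
{
    uint64_t *p = page;
    size_t written = 0;

    for (size_t i = 0; i < PAGE_SIZE / sizeof(uint64_t); i += LINE_WORDS) {
        int dirty = 0;

        /* The read step: an ECC error here would raise the NMI. */
        for (size_t j = 0; j < LINE_WORDS; j++) {
            if (p[i + j] != 0) {
                dirty = 1;
                break;
            }
        }
        /* The write step: only touch lines that are not already zero. */
        if (dirty) {
            memset(&p[i], 0, LINE_BYTES);
            written++;
        }
    }
    return written;
}
```

The recovery Peter is asking for would amount to: if the NMI lands during the read step, ack it and just do the memset, since the old contents of the line were about to be discarded anyway.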
Well, at least we take the machine down, which is a heck of a lot
better than ignoring the problem, which is really all that I was
hoping for.
Thanks again,
Drew