Charles Sprickman wrote:
> On Fri, 18 Nov 2005, Uwe Doering wrote:
>> Charles Sprickman wrote:
>>> I've been digging through Google for more information on this. I
>>> have a 4.8 box that's been up for about 430 days. In the last week
>>> or so, top and ps have started reporting all CPU usage numbers as
>>> zero, and running "systat -vmstat" results in the message "The
>>> alternate system clock has died! Reverting to ``pigs'' display".
>>> [...]
>> We had this once at work, quite a while ago. The "alternate system
>> clock" is in fact the Real Time Clock (RTC) on the mainboard. In our
>> case we were lucky in that it was just the quartz device that failed
>> due to an improperly soldered lead which finally came off. We fixed
>> the soldering and the problem was gone.
> Are there any tools to verify that the RTC is working?
"systat -vmstat" will show you the interrupt that it drives. In our
case it's irq8, which is in fact labeled "rtc". It is supposed to run
at 128 Hz. Under load it can drop to some lower value. This is normal.
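If you would rather have a number than watch the systat screen, something like the following might do. It is only a rough sketch: it assumes Python 3.7 or later is installed, that "vmstat -i" prints one line per interrupt source with the cumulative total in the second-to-last column, and that the RTC line contains the word "rtc". The function names are made up for this example.

```python
import re
import subprocess
import time
from typing import Optional


def rtc_total(vmstat_output: str) -> Optional[int]:
    """Return the cumulative count from the first line naming 'rtc', or None."""
    for line in vmstat_output.splitlines():
        fields = line.split()
        # Assumed layout: <name ...> <total> <rate>; the name may span several fields.
        if len(fields) >= 3 and re.search(r"\brtc\b", line) and fields[-2].isdigit():
            return int(fields[-2])
    return None


def interrupts_per_second(before: int, after: int, seconds: float) -> float:
    """Average interrupt rate between two cumulative counter samples."""
    return (after - before) / seconds


if __name__ == "__main__":
    interval = 10.0  # seconds between the two samples
    cmd = ["vmstat", "-i"]
    first = rtc_total(subprocess.run(cmd, capture_output=True, text=True, check=True).stdout)
    time.sleep(interval)
    second = rtc_total(subprocess.run(cmd, capture_output=True, text=True, check=True).stdout)
    if first is None or second is None:
        print("no rtc line found in vmstat -i output")
    else:
        print("rtc interrupts/s: %.1f" % interrupts_per_second(first, second, interval))
```

A rate near 128 would suggest the RTC interrupt is alive; a counter that does not move at all between the two samples would match the symptoms you describe.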
> I don't exactly understand what the RTC is, but would the machine not be
> suffering some other problems if there was an actual hardware failure?
> Doesn't the system rely on this to time everything from the processors
> to memory to PCI slots and interrupts?
No, the RTC drives only the interrupt that is responsible for collecting
the CPU usage data. When it fails the CPU usage in "top", "ps" etc.
just drops to zero, as you've observed, but the server continues to run.
If the failure is permanent the machine refuses to boot, though. At
least that's what happened in our case. Apparently the RTC chip is
essential to the mainboard's boot sequence. For instance, the initial
date and time information comes from this chip.
On the other hand, if a reset corrects the problem then the RTC chip
probably got hung, or there is a problem with the interrupt controller
it is connected to. On a properly working mainboard this shouldn't
happen, of course.
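To illustrate why the usage figures read zero rather than garbage once the statistics interrupt stops: tools of this kind typically derive percentages from the difference between two snapshots of per-state tick counters, and if no ticks arrive in between, every difference is zero. The sketch below is a simplified model of that calculation, not the actual code of top or the kernel; the state names and the function name are assumptions for the example.

```python
from typing import Dict


def cpu_percentages(old: Dict[str, int], new: Dict[str, int]) -> Dict[str, float]:
    """Turn two snapshots of per-state tick counters into percentages."""
    deltas = {state: new[state] - old[state] for state in new}
    total = sum(deltas.values())
    if total == 0:
        # No statistics ticks arrived in the interval: report zero everywhere
        # instead of dividing by zero.
        return {state: 0.0 for state in deltas}
    return {state: 100.0 * delta / total for state, delta in deltas.items()}


if __name__ == "__main__":
    before = {"user": 0, "sys": 0, "idle": 0}
    healthy = {"user": 32, "sys": 32, "idle": 64}  # 128 ticks in one second
    print(cpu_percentages(before, healthy))
    print(cpu_percentages(healthy, healthy))  # counters frozen, all zero
```

The second call models the failed RTC: the counters are frozen, so every state comes out as 0%, while the machine itself keeps running.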
> Is there any simple way to figure out if this is hardware or software?
I don't know of any. However, we have been running FreeBSD since about
4.0, on various mainboards, UP and SMP, and we've never seen these
symptoms except in the one case mentioned above. So I suppose it's not a
kernel bug. I haven't looked at the PR database, though.
Uwe
--
Uwe Doering | EscapeBox - Managed On-Demand UNIX Servers
[EMAIL PROTECTED] | http://www.escapebox.net