On Mon, 21 Nov 2005, John Baldwin wrote:
On Saturday 19 November 2005 02:16 pm, Uwe Doering wrote:
John Baldwin wrote:
On Friday 18 November 2005 10:05 pm, Charles Sprickman wrote:
I tried this query on -stable, hoping someone here can help me further
understand and troubleshoot this.
Reference:
http://thread.gmane.org/gmane.os.freebsd.stable/32837
In short, top and ps have reported 0% CPU on all processes for the past few weeks.
"systat -vmstat" hands out the "Alternate system clock has died" error.
Box is running 4.8-p24 and has been up 425 days. Nothing out of the
ordinary except for the above symptoms. In searching the various
lists/newsgroups, it seems that the other folks with this problem have
fixed it in various ways:
-early 4.x users referenced a PR that was committed before 4.8
-some 5.3 users reported this with unknown resolution/cause
-sending init a HUP was suggested (tried it, no luck)
-setting kern.timecounter.method: 1 (tried it, no luck)
-one user seemed to actually have a dead timer
Actually, there was a patch that was committed in 5.4 and 6.0 for this
issue. You can see the diff here:
http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/i386/isa/clock.c.diff?r1=1.213&r2=1.214&f=h
That patch would probably backport to 4.x fairly easily.
I just looked at RELENG_4, and yes, backporting should be easy. Though
I haven't tried it yet on our machines.
I wonder, however, what's writing to the RTC on a running server. Could
this event perhaps have been triggered by the recent Daylight Saving
Time switch?
Yep. Also, if you are using ntp, then the adjustments to the time are getting
pushed back to the RTC as well.
I run ntp everywhere.
So it certainly looks easy enough for me to change the first two sections
of the diff referenced above, but I'm having issues finding that last one
in cpu_initclocks(). It looks like that section really has changed quite
a bit. (see v.1.206)
The original PR that references this is against 4.something and only
patches in one place:
http://www.freebsd.org/cgi/query-pr.cgi?pr=17800
What's my best course of action to try and fix this? It looks like I can
take the first two hunks of that cvsweb diff and then add the one-liner
from the PR, but I have no idea what that's actually doing. My
experience with C is limited to making very small changes to existing
work, and nothing quite as important as this one file appears to be (from
reading the commit logs on it).
Is there any interest in moving this back to 4-STABLE?
And lastly, is there any snippet of code that can twiddle the clock from
userspace and determine if it's wedged or dead?
Scheduling a reboot of this machine gets much, much more complicated if I
need to have another box standing by due to a truly dead timer.
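
One rough way to approach the userspace check, sketched here under
assumptions: as far as I understand it, the systat error appears when the
per-state CPU tick counters (user, nice, sys, intr, idle) stop advancing
between samples. The helper below only does that comparison. The name
clock_looks_stalled() and the five-counter layout are hypothetical, and
reading the two samples out of the kernel is deliberately left out because
the mechanism differs between releases. A stalled result does not
distinguish a wedged RTC interrupt from genuinely dead hardware.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: given two samples of the kernel's per-state CPU
 * tick counters, taken a few seconds apart, report whether the
 * statistics clock appears to have stopped.
 *
 * Returns 1 if the counters did not advance in total, 0 otherwise.
 */
int
clock_looks_stalled(const long before[], const long after[], size_t n)
{
	long delta = 0;
	size_t i;

	assert(n > 0);

	/* Sum the change across all states; a live clock adds ticks somewhere. */
	for (i = 0; i < n; i++)
		delta += after[i] - before[i];

	return (delta <= 0);
}
```

In use, one would take a sample, sleep for a few seconds, take a second
sample, and pass both arrays in. Idle time counts as ticks too, so even an
otherwise quiet box should show the total advancing while the clock is alive.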
Thanks so much to both of you for your help...
Charles
--
John Baldwin <[EMAIL PROTECTED]> <>< http://www.FreeBSD.org/~jhb/
"Power Users Use the Power to Serve" = http://www.FreeBSD.org