This is encouraging - it's the first I've heard of someone who has
found a way to trigger the problem "on demand". The problems I was
experiencing were on a dual Xeon with HTT enabled as well. Perhaps
someone out there who knows much more about the inner workings of
FreeBSD may have an idea of why running top in "aggressive mode" like
this might trigger the random rebooting. In particular, it would be
nice to *know* that someone out there specifically fixed whatever is
wrong in 5.4 when bringing it to 6.0. It's encouraging that you
haven't had any problems since upgrading to 6.0, but I have to wonder
if the bug's actually fixed, or if running top simply no longer
triggers it while the problem is still lurking in the background,
waiting to strike with the right combination of events.
In any case, I'm anxious to try it out myself on our server to see if
"top -s0" brings it down "on command" with HTT enabled, and not with
HTT disabled. But I'm going to have to wait until some time over the
Christmas holidays to do that sort of experimentation at a time when
it isn't affecting the end users of the machine. I may also upgrade
to 6.0 at that time, since by then it will have been out for a couple
of months, so most of the worst quirks should be worked out by then.
In the meantime, disabling HTT as I've done seems like a reasonable
precaution to improve stability.
Thanks for your help!
Dan
On Nov 29, 2005, at 10:50 PM, Stephen Montgomery-Smith wrote:
Dan Charrois wrote:
It actually may be a comfort, since perhaps HTT is related to the
culprit. Since the last crash, about a month ago, I disabled
HTT, both in the kernel and in the BIOS. So as far as I
know, it's been completely disabled (and the boot messages and
top only show 2 CPUs). And I haven't had the system go down for
nearly a month now.
I don't know if it is related, but I used to have random reboots on
a dual Xeon system with HTT enabled. It happened when I ran a CPU
intensive threaded program at the same time as "top" - running "top
-s0" (which you have to do as root) could usually kill the machine
within seconds, or minutes at most.
All I can tell you is that with FreeBSD 6.0 the problem disappeared.
Well, not totally - I still get a bunch of harmless calcru negative
messages, although I don't know if they are actually related to the
reboot problems I used to have with FreeBSD 5.4, because I get the
calcru backwards messages even with HTT disabled.
Anyway, if you are in the mood to try it out, you might like to try
re-enabling HTT, starting up whatever process you usually use (I'm
guessing it is MySQL), and then run "top -s0". If you get a crash
soon after that, you have the same problem I had.
Let me also add that these crashes usually did not trigger a crash
dump (I had dumpon set), and when they did, the resulting dump looked
rather corrupted.
Stephen
--
Syzygy Research & Technology
Box 83, Legal, AB T0G 1L0 Canada
Phone: 780-961-2213
_______________________________________________
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable