On Jun 22, 2005, at 9:59 AM, Matt Juszczak wrote:
The vast majority of panics are hardware-related. It is rare nowadays
for a usermode program to make the system panic. In particular, you said
the problem happens more under load. That really points even more to a
hardware problem - bad CPU cache RAM, bad RAM, SCSI termination, that
sort of thing.
Ted
This is kind of going to be a blanket post to all the recent
suggestions to me. I appreciate suggestions :) Ted, sorry, my
other posts had dmesg and hardware specs, etc. I just couldn't
remember the subject line of that thread. I'll be more descriptive
here.
We have two different servers crashing. Both are SMP, but on
different hardware. We have five FreeBSD servers in total, and
only two are affected. That is why I do not believe this is a
hardware problem.
In any case, the machines are in a cold room where the temperature
is constantly maintained. 20 other servers in there are perfectly
stable, with no problems.
This particular machine that crashed last night while running
portsdb -uU is a Super Micro machine, with hyperthreading disabled
in the BIOS, dual 3.06 GHz CPUs, and 4 GB of memory. We ran a memory
test on orion (the machine that crashed last night) a week or so
ago, and it found 70,000 ECC errors. Those were fixed and that
machine had been stable until last night. I've now disabled SMP
support; we'll see whether that keeps it stable. portsdb -uU ran
without problems after I disabled SMP.
As for uranus, the other box (we keep a planet scheme for a
certain set of servers), we ran memtest86 and found no errors at
all. That box crashed about two days ago but has been stable
since. It has not lasted more than a week without doing a kernel
trap and freezing.
It seems that both these servers have this problem. Out of the
five FreeBSD servers we have, these two are the ones with the
highest load. Maybe a higher load on the other three servers would
cause the same problem. I agree with you that this looks like a
hardware problem, but seeing it on more than one server, with two
different architectures and our highest load, makes me reconsider.
If this is truly a bug in FreeBSD 5.4-RELEASE, maybe this is
something that has been fixed in -stable? I will compile a debug
kernel today and try to provide a trace of the problem. I'll do it
on whichever server crashes next.
What do they have in common? Disk controller? Network controller?
Chad
---
Chad Leigh -- Shire.Net LLC
Your Web App and Email hosting provider
[EMAIL PROTECTED]