>> I'm having a problem with spontaneous restarts.  This isn't a new
>> problem, but I've done the obvious things and the problem hasn't gone
>> away.  I was thinking of asking on -hackers, but I'm trying here first.

I have a mail server that is doing this exact thing. The restarts are
completely spontaneous and more frequent under heavy load. I tested this
theory with a buildworld, and the machine just barfs.

I tracked it down to a problem with the power supply. In my case, the
power supply fan is not operating properly (it is moving, but very
slowly). I popped the disk into a new machine and voila... problem fixed.

Just my $.02

sb

>>
>> The system is a 4.8 with a mix of patches and port upgrades of various
>> ages.  I'm planning to rebuild the whole thing, bringing it up to date,
>> but I'm hoping to be able to wait for a 5.x in STABLE; I don't want to
>> do this twice, since I expect I'll have to dump and restore everything.
>>
>> The hardware is a 2.6 GHz P4 with 2 GByte of GEIL dual-channel memory.
>> (The problem existed on the previous, somewhat slower, memory as well.)
>> The box contains the processor and motherboard (Gigabyte GA-SINXP1394),
>> two floppy drives, CD and CD/W drives, an HP DAT, three IBM/Hitachi
>> 36G/10K SCSI drives, and one 120G IDE.  The SCSI card is by Adaptec; the
>> video card is a low-end NVidia, and I'm running their video driver.  The
>> PS is an Antec True380, which should be enough for the box, with
>> something to spare.  There are several extra, large fans, of which
>> more later.
>>
>> The system, monitor, printer, and cable modem are all powered through an
>> APC BACK-UPS 450, about 18 months old.  In the last week it has shown
>> that it can keep things up for more than an hour.
>>
>> The symptom is a restart that leaves no indication of how it happened.
>>
>>   Recently, the system shut down (completely, and at the power supply)
>>   instead of restarting.  In that case, the last deliberate shutdown
>>   was a `shutdown -h now'; it appears that in every other case, the last
>>   deliberate shutdown was a `-r now'.  (Question: does the machine
>>   architecture have settings for reset-resume vs. reset-halt, settings
>>   that might be remembered when a later action occurs?)  It has
>>   subsequently shut down with an immediate restart.
>>
>> There are no failure indications in /var/log/messages, nor any reported
>> by dmesg.  (The console scrolls by very quickly.)  The message sequence
>> over the restart typically looks like this:
>>
>> =======================================================================
>> Jun  7 18:39:09 moleend /kernel: arp: 24.228.64.1 moved from 00:05:00:e7:17:44 to 00:05:00:e7:17:57 on em0
>> Jun  7 18:39:09 moleend /kernel: arp: 24.228.64.1 moved from 00:05:00:e7:17:57 to 00:05:00:e7:17:44 on em0
>> Jun  7 18:59:06 moleend dhclient: New Network Number: 24.228.64.0
>> Jun  7 18:59:06 moleend dhclient: New Broadcast Address: 255.255.255.255
>> Jun  7 22:47:33 moleend /kernel: Copyright (c) 1992-2003 The FreeBSD Project.
>> Jun  7 22:47:33 moleend /kernel: Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
>> ========================================================================
>>
>> The restart most often occurs AFTER X has been shut down (and often
>> restarted) but sometimes when X has not been run.  It most often occurs
>> when the system is under heavy CPU load, but sometimes when the load
>> has been light.
>>
>> I thought at one time it might be a thermal problem and undertook to
>> fix that.  (I am still working to get more cooling air over the disks.)
>> Right now, I have 120 mm fans rated at 130-135 CFM (Panaflow and JMC)
>> pushing air in and out of the box, and pressurizing a duct feeding the
>> CPU cooler, which is now cool to the touch.  The memory modules are cool
>> to the touch.  While the disks need a proper plenum to route more air
>> over them, I no longer believe that there is a thermal problem.  The
>> vid card's fan-blown heatsink is warm (not hot) to the touch; the
>> northbridge's fan-blown heatsink is warm (not hot) to the touch.
>>
>> (Some people commute to white-collar jobs in heavy pickups; I drive a
>> small server as my PC.  No chrome pipes.)
>>
>> So: what should I do next?  Should I set the system up to go to the
>> kernel debugger on panic, or even start it via the kernel debugger?
>> (Where is the full documentation?)  Should I shell out for an even
>> bigger power supply?  Is there another log that I should examine?
>> A restart wire that I should check?  A power bus I should scope?
>> (I'll have to borrow a scope somewhere.)  Is it time for an exorcist?
>>
>> Thanks for your help.
>>
>>     Mark Terribile
>>
>>
> Mark
>
> In my opinion this is a thermal problem. I have seen this before in
> some of my systems. It mainly has to do with the processors not cooling
> well enough. Try opening the cases up and leaving the covers off as
> a temporary solution. Are you overclocking?
>
> Bruce
>
> _______________________________________________
> [EMAIL PROTECTED] mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-questions
> To unsubscribe, send any mail to "[EMAIL PROTECTED]"
>

