>> I'm having a problem with spontaneous restarts. This isn't a new problem,
>> but I've done the obvious things and the problem hasn't gone away. I
>> was thinking of asking on -hackers, but I'm trying here first.

I have a mail server that is doing this exact thing: very spontaneous
restarts, more of a problem under heavy load. I tested the load theory
with a buildworld, and it just barfs. I have tracked it down to the
power supply. In my case, the power supply fan is not operating properly
(it is moving, but very slowly). I pop the disk into a new machine and
voila... problem fixed.

Just my $.02

sb

>>
>> The system is a 4.8 with a mix of patches and port upgrades of various
>> ages. I'm planning to rebuild the whole thing, bringing it up to date,
>> but I'm hoping to be able to wait for a 5.x in STABLE; I don't want to
>> do this twice, since I expect I'll have to dump and restore everything.
>>
>> The hardware is a 2.6 GHz P4 with 2 GByte of GEIL dual-channel memory.
>> (The problem existed on the previous, somewhat slower, memory as well.)
>> The box contains the processor and motherboard (Gigabyte GA-SINXP1394),
>> two floppy drives, CD and CD/W drives, an HP DAT, three IBM/Hitachi
>> 36G/10K SCSI drives, and one 120G IDE. The SCSI card is by Adaptec; the
>> video card is a low-end NVidia, and I'm running their video driver. The
>> PS is an Antec True380, which should be enough for the box, with
>> something to spare. There are several extra, large fans, of which more
>> later.
>>
>> The system, monitor, printer, and cable modem are all powered through an
>> APC BACK-UPS 450, about 18 months old. It's shown in the last week that
>> it can keep things up for more than an hour.
>>
>> The symptom is a restart that leaves no indication of how it happened.
>>
>> Recently, the system shut down (completely, and at the power supply)
>> instead of restarting. In that case, the last deliberate shutdown
>> was a `shutdown -h now'; it appears that in every other case, the last
>> deliberate shutdown was a `-r now'. (Question: does the machine
>> architecture have settings for reset-resume vs. reset-halt, settings
>> that might be remembered when a later action occurs?)
>> It has subsequently shut down with an immediate restart.
>>
>> There are no failure indications in /var/log/messages, nor any reported
>> by dmesg. (The console scrolls by very quickly.) The message sequence
>> over the restart typically looks like this:
>>
>> ========================================================================
>> Jun 7 18:39:09 moleend /kernel: arp: 24.228.64.1 moved from 00:05:00:e7:17:44 to 00:05:00:e7:17:57 on em0
>> Jun 7 18:39:09 moleend /kernel: arp: 24.228.64.1 moved from 00:05:00:e7:17:57 to 00:05:00:e7:17:44 on em0
>> Jun 7 18:59:06 moleend dhclient: New Network Number: 24.228.64.0
>> Jun 7 18:59:06 moleend dhclient: New Broadcast Address: 255.255.255.255
>> Jun 7 22:47:33 moleend /kernel: Copyright (c) 1992-2003 The FreeBSD Project.
>> Jun 7 22:47:33 moleend /kernel: Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
>> ========================================================================
>>
>> The restart most often occurs AFTER X has been shut down (and often
>> restarted) but sometimes when X has not been run. It most often occurs
>> when the system is under heavy CPU load, but sometimes when the load
>> has been light.
>>
>> I thought at one time it might be a thermal problem and undertook to
>> fix that. (I am still working to get more cooling air over the disks.)
>> Right now, I have 120 mm fans rated at 130-135 CFM (Panaflo and JMC)
>> pushing air in and out of the box, and pressurizing a duct feeding the
>> CPU cooler, which is now cool to the touch. The memory modules are cool
>> to the touch. While the disks need a proper plenum to route more air
>> over them, I no longer believe that there is a thermal problem. The
>> vid card's fan-blown heatsink is warm (not hot) to the touch; the
>> northbridge's fan-blown heatsink is warm (not hot) to the touch.
>>
>> (Some people commute to white-collar jobs in heavy pickups; I drive a
>> small server as my PC.
>> No chrome pipes.)
>>
>> So: what should I do next? Should I set the system up to go to the
>> kernel debugger on panic, or even start it via the kernel debugger?
>> (Where is the full documentation?) Should I shell out for an even
>> bigger power supply? Is there another log that I should examine?
>> A restart wire that I should check? A power bus I should scope?
>> (I'll have to borrow a scope somewhere.) Is it time for an exorcist?
>>
>> Thanks for your help.
>>
>> Mark Terribile
>>
>>
> Mark
>
> In my opinion this is a thermal problem. I have seen this before in
> some of my systems. It mainly has to do with the processors not cooling
> well enough. Try opening the cases up and leaving the covers off as
> a temp solution. Are you overclocking?
>
> Bruce
>
> _______________________________________________
> [EMAIL PROTECTED] mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-questions
> To unsubscribe, send any mail to
> "[EMAIL PROTECTED]"
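
On the "is there another log that I should examine?" question: before
chasing new logs, it may help to list every boot in /var/log/messages
that was not preceded by a deliberate shutdown, so the restarts can be
lined up against load, time of day, and so on. Below is a minimal
Python sketch of that idea. The function name, the lookback window, and
the marker strings are all assumptions for illustration; in particular,
check what your own syslog actually records on a clean `shutdown -r now'
and adjust CLEAN_MARKERS to match.

```python
# Sketch: flag kernel boot banners that have no clean-shutdown message
# shortly before them in a syslog-style file.

# Assumed marker strings; verify them against your own log.
BOOT_MARKER = "Copyright (c) 1992-"      # first line of the boot banner
CLEAN_MARKERS = (
    "shutdown:",                         # assumed: logged by shutdown(8)
    "reboot:",                           # assumed: logged by reboot(8)
    "halt:",                             # assumed: logged by halt(8)
    "syslogd: exiting on signal 15",     # assumed: syslogd stopped cleanly
)


def find_unclean_boots(lines, lookback=20):
    """Return (last_line_before_boot, boot_line) pairs for boots with no
    clean-shutdown marker in the preceding `lookback` lines."""
    history = []   # lines seen since the previous boot banner
    results = []
    for raw in lines:
        line = raw.rstrip("\n")
        if "/kernel:" in line and BOOT_MARKER in line:
            recent = history[-lookback:]
            clean = any(m in prev for prev in recent for m in CLEAN_MARKERS)
            if not clean:
                # The last line before the banner gives the latest time
                # at which the system was known to be alive.
                results.append((recent[-1] if recent else None, line))
            history = []   # start fresh for the next boot
        else:
            history.append(line)
    return results
```

Passing it an open file, e.g. find_unclean_boots(open("/var/log/messages")),
gives one pair per suspicious boot. On the excerpt quoted above it would
report the 22:47:33 boot, with the 18:59:06 dhclient line as the last
sign of life before it.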