On Fri, Feb 25, 2011 at 7:33 AM, Dale <rdalek1...@gmail.com> wrote:
> Well, I think my machine is possessed or something.  I'm getting random
> reboots here.  When it does this, it is like hitting the reset button.  It
> is sitting on the grub screen when it does this.  I noticed the first time
> the other day and this was before adding the extra memory.  I seemed to be
> stable at 4GB but I seem to be rebooting at random.  I ran memtest
> yesterday, it checked fine.  It didn't find an error but it looked like it
> was only testing part of it.  Memtest recognizes all 16GB on the last run
> but it didn't seem to be testing it all.  Is there a trick to getting it to
> test the whole thing?
>
> This is the last few lines from messages before the reboot:
>
> Feb 25 05:10:01 localhost cron[5697]: (root) CMD (test -x
> /usr/sbin/run-crons && /usr/sbin/run-crons )
> Feb 25 05:14:47 localhost smartd[3902]: Device: /dev/sdb [SAT], SMART Usage
> Attribute: 194 Temperature_Celsius changed from 113 to 112
> Feb 25 05:14:47 localhost smartd[3902]: Device: /dev/sdc [SAT], SMART Usage
> Attribute: 190 Airflow_Temperature_Cel changed from 80 to 78
> Feb 25 05:14:47 localhost smartd[3902]: Device: /dev/sdc [SAT], SMART Usage
> Attribute: 194 Temperature_Celsius changed from 75 to 74
> Feb 25 05:20:01 localhost cron[5850]: (root) CMD (test -x
> /usr/sbin/run-crons && /usr/sbin/run-crons )
> Feb 25 05:30:01 localhost cron[5994]: (root) CMD (test -x
> /usr/sbin/run-crons && /usr/sbin/run-crons )
> Feb 25 05:40:01 localhost cron[6136]: (root) CMD (test -x
> /usr/sbin/run-crons && /usr/sbin/run-crons )
> Feb 25 05:41:49 localhost uptimed: moving up to position 20: 0 days,
> 01:27:23
> Feb 25 05:44:47 localhost smartd[3902]: Device: /dev/sdc [SAT], SMART Usage
> Attribute: 190 Airflow_Temperature_Cel changed from 78 to 77
> Feb 25 05:50:01 localhost cron[6284]: (root) CMD (test -x
> /usr/sbin/run-crons && /usr/sbin/run-crons )
> Feb 25 05:59:01 localhost cron[6413]: (root) CMD (rm -f
> /var/spool/cron/lastrun/cron.hourly)
> Feb 25 06:00:01 localhost cron[6429]: (root) CMD (test -x
> /usr/sbin/run-crons && /usr/sbin/run-crons )
> Feb 25 06:10:01 localhost cron[6573]: (root) CMD (test -x
> /usr/sbin/run-crons && /usr/sbin/run-crons )
> Feb 25 06:14:47 localhost smartd[3902]: Device: /dev/sdc [SAT], SMART Usage
> Attribute: 190 Airflow_Temperature_Cel changed from 77 to 76
> Feb 25 06:20:01 localhost cron[6722]: (root) CMD (test -x
> /usr/sbin/run-crons && /usr/sbin/run-crons )
> Feb 25 06:30:01 localhost cron[6865]: (root) CMD (test -x
> /usr/sbin/run-crons && /usr/sbin/run-crons )
> Feb 25 06:40:01 localhost cron[7008]: (root) CMD (test -x
> /usr/sbin/run-crons && /usr/sbin/run-crons )
> Feb 25 06:50:01 localhost cron[7156]: (root) CMD (test -x
> /usr/sbin/run-crons && /usr/sbin/run-crons )
> Feb 25 06:59:01 localhost cron[7286]: (root) CMD (rm -f
> /var/spool/cron/lastrun/cron.hourly)
> Feb 25 07:00:01 localhost cron[7301]: (root) CMD (test -x
> /usr/sbin/run-crons && /usr/sbin/run-crons )
> Feb 25 07:10:01 localhost cron[7444]: (root) CMD (test -x
> /usr/sbin/run-crons && /usr/sbin/run-crons )
> Feb 25 07:20:01 localhost cron[7592]: (root) CMD (test -x
> /usr/sbin/run-crons && /usr/sbin/run-crons )
> Feb 25 07:30:01 localhost cron[7741]: (root) CMD (test -x
> /usr/sbin/run-crons && /usr/sbin/run-crons )
> Feb 25 07:40:01 localhost cron[7884]: (root) CMD (test -x
> /usr/sbin/run-crons && /usr/sbin/run-crons )
> Feb 25 07:42:49 localhost uptimed: moving up to position 19: 0 days,
> 03:28:23
> Feb 25 07:50:01 localhost cron[8032]: (root) CMD (test -x
> /usr/sbin/run-crons && /usr/sbin/run-crons )
>
> I don't see anything out of the norm, do you?  What else should I check?  I
> have a Gigabyte mobo, anything in the BIOS I should check?  After I added
> the last two sticks of ram, I loaded the optimized settings.  No
> overclocking or anything here.
>
> It does this while logged into KDE and after running a while.  I have shut
> down folding and the CPU is running below 85F and all the fans are running
> fine.  I don't think this could be a heat issue.  It's a Cooler Master HAF
> 932 case with lots of cooling.
>
> I'm going to reboot and let memtest run a while and see exactly what it was
> that makes me think it is not testing ALL the memory.
>
> Thanks.
>
> Dale
>
> :-)  :-)

Is folding pretty CPU intensive? If it is, then consider shutting it off
completely until you find the root cause. Additional CPU heating can
cause higher temps all through the machine, and a broken trace
somewhere may only come apart when the motherboard heats up.
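
If you want to see whether the drive temperatures were drifting before
a reboot, a rough Python sketch like the one below could pull the
smartd lines out of the log. It is only a sketch under assumptions: the
regex is written against the format in your excerpt, it expects each
log entry on a single line (your mailer wrapped them), and the function
name smartd_changes is just something I made up. Also keep in mind that
the numbers smartd logs may be normalized SMART values rather than
actual degrees, so treat them as a trend and not as a reading.

```python
import re
import sys

# Matches smartd attribute-change lines in the format seen in the quoted log.
# Assumption: each log entry sits on one line (not wrapped by a mail client).
SMARTD_CHANGE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) \S+ smartd\[\d+\]: "
    r"Device: (?P<device>\S+) .*?SMART Usage Attribute: "
    r"(?P<attr_id>\d+) (?P<attr_name>\S+) changed from "
    r"(?P<old>\d+) to (?P<new>\d+)"
)


def smartd_changes(lines):
    """Return (timestamp, device, attribute name, old, new) for each match."""
    changes = []
    for line in lines:
        m = SMARTD_CHANGE.search(line)
        if m:
            changes.append(
                (m["stamp"], m["device"], m["attr_name"],
                 int(m["old"]), int(m["new"]))
            )
    return changes


if __name__ == "__main__" and len(sys.argv) > 1:
    # The log path is passed on the command line; nothing is hard-coded here.
    with open(sys.argv[1], errors="replace") as log:
        for stamp, device, name, old, new in smartd_changes(log):
            print(f"{stamp}  {device}  {name}: {old} -> {new}")
```

Run it with the path to your messages file as the only argument. If
the values for one drive move steadily in one direction right up to
the last entries before a reboot, heat is worth a second look.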

The order I walk through this sort of problem is:

1) Google, Google, Google for your exact hardware looking for similar
problems. (and hopefully solutions...) The main culprits are
generally:
- Motherboard
- Power supply
- VGA

2) This is unlikely if this is your new machine, but use some canned
air to blow out all heat sinks if they have collected dust.

3) Remove _ALL_ adapter cards and any external devices that you don't
absolutely need for testing. Run for a number of hours or days.

If you are still rebooting then consider changing your power supply
first. What sort of supply are you using now? Does it have _more_ than
enough power for your machine?

I hope you find it soon. This can be very frustrating. (From experience...)

Good luck,
Mark
