Re: [indiana-discuss] spontaneos shutdowns and log messages

Brian Ruthven - Sun UK Wed, 01 Apr 2009 02:04:32 -0700


Hi Harry,



The usual clue I look for is:

Mar 18 17:21:56 opensol genunix: [ID 672855 kern.notice] syncing filesystems...

Mar 18 17:21:57 opensol genunix: [ID 904073 kern.notice]  done

Mar 18 17:22:50 opensol genunix: [ID 540533 kern.notice] ^MSunOS Release 5.11 Version snv_110 64-bit

This usually means the kernel was instructed to quit, and did so relatively cleanly (at least syncing filesystems before resetting the host and rebooting). This could be due to halt, reboot, shutdown, or init, but also the uadmin command (or the uadmin(2) syscall into the kernel). It can also be due to a panic.

Look at the lines immediately above it, and you will hopefully find something like this:


Mar 18 17:21:50 opensol syslogd: going down on signal 15

Mar 18 17:21:51 opensol rpcbind: [ID 564983 daemon.error] rpcbind terminating on signal.

Again, this is a clue that a clean shutdown was issued rather than a panic or power loss. Other clues include lines like:

Apr 1 09:49:35 lab662 reboot: [ID 330035 auth.crit] initiated by root on /dev/console

This was the /usr/sbin/reboot command. A similar line is printed for /usr/sbin/halt.
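The clues above can be checked mechanically. What follows is only a rough sketch, not a supported tool: it assumes the log has already been read into a list of lines, it matches on the substrings quoted in this message, and the function name, the 20-line look-back window, and the exact shape of the "initiated by" pattern for halt are my own assumptions.

```python
import re
from collections import deque

# Substrings copied from the log excerpts above; the descriptions are illustrative.
CLEAN_MARKERS = {
    "syncing filesystems": "kernel synced filesystems before resetting",
    "going down on signal 15": "syslogd was told to terminate",
    "terminating on signal": "a daemon was told to terminate",
}

# Assumed shape of the line logged by /usr/sbin/reboot (and, similarly, halt).
INITIATED_RE = re.compile(r"\b(reboot|halt): .*initiated by (\S+) on (\S+)")

# The kernel banner printed at boot marks the start of a new uptime.
BOOT_BANNER = "SunOS Release"


def clues_before_boot(lines, window=20):
    """For each boot banner, list the clean-shutdown clues seen just before it.

    An empty clue list suggests the system went down without warning
    (panic, power loss, reset button) and deserves a closer look.
    """
    recent = deque(maxlen=window)  # the last few lines before the banner
    reports = []
    for line in lines:
        if BOOT_BANNER in line:
            clues = []
            for prev in recent:
                for marker, meaning in CLEAN_MARKERS.items():
                    if marker in prev:
                        clues.append(meaning)
                match = INITIATED_RE.search(prev)
                if match:
                    cmd, user, tty = match.groups()
                    clues.append(f"{cmd} initiated by {user} on {tty}")
            reports.append((line.strip(), clues))
            recent.clear()
        else:
            recent.append(line)
    return reports
```

To try it, read /var/adm/messages into a list (for example with open(...).readlines()) and pass that to clues_before_boot(); each returned pair is a boot banner line and whatever clues preceded it.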

Also, the output from "last -10 reboot" will show roughly what time the system went down (in case you didn't already know). This should be accurate to within 60 seconds in the case of an unclean shutdown (e.g. a crash, power loss, or reset button). Sadly, it cannot distinguish between a clean and an unclean shutdown at present.


This isn't an exhaustive list of things, but hopefully a starting point.

Other big clues are warnings in /var/adm/messages relating to temperature, or the string "panic", usually accompanied by a line on reboot along the lines of "reboot after panic...", and the generation of a large vmcore.X file in /var/crash/<hostname>.
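As a rough illustration of that check, here is a small sketch that looks for the two strings mentioned above and for vmcore files in a crash directory. The function names are hypothetical, and the case-insensitive match on "temperature" is an assumption: the exact wording of a temperature warning depends on the hardware and driver, so the pattern may need adjusting.

```python
import re
from pathlib import Path

# Assumption: temperature warnings contain the word "temperature";
# "panic" also catches the "reboot after panic..." line.
TROUBLE_RE = re.compile(r"panic|temperature", re.IGNORECASE)


def find_trouble_lines(lines):
    """Return the log lines that mention a panic or a temperature problem."""
    return [line.rstrip("\n") for line in lines if TROUBLE_RE.search(line)]


def find_crash_dumps(crash_dir):
    """Return the names of vmcore.X files in crash_dir (empty if none exist)."""
    directory = Path(crash_dir)
    if not directory.is_dir():
        return []
    return sorted(p.name for p in directory.glob("vmcore.*"))
```

Pass the lines of /var/adm/messages to find_trouble_lines(), and the path /var/crash/<hostname> (with your own hostname filled in) to find_crash_dumps(). An empty result from both does not rule out a hardware problem; it only means these particular clues are absent.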

The time-slider messages are probably not relevant to this issue (but there have been bugs against time-slider in the past...).


Hope that helps move you forward.
Brian



Harry Putnam wrote:

setup:
Athlon64 2.2ghz 3400+ - AK86-L Aopen mobo (Topped out at 3gb ram)
4 500gb IDE drives on IDE controllers
2 750gb SATA drives on PCI sata controller (adaptec 1205sa [Sil3112a chip])

Currently: osol-2008.11 build 110
=====     *     =====     *     =====     *     =====

First, this is not something that I can say is related to build 110.
It was going on before I upgraded from 109.

I'm experiencing spontaneous shutdowns and am not finding anything in
the logs /var/adm/messages or /var/log/syslog that I recognize as
being a clue to why.

I can post an extract including the time frame of shutdown but to me
it looks totally normal... (I'm not experienced in debugging though)

Also I'm not really sure where to look for clues beyond
/var/log/syslog and /var/adm/messages.

I've got a hunch this may be about hdd overheating, but that is only
because I feel what seems to me to be abnormal heat when I touch the
drives, especially the 2 sata drives on an

  Adaptec 1205sa (Sil3112a chip).

It may be normal heat... I'm not sure... but I really have no idea
what else might provoke a shutdown ... not really sure overheating of
hdd would do that (force a shutdown).

The biggest change I've made most recently was to upgrade the size of
a mirrored 200gb pool to 750gb.  Those drives are on the sata
controller referenced above.  But the 200gb had been running on that
controller for some time.

I also had to flash the bios of that controller during the upgrade, to
make it recognize the new 750gb Sata II drives.

I don't remember seeing a spontaneous shutdown before making those
changes.

However, I am getting some errors from something to do with the
timeslider mechanism.  I see them on boot up from the `startd'
service.  Where

  svc:/application/time-slider:default

is moved to maintenance by request of time-slider `frequent' and
`hourly' services.

Attempting to restart the time-slider service results in it being moved
to `Maintenance mode' again.   The `frequent' timeslider service is not
finding a crontab, according to that service's log.

That sounds like some kind of permissions problem and not something
that would invoke a shutdown.

I guess that might be related to the shutdowns though, so inlined the
output of `svcs -vx' below:

If that isn't it, where else should I look for clues, and are there
other logs I should be examining?

svcs -vx:
  svc:/system/filesystem/zfs/auto-snapshot:frequent (ZFS auto snap..)
   State: maintenance since Tue Mar 31 12:20:19 2009
  Reason: Maintenance requested by
          "svc:/system/filesystem/zfs/auto-snapshot:frequent"
     See: /var/svc/log/system-filesystem-zfs-auto-sn..:frequent.log
     See: http://sun.com/msg/SMF-8000-R4
     See: /var/svc/log/system-filesystem-zfs-auto-sn..:frequent.log
  Impact: 1 dependent service is not running:
          svc:/application/time-slider:default

  svc:/system/filesystem/zfs/auto-snapshot:hourly (ZFS auto sn..)
   State: maintenance since Tue Mar 31 12:20:17 2009
  Reason: Maintenance requested by
          "svc:/system/filesystem/zfs/auto-snapshot:hourly"
     See: /var/svc/log/system-filesystem-zfs-auto-snapshot:hourly.log
     See: http://sun.com/msg/SMF-8000-R4
     See: /var/svc/log/system-filesystem-zfs-auto-snapshot:hourly.log
  Impact: 1 dependent service is not running:
          svc:/application/time-slider:default

_______________________________________________
indiana-discuss mailing list
[email protected]
http://mail.opensolaris.org/mailman/listinfo/indiana-discuss


--
Brian Ruthven                                        Sun Microsystems UK
Solaris Revenue Product Engineering             Tel: +44 (0)1252 422 312
Sparc House, Guillemont Park, Camberley, GU17 9QG
