Stefan Castille wrote:
Hi list,

This is the second panic in 2 months since I upgraded one of our servers to 3.9: a Sun Fire V120 (OpenBSD 3.9 with the latest patches, sparc64). The server has no display, only a serial console, so I don't know what was shown on the console before I connected. All I can give is the ps and trace from ddb, and a dmesg. I hope that is enough to get some idea of what is wrong. I have no experience with the debugger, so I have no idea where to even start looking. There seems to be a lot of sensor stuff in the trace, so I wonder whether the hardware is failing or whether it is another problem.

ps
   PID   PPID   PGRP    UID  S       FLAGS  WAIT       COMMAND
 21477   6436  22315      0  3      0x4084  nanosleep  sleep
  1516  17828  17828      0  3     0x40184  select     sendmail
 31797  17828  17828      0  3     0x40184  select     sendmail
 11321  22315  22315      0  3      0x4086  piperd     tee
  6436  22315  22315      0  3        0x86  pause      sh
 22315      1  22315      0  3      0x4086  pause      sh
 10494      1  10494      0  3       0x184  select     inetd
 25072  27544  27544      0  3    0x800086  netio      tcpdump
 27544      1  27544     76  3    0x805186  bpf        tcpdump
 17828      1  17828      0  3     0x40184  select     sendmail
 22083      1  22083      0  3      0x4086  ttyin      getty
  9815      1   9815      0  3        0x84  select     cron
 21519   7023   7023      0  3        0x85  lockf      saslauthd
 31649   7023   7023      0  3        0x85  lockf      saslauthd
  8373   7023   7023      0  3        0x85  lockf      saslauthd
  1000   7023   7023      0  3        0x85  netcon     saslauthd
  7023      1   7023      0  3        0x85  lockf      saslauthd
 18524      1  16150      0  3        0x84  bpf        arpwatch
 20703      1  16150      0  3        0x84  bpf        arpwatch
 16997      1  16150      0  3        0x84  bpf        arpwatch
 16442      1  16150      0  3        0x84  bpf        arpwatch
 11694      1  11694      0  3        0x84  select     sshd
  5084  21027  21027     83  3       0x184  poll       ntpd
 21027      1  21027      0  3        0x84  poll       ntpd
  2825  25210  25210     68  3       0x184  select     isakmpd
 25210      1  25210      0  3        0x84  netio      isakmpd
   809    553    553     70  3       0x184  select     named
   553      1    553      0  3       0x184  netio      named
 32082  26556  26556     74  3       0x184  bpf        pflogd
 26556      1  26556      0  3        0x84  netio      pflogd
  1621   2267   2267     73  2       0x184             syslogd
  2267      1   2267      0  3        0x84  netio      syslogd
    13      0      0      0  3    0x100204  crypto_wa  crypto
    12      0      0      0  3    0x100204  aiodoned   aiodoned
    11      0      0      0  3    0x100204  syncer     update
    10      0      0      0  3    0x100204  cleaner    cleaner
     9      0      0      0  3    0x100204  reaper     reaper
     8      0      0      0  3    0x100204  pgdaemon   pagedaemon
     7      0      0      0  3    0x100204  pftm       pfpurge
     6      0      0      0  3    0x100204  usbevt     usb1
*    5      0      0      0  7    0x100204             sensors
     4      0      0      0  3    0x100204  usbtsk     usbtask
     3      0      0      0  3    0x100204  usbevt     usb0
     2      0      0      0  3    0x100204  kmalloc    kmthread
     1      0      1      0  3      0x4084  wait       init
     0     -1      0      0  3     0x80204  scheduler  swapper






ddb> trace
data_access_error(26f639c0, 400, 1fe02004000, 84000000, 4e923028, 0) at data_access_error+0x1ac
trapbase(ffffffffffffffff, 5e65bfd7b632f, 5e65bfd7b6335, 6, ffffffffffffffff, c8a78) at trapbase+0x87c4
alipm_smb_exec(2b00e00, 1, 18, 0, 8, 26f63d4e) at alipm_smb_exec+0x1b8
iic_exec(2b00e58, 1, 18, 26f63d4f, 1, 26f63d4e) at iic_exec+0x1d8
admtemp_refresh(2ae1000, 2, 0, 452b849e, 0, 1800) at admtemp_refresh+0x58
sensor_task_thread(cfa8980, 2b01400, 0, 1388, 26f5bbc0, 2b011b0) at sensor_task_thread+0x12c
proc_trampoline(0, 0, 0, 0, 0, 0) at proc_trampoline+0x4
ddb>





(Before anyone complains: this is the default kernel config. The only change is the config NAME used during the build,
a legacy of an update script I inherited from my predecessor.)

console is /[EMAIL PROTECTED],0/[EMAIL PROTECTED],1/[EMAIL PROTECTED]/[EMAIL PROTECTED],3f8
Copyright (c) 1982, 1986, 1989, 1991, 1993
        The Regents of the University of California.  All rights reserved.
Copyright (c) 1995-2006 OpenBSD. All rights reserved. http://www.OpenBSD.org

OpenBSD 3.9-stable (argon) #0: Tue Oct  3 10:34:26 CEST 2006
    [EMAIL PROTECTED]:/usr/obj/argon
total memory = 1073741824
avail memory = 969277440
using 6553 buffers containing 53682176 bytes of memory
bootpath: /[EMAIL PROTECTED],0/[EMAIL PROTECTED],0/[EMAIL PROTECTED],0/[EMAIL PROTECTED],0
mainbus0 (root): Sun Fire V120 (UltraSPARC-IIe 648MHz)
cpu0 at mainbus0: SUNW,UltraSPARC-IIe @ 648 MHz, version 0 FPU
cpu0: physical 32K instruction (32 b/l), 16K data (32 b/l), 2048K external (64 b/l)
psycho0 at mainbus0
SUNW,sabre: impl 0, version 0: ign 7c0 bus range 0 to 3; PCI bus 0
DVMA map: c0000000 to e0000000
IOTDB: 4d0a000 to 4d8a000
pci0 at psycho0
ppb0 at pci0 dev 1 function 1 "Sun Simba PCI-PCI" rev 0x13
pci1 at ppb0 bus 1
ebus0 at pci1 dev 12 function 0 "Sun PCIO Ebus2 (US III)" rev 0x01
"flashprom" at ebus0 addr 0-fffff not configured
clock1 at ebus0 addr 0-1fff: mk48t59: hostid 836cc16c
ebus_attach: idprom: incomplete
"SUNW,lomh" at ebus0 addr 200000-200003 ipl 42 not configured
gem0 at pci1 dev 12 function 1 "Sun ERI Ether" rev 0x01: ivec 3006, address 00:03:ba:6c:c1:6c
bmtphy0 at gem0 phy 1: BCM5221 100baseTX PHY, rev. 4
ohci0 at pci1 dev 12 function 3 "Sun USB" rev 0x01: ivec 24, version 1.0, legacy support
usb0 at ohci0: USB revision 1.0
uhub0 at usb0
uhub0: Sun OHCI root hub, rev 1.00/1.00, addr 1
uhub0: 4 ports with 4 removable, self powered
alipm0 at pci1 dev 3 function 0 "Acer Labs M7101 Power" rev 0x00: 74KHz clock
iic0 at alipm0
admtemp0 at iic0 addr 0x18: max1617
"at34c02" at iic0 addr 0x54 not configured
"at34c02" at iic0 addr 0x55 not configured
"at24c64" at iic0 addr 0x50 not configured
"at24c64" at iic0 addr 0x51 not configured
ebus1 at pci1 dev 7 function 0 "Acer Labs M1533 ISA" rev 0x00
"power" at ebus1 addr 2000-2007 ipl 37 not configured
com0 at ebus1 addr 3f8-3ff ipl 43: ns16550a, 16 byte fifo
com0: console
com1 at ebus1 addr 2e8-2ef ipl 43: ns16550a, 16 byte fifo
pciide0 at pci1 dev 13 function 0 "Acer Labs M5229 UDMA IDE" rev 0xc3: DMA, channel 0 configured to native-PCI, channel 1 configured to native-PCI
pciide0: using ivec 180c for native-PCI interrupt
pciide0: channel 0 disabled (no drives)
pciide0: channel 1 disabled (no drives)
gem1 at pci1 dev 5 function 1 "Sun ERI Ether" rev 0x01: ivec 301c, address 00:03:ba:6c:c1:6d
bmtphy1 at gem1 phy 1: BCM5221 100baseTX PHY, rev. 4
ohci1 at pci1 dev 5 function 3 "Sun USB" rev 0x01: ivec 26, version 1.0, legacy support
usb1 at ohci1: USB revision 1.0
uhub1 at usb1
uhub1: Sun OHCI root hub, rev 1.00/1.00, addr 1
uhub1: 4 ports with 4 removable, self powered
ppb1 at pci0 dev 1 function 0 "Sun Simba PCI-PCI" rev 0x13
pci2 at ppb1 bus 2
siop0 at pci2 dev 8 function 0 "Symbios Logic 53c896" rev 0x07: ivec 1820, using 8K of on-board RAM
scsibus0 at siop0: 16 targets
sd0 at scsibus0 targ 0 lun 0: <FUJITSU, MAP3735N SUN72G, 0401> SCSI4 0/direct fixed
sd0: 70007MB, 14100 cyl, 24 head, 423 sec, 512 bytes/sec, 143374738 sec total
sd1 at scsibus0 targ 1 lun 0: <FUJITSU, MAP3367N SUN36G, 0401> SCSI4 0/direct fixed
sd1: 34732MB, 24622 cyl, 27 head, 107 sec, 512 bytes/sec, 71132959 sec total
siop1 at pci2 dev 8 function 1 "Symbios Logic 53c896" rev 0x07: ivec 1820, using 8K of on-board RAM
scsibus1 at siop1: 16 targets
ppb2 at pci2 dev 5 function 0 "DEC 21152 PCI-PCI" rev 0x03
pci3 at ppb2 bus 3
dc0 at pci3 dev 4 function 0 "DEC 21142/3" rev 0x41: ivec 15, address 00:03:ba:6c:c1:6c
nsphyter0 at dc0 phy 1: DP83843 10/100 PHY, rev. 0
dc1 at pci3 dev 5 function 0 "DEC 21142/3" rev 0x41: ivec 5, address 00:03:ba:6c:c1:6c
nsphyter1 at dc1 phy 1: DP83843 10/100 PHY, rev. 0
dc2 at pci3 dev 6 function 0 "DEC 21142/3" rev 0x41: ivec 14, address 00:03:ba:6c:c1:6c
nsphyter2 at dc2 phy 1: DP83843 10/100 PHY, rev. 0
dc3 at pci3 dev 7 function 0 "DEC 21142/3" rev 0x41: ivec 4, address 00:03:ba:6c:c1:6c
nsphyter3 at dc3 phy 1: DP83843 10/100 PHY, rev. 0
pcons at mainbus0 not configured
No counter-timer -- using %tick at 648MHz as system clock.
root on sd0a
siop0: target 0 now using tagged 16 bit 40.0 MHz 31 REQ/ACK offset xfers
rootdev=0x700 rrootdev=0x1100 rawdev=0x1102
WARNING: / was not properly unmounted
WARNING: clock gained 6 days -- CHECK AND RESET THE DATE!


Hi,

I am having the exact same crash on my Sun Fire V120 with 3.9. The system ran for months without any problems, then started crashing at random. I am patiently waiting for OpenBSD 4.0 to come out in the hope that it will resolve the problem.

I did look into this a bit.

From the alipm(4) man page:

HISTORY
    The alipm driver first appeared in OpenBSD 3.9.

and in /usr/src/sys/dev/pci/alipm.c (3.9):

#ifdef __sparc64__
       /*
        * XXX We get data_access_error exceptions on Blade 100 and
        * Blade 150 machines with 233KHz clock.  We should
        * investigate wether changing the clock speed to 74KHz fixes
        * the problem.
        */
       if ((reg & ALIPM_SMB_HOSTC_CLOCK) != ALIPM_SMB_HOSTC_74K) {
               printf(", disabling to avoid hardware failure\n");
               return;
       }
#endif

And then the later commit message from:
http://www.openbsd.org/cgi-bin/cvsweb/src/sys/dev/pci/alipm.c

Sprinkle a few bus_space_barrier() calls.  Some of these may not be strictly
necessary, but they will help debugging if alipm(4) still messes up the bus
on sparc64.  Always enable on sparc64 again.

So, it appears that it is a known problem.

I did try to disable alipm, but having never used "config" to make runtime kernel changes, I'm not too confident I did it correctly.

Anyway, just thought I'd share what I found.

Cheers,
Steve Williams
