Incidentally, I revisited this issue just last week while trying to nail down another one. The result is that I still have no clue what the hell is wrong with amd64 SMP on this box. It feels like a stack overflow or a botched DMA or memory corruption or, or, or.
I did test i386 on it and that seemed to work ok but I did not run it for more than a few builds. amd64 UP seems fine too. These machines are of questionable quality. Theo has one that will crash just sitting at the boot prompt.

On Sat, Dec 05, 2009 at 10:02:50PM -0500, Daniel Ouellet wrote:
> This is an old issue and not new, but I tried the latest snapshot in
> case the situation have changed to no avail.
>
> I git a little bit more details however after letting it reboot
> constantly may be 40 times or so.
>
> Then it jam and was able to get a screen shut of the remote console
> before forcing it to reboot and here is what i got. Hopefully it will be
> more useful and yes I can't do ps, or ddb as it is totally jam, or
> simply reboot constantly, always at the same place.
>
> See the console output, screen shut if you want to see it here and the
> dmesg below as well from the amd64 single kernel bot as I can't get it
> with the mp kernel.
>
> I wish I could provide more, but I can't. No console, no ps, no ddb,
> nothing is possible pass this point here. I only was able to get this
> much twice be letting it reboot constantly for about 45 minutes before
> it jam again at the same stage so that I can get a screen shut of it to
> type it below.
>
> The real screen shut is also available here
>
> http://openbsdsupport.org/images/sun4100.png
>
> if you want to see it, but that's the same as I type below as I copy it
> from the screen shut I was able to capture in the process when it
> actually didn't reboot constantly, but jam for good.
>
> No issue with the i386 kernel, or the i386.mp, nor with the amd64, only
> the amd64.mp kernel does this problem and is reproduceable at will.
>
>
> Not sure what else I could provide to help isolate this, but if
> anything, I would be more then happy to do so.
>
> Best,
>
> Daniel
>
>
> ==========
> Console output in free mode retype as seen on the console when crash and
> frozen and need to be unfrozen by doing a hard reset.
>
> ......
> Automatic boot in progress: starting file system checks.
> /dev/rsd0a: file system is clean; not checking
> kernel:uvm_f kernel: kernel: protection fault trap, code=0
> Stopped at Xintr_legacy7+0x24d: iret
> ddb{2}> kernel: privileged instruction fault trap, code=0
> Faulted in DDB; continuing...
>
>
>
>
> ===========================
> dmesg
>
> OpenBSD 4.6-current (GENERIC) #6: Fri Dec 4 22:47:14 MST 2009
> dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC
> real mem = 3756982272 (3582MB)
> avail mem = 3650658304 (3481MB)
> mainbus0 at root
> bios0 at mainbus0: SMBIOS rev. 2.3 @ 0xfbd50 (70 entries)
> bios0: vendor American Megatrends Inc. version "0ABJX039" date 04/11/2007
> bios0: Sun Microsystems Sun Fire X4100 M2
> acpi0 at bios0: rev 2
> acpi0: tables DSDT FACP APIC SPCR SLIT OEMB HPET IPET SRAT SSDT
> acpi0: wakeup devices PS2K(S1) PS2M(S1) USB0(S4) USB1(S4) MAC_(S5)
> P0P1(S4) P0P2(S4) P0P3(S4) P0P4(S4) P0P5(S4) IO4B(S4) BR5B(S4) BR5C(S4)
> BR5D(S4) BR5E(S4) IOB2(S4) BR2B(S4) BR2C(S4) BR2D(S4) BR2E(S4) PWRB(S1)
> acpitimer0 at acpi0: 3579545 Hz, 24 bits
> acpimadt0 at acpi0 addr 0xfee00000: PC-AT compat
> cpu0 at mainbus0: apid 0 (boot processor)
> cpu0: Dual-Core AMD Opteron(tm) Processor 2216, 2393.93 MHz
> cpu0:
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,CX16,NXE,MMXX,FFXSR,LONG,3DNOW2,3DNOW
> cpu0: 64KB 64b/line 2-way I-cache, 64KB 64b/line 2-way D-cache, 1MB
> 64b/line 16-way L2 cache
> cpu0: ITLB 32 4KB entries fully associative, 8 4MB entries fully associative
> cpu0: DTLB 32 4KB entries fully associative, 8 4MB entries fully associative
> cpu0: apic clock running at 199MHz
> cpu at mainbus0: not configured
> cpu at mainbus0: not configured
> cpu at mainbus0: not configured
> ioapic0 at mainbus0: apid 15 pa 0xfec00000, version 11, 24 pins
> ioapic1 at mainbus0: apid 16 pa 0xfeafd000, version 11, 7 pins
> ioapic1: misconfigured as apic 0, can't remap to apid 16
> ioapic2 at mainbus0: apid 17 pa 0xfeafc000, version 11, 7 pins
> ioapic2: misconfigured as apic 1, can't remap to apid 17
> ioapic3 at mainbus0: apid 14 pa 0xfeaff000, version 11, 24 pins
> acpihpet0 at acpi0: 25000000 Hz
> acpiprt0 at acpi0: bus 0 (PCI0)
> acpiprt1 at acpi0: bus 1 (P0P1)
> acpiprt2 at acpi0: bus 4 (P0P4)
> acpiprt3 at acpi0: bus 5 (P0P5)
> acpiprt4 at acpi0: bus 128 (PCIB)
> acpiprt5 at acpi0: bus 133 (POGA)
> acpiprt6 at acpi0: bus 134 (POGB)
> acpiprt7 at acpi0: bus 131 (BR5D)
> acpiprt8 at acpi0: bus 132 (BR5E)
> acpicpu0 at acpi0: PSS
> acpibtn0 at acpi0: PWRB
> ipmi at mainbus0 not configured
> cpu0: PowerNow! K8 2393 MHz: speeds: 2400 2200 2000 1800 1000 MHz
> pci0 at mainbus0 bus 0
> "NVIDIA nForce4 DDR" rev 0xa3 at pci0 dev 0 function 0 not configured
> pcib0 at pci0 dev 1 function 0 "NVIDIA nForce4 ISA" rev 0xa3
> nviic0 at pci0 dev 1 function 1 "NVIDIA nForce4 SMBus" rev 0xa2
> iic0 at nviic0
> spdmem0 at iic0 addr 0x52: 1GB DDR2 SDRAM registered cmd/addr parity,
> data ECC PC2-5300CL5
> spdmem1 at iic0 addr 0x53: 1GB DDR2 SDRAM registered cmd/addr parity,
> data ECC PC2-5300CL5
> iic1 at nviic0
> iic1: addr 0x18 00=01 01=01 02=00 03=00 words 00=0101 01=0101 02=0000
> 03=0000 04=ffff 05=ffff 06=ffff 07=ffff
> iic1: addr 0x19 00=01 01=00 02=00 03=01 words 00=0101 01=0000 02=0000
> 03=0101 04=ffff 05=ffff 06=ffff 07=ffff
> iic1: addr 0x1a 02=00 03=00 words 00=ffff 01=ffff 02=0000 03=0000
> 04=ffff 05=ffff 06=ffff 07=ffff
> iic1: addr 0x1c 02=00 03=00 words 00=ffff 01=ffff 02=0000 03=0000
> 04=ffff 05=ffff 06=ffff 07=ffff
> iic1: addr 0x1d 00=0f 01=0f 02=00 03=00 words 00=0f0f 01=0f0f 02=0000
> 03=0000 04=ffff 05=ffff 06=ffff 07=ffff
> iic1: addr 0x1e 01=07 02=00 03=f8 words 00=ffff 01=0707 02=0000 03=f8f8
> 04=ffff 05=ffff 06=ffff 07=ffff
> admcts0 at iic1 addr 0x2c
> admcts1 at iic1 addr 0x2d
> iic1: addr 0x48 00=ff 01=00 02=4b 08=1a 09=00 0a=4b 0f=ff 10=1a 11=00
> 12=4b 18=1a 19=00 1a=4b 20=1a 21=00 22=4b 28=1a 29=00 2a=4b 30=1a 31=00
> 32=4b 38=1a 39=00 3a=4b 3e=1a 40=1a 41=00 42=4b 48=1a 49=00 4a=4b 4e=1a
> 50=1a 51=00 52=4b 58=1a 59=00 5a=4b 60=1a 61=00 62=4b 68=1a 69=00 6a=4b
> 70=1a 71=00 72=4b 78=1a 79=00 7a=4b 80=1a 81=00 82=4b 88=1a 89=00 8a=4b
> 90=1a 91=00 92=4b 98=1a 99=00 9a=4b a0=1a a1=00 a2=4b a8=1a a9=00 aa=4b
> b0=1a b1=00 b2=4b b8=1a b9=00 ba=4b c0=1a c1=00 c2=4b c8=1a c9=00 ca=4b
> d0=1a d1=00 d2=4b d8=1a d9=00 da=4b e0=1a e1=00 e2=4b e8=1a e9=00 ea=4b
> f0=1a f1=00 f2=4b f8=1a f9=00 fa=4b fc=4b fd=4b fe=1a ff=4b words
> 00=1a7f 01=00ff 02=4b7f 03=507f 04=507f 05=507f 06=507f 07=507f
> iic1: addr 0x49 00=1a 01=ff 02=ff 03=50 05=ff 06=ff 07=50 08=1a 09=00
> 0a=4b 10=1a 11=00 12=4b 18=1a 19=00 1a=4b 20=1a 21=00 22=4b 28=1a 29=00
> 2a=4b 30=1a 31=00 32=4b 38=1a 39=00 3a=4b 3e=1a 40=1a 41=00 42=4b 48=1a
> 49=00 4a=4b 4e=1a 50=1a 51=00 52=4b 58=1a 59=00 5a=4b 60=1a 61=00 62=4b
> 68=1a 69=00 6a=4b 70=1a 71=00 72=4b 78=1a 79=00 7a=4b 80=1a 81=00 82=4b
> 88=1a 89=00 8a=4b 90=1a 91=00 92=4b 98=1a 99=00 9a=4b a0=1a a1=00 a2=4b
> a8=1a a9=00 aa=4b b0=1a b1=00 b2=4b b8=1a b9=00 ba=4b c0=1a c1=00 c2=4b
> c8=1a c9=00 ca=4b d0=1a d1=00 d2=4b d8=1a d9=00 da=4b e0=1a e1=00 e2=4b
> e8=1a e9=00 ea=4b f0=1a f1=00 f2=4b f8=1a f9=00 fa=4b fe=1a words
> 00=1a7f 01=00ff 02=4b7f 03=507f 04=507f 05=507f 06=507f 07=507f
> ohci0 at pci0 dev 2 function 0 "NVIDIA nForce4 USB" rev 0xa2: apic 15
> int 11 (irq 11), version 1.0, legacy support
> ehci0 at pci0 dev 2 function 1 "NVIDIA nForce4 USB" rev 0xa3: apic 15
> int 5 (irq 5)
> usb0 at ehci0: USB revision 2.0
> uhub0 at usb0 "NVIDIA EHCI root hub" rev 2.00/1.00 addr 1
> pciide0 at pci0 dev 6 function 0 "NVIDIA nForce4 IDE" rev 0xf2: DMA,
> channel 0 configured to compatibility, channel 1 configured to
> compatibility
> atapiscsi0 at pciide0 channel 0 drive 0
> scsibus0 at atapiscsi0: 2 targets
> cd0 at scsibus0 targ 0 lun 0: <TEAC, DW-224SL-R, 1.0A> ATAPI 5/cdrom
> removable
> cd0(pciide0:0:0): using PIO mode 4, Ultra-DMA mode 2
> pciide0: channel 1 ignored (disabled)
> ppb0 at pci0 dev 9 function 0 "NVIDIA nForce4 PCI-PCI" rev 0xa2
> pci1 at ppb0 bus 1
> vga1 at pci1 dev 3 function 0 "ATI Rage XL" rev 0x27
> wsdisplay0 at vga1 mux 1: console (80x25, vt100 emulation)
> wsdisplay0: screen 1-5 added (80x25, vt100 emulation)
> nfe0 at pci0 dev 10 function 0 "NVIDIA CK804 LAN" rev 0xa3: apic 15 int
> 15 (irq 15), address 00:14:4f:7d:9a:8e
> eephy0 at nfe0 phy 1: 88E1111 Gigabit PHY, rev. 2
> ppb1 at pci0 dev 11 function 0 "NVIDIA nForce4 PCIE" rev 0xa3
> pci2 at ppb1 bus 2
> ppb2 at pci0 dev 12 function 0 "NVIDIA nForce4 PCIE" rev 0xa3
> pci3 at ppb2 bus 3
> ppb3 at pci0 dev 13 function 0 "NVIDIA nForce4 PCIE" rev 0xa3
> pci4 at ppb3 bus 4
> ppb4 at pci0 dev 14 function 0 "NVIDIA nForce4 PCIE" rev 0xa3
> pci5 at ppb4 bus 5
> pchb0 at pci0 dev 24 function 0 "AMD AMD64 0Fh HyperTransport" rev 0x00
> pchb1 at pci0 dev 24 function 1 "AMD AMD64 0Fh Address Map" rev 0x00
> pchb2 at pci0 dev 24 function 2 "AMD AMD64 0Fh DRAM Cfg" rev 0x00
> kate0 at pci0 dev 24 function 3 "AMD AMD64 0Fh Misc Cfg" rev 0x00: core
> rev JH-F2
> pchb3 at pci0 dev 25 function 0 "AMD AMD64 0Fh HyperTransport" rev 0x00
> pci6 at pchb3 bus 128
> "NVIDIA nForce4 DDR" rev 0xa3 at pci6 dev 0 function 0 not configured
> "NVIDIA CK804" rev 0xa3 at pci6 dev 1 function 0 not configured
> nfe1 at pci6 dev 10 function 0 "NVIDIA CK804 LAN" rev 0xa3: apic 15 int
> 7 (irq 7), address 00:14:4f:7d:9a:8f
> eephy1 at nfe1 phy 1: 88E1111 Gigabit PHY, rev. 2
> ppb5 at pci6 dev 11 function 0 "NVIDIA nForce4 PCIE" rev 0xa3
> pci7 at ppb5 bus 129
> ppb6 at pci6 dev 12 function 0 "NVIDIA nForce4 PCIE" rev 0xa3
> pci8 at ppb6 bus 130
> ppb7 at pci6 dev 13 function 0 "NVIDIA nForce4 PCIE" rev 0xa3
> pci9 at ppb7 bus 131
> ppb8 at pci6 dev 14 function 0 "NVIDIA nForce4 PCIE" rev 0xa3
> pci10 at ppb8 bus 132
> ppb9 at pci6 dev 16 function 0 "AMD 8132 PCIX" rev 0x12
> pci11 at ppb9 bus 133
> "AMD 8132 PCIX IOAPIC" rev 0x12 at pci6 dev 16 function 1 not configured
> ppb10 at pci6 dev 17 function 0 "AMD 8132 PCIX" rev 0x12
> pci12 at ppb10 bus 134
> em0 at pci12 dev 1 function 0 "Intel PRO/1000MT (82546EB)" rev 0x03:
> apic 17 int 0 (irq 10), address 00:14:4f:7d:9a:90
> em1 at pci12 dev 1 function 1 "Intel PRO/1000MT (82546EB)" rev 0x03:
> apic 17 int 1 (irq 6), address 00:14:4f:7d:9a:91
> mpi0 at pci12 dev 2 function 0 "Symbios Logic SAS1064" rev 0x02: apic 17
> int 2 (irq 7)
> scsibus1 at mpi0: 108 targets
> sd0 at scsibus1 targ 2 lun 0: <LSILOGIC, Logical Volume, 3000> SCSI2
> 0/direct fixed
> sd0: 69618MB, 512 bytes/sec, 142577664 sec total
> "AMD 8132 PCIX IOAPIC" rev 0x12 at pci6 dev 17 function 1 not configured
> pchb4 at pci0 dev 25 function 1 "AMD AMD64 0Fh Address Map" rev 0x00
> pchb5 at pci0 dev 25 function 2 "AMD AMD64 0Fh DRAM Cfg" rev 0x00
> kate1 at pci0 dev 25 function 3 "AMD AMD64 0Fh Misc Cfg" rev 0x00: core
> rev JH-F2
> isa0 at pcib0
> isadma0 at isa0
> com0 at isa0 port 0x3f8/8 irq 4: ns16550a, 16 byte fifo
> pckbc0 at isa0 port 0x60/5
> pcppi0 at isa0 port 0x61
> midi0 at pcppi0: <PC speaker>
> spkr0 at pcppi0
> usb1 at ohci0: USB revision 1.0
> uhub1 at usb1 "NVIDIA OHCI root hub" rev 1.00/1.00 addr 1
> mtrr: Pentium Pro MTRR support
> uhidev0 at uhub1 port 3 configuration 1 interface 0 "American Megatrends
> Inc. Virtual Keyboard and Mouse" rev 1.10/1.00 addr 2
> uhidev0: iclass 3/1
> ukbd0 at uhidev0: 8 modifier keys, 6 key codes
> wskbd0 at ukbd0: console keyboard, using wsdisplay0
> uhidev1 at uhub1 port 3 configuration 1 interface 1 "American Megatrends
> Inc. Virtual Keyboard and Mouse" rev 1.10/1.00 addr 2
> uhidev1: iclass 3/1
> ums0 at uhidev1
> ums0: X report 0x0002 not supported
> umass0 at uhub1 port 4 configuration 1 interface 0 "American Megatrends
> Inc. Virtual Cdrom Device" rev 1.10/1.00 addr 3
> umass0: using ATAPI over Bulk-Only
> scsibus2 at umass0: 2 targets, initiator 0
> cd1 at scsibus2 targ 1 lun 0: <AMI, Virtual CDROM, 1.00> ATAPI 5/cdrom
> removable
> umass1 at uhub1 port 5 configuration 1 interface 0 "American Megatrends
> Inc. Virtual Floppy Device" rev 1.10/1.00 addr 4
> umass1: using UFI over CBI with CCI
> scsibus3 at umass1: 2 targets, initiator 0
> sd1 at scsibus3 targ 1 lun 0: <AMI, Virtual Floppy, 1.00> ATAPI 0/direct
> removable
> sd1: drive offline
> vscsi0 at root
> scsibus4 at vscsi0: 256 targets
> softraid0 at root
> sd1(umass1:1:0): Check Condition (error 0x70) on opcode 0x1b
> SENSE KEY: Not Ready
> ASC/ASCQ: Medium Not Present
> root on sd0a swap on sd0b dump on sd0b