We have a Dell PowerEdge 750 running OpenBSD 3.7 that has been in
production for the last ~14 months without issue. Suddenly, it has
frozen on each of the last two days. The only common factor appears
to be that both freezes happened while a user was transferring a
100MB file via SFTP to their account on this server. Note that all
home directories on this server are NFS-mounted from an Apple Xserve
running OS X 10.3.9.

When the freeze occurs, the server still answers pings, but no
services respond, and the console does not respond to input either.
There is no apparent kernel crash and no way to break into ddb from
the console. The only recourse is a hard restart, after which the
system has to rebuild the RAIDframe parity on raid0. The only
modification to the kernel was enabling RAIDframe support.

Which brings me to a possible cause. I didn't realize until these
events that the swap partition (/dev/raid0b) was not being activated
at boot. Based on some archive threads, it seems that OpenBSD will
not use a swap partition on a RAIDframe device without jumping
through some hoops. It seems possible that memory was exhausted and
the system hung while attempting to swap with no swap space
available. I'm not a memory expert, so I won't theorize on whether
this scenario is feasible. I've pulled the server out of production
and replaced it with a new OpenBSD 3.9 server while I continue
diagnosis offline. All attempts to reproduce the freeze, with swap
both disabled and enabled, have been unsuccessful.
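
For a rough sense of scale on the memory theory, the following Python
sketch parses the two memory lines from the dmesg in this post and
compares them with the size of the transfer. The function name is made
up for illustration, and reading "100MB" as 100 * 2**20 bytes is an
assumption. It says nothing about how much memory sftp-server, the NFS
client, or the buffer cache actually use during a transfer, so it
cannot confirm or rule out the swap theory by itself.

```python
import re

# The two memory lines exactly as they appear in the dmesg in this post.
DMESG_MEM = """\
real mem = 536195072 (523628K)
avail mem = 481669120 (470380K)
"""


def parse_mem_lines(text):
    """Return {'real': bytes, 'avail': bytes} from dmesg-style memory lines.

    Hypothetical helper for illustration only.
    """
    found = {}
    pattern = r"^(real|avail) mem\s*=\s*(\d+)"
    for kind, nbytes in re.findall(pattern, text, re.M):
        found[kind] = int(nbytes)
    return found


mem = parse_mem_lines(DMESG_MEM)

# Assumption: "100MB" is taken to mean 100 * 2**20 bytes.
transfer = 100 * 2**20

for kind in ("real", "avail"):
    print(f"{kind} mem: {mem[kind] / 2**20:.1f} MiB")
print(f"transfer is {100 * transfer / mem['avail']:.1f}% of avail mem")
```

On these numbers a single 100MB file is well under the available
memory, which is at least consistent with the failed attempts to
reproduce the hang with one transfer.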

Any ideas? dmesg below.

Thanks.

OpenBSD 3.7 (GENERIC) #0: Tue Jul 19 15:23:10 EDT 2005
[EMAIL PROTECTED]:/usr/src/sys/arch/i386/compile/GENERIC
cpu0: Intel(R) Pentium(R) 4 CPU 2.80GHz ("GenuineIntel" 686-class) 2.80 GHz
cpu0: FPU,V86,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,SBF,PNI,MWAIT,CNXT-ID
real mem = 536195072 (523628K)
avail mem = 481669120 (470380K)
using 4278 buffers containing 26910720 bytes (26280K) of memory
mainbus0 (root)
bios0 at mainbus0: AT/286+(00) BIOS, date 02/16/05, BIOS32 rev. 0 @ 0xffe90
pcibios0 at bios0: rev 2.1 @ 0xf0000/0x10000
pcibios0: PCI IRQ Routing Table rev 1.0 @ 0xfc570/144 (7 entries)
pcibios0: no compatible PCI ICU found: ICU vendor 0x8086 product 0x25a1
pcibios0: Warning, unable to fix up PCI interrupt routing
pcibios0: PCI bus #3 is the last bus
bios0: ROM list: 0xc0000/0x8000 0xc8000/0x1000 0xc9000/0x5600 0xce800/0x1000 0xec000/0x4000!
cpu0 at mainbus0
pci0 at mainbus0 bus 0: configuration mode 1 (no bios)
pchb0 at pci0 dev 0 function 0 "Intel 82875P Host" rev 0x02
ppb0 at pci0 dev 3 function 0 "Intel 82875P PCI-CSA" rev 0x02
pci1 at ppb0 bus 1
em0 at pci1 dev 1 function 0 "Intel PRO/1000CT (82547EI)" rev 0x00: irq 3, address: 00:12:3f:25:49:a6
ppb1 at pci0 dev 28 function 0 "Intel 6300ESB PCIX" rev 0x02
pci2 at ppb1 bus 2
ahc1 at pci2 dev 1 function 0 "Adaptec AHA-3960D U160" rev 0x01: irq 11
scsibus0 at ahc1: 16 targets
sd0 at scsibus0 targ 0 lun 0: <SEAGATE, ST373207LW, D702> SCSI3 0/direct fixed
sd0: 70007MB, 90774 cyl, 2 head, 789 sec, 512 bytes/sec, 143374650 sec total
sd1 at scsibus0 targ 1 lun 0: <SEAGATE, ST373207LW, D702> SCSI3 0/direct fixed
sd1: 70007MB, 90774 cyl, 2 head, 789 sec, 512 bytes/sec, 143374650 sec total
ahc2 at pci2 dev 1 function 1 "Adaptec AHA-3960D U160" rev 0x01: irq 11
scsibus1 at ahc2: 16 targets
uhci0 at pci0 dev 29 function 0 "Intel 6300ESB USB" rev 0x02: irq 11
usb0 at uhci0: USB revision 1.0
uhub0 at usb0
uhub0: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 2 ports with 2 removable, self powered
uhci1 at pci0 dev 29 function 1 "Intel 5300ESB USB" rev 0x02: irq 10
usb1 at uhci1: USB revision 1.0
uhub1 at usb1
uhub1: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub1: 2 ports with 2 removable, self powered
"Intel 6300ESB WDT" rev 0x02 at pci0 dev 29 function 4 not configured
"Intel 6300ESB APIC" rev 0x02 at pci0 dev 29 function 5 not configured
ehci0 at pci0 dev 29 function 7 "Intel 6300ESB USB" rev 0x02: irq 7
ehci0: EHCI version 1.0
ehci0: companion controllers, 2 ports each: uhci0 uhci1
usb2 at ehci0: USB revision 2.0
uhub2 at usb2
uhub2: Intel EHCI root hub, class 9/0, rev 2.00/1.00, addr 1
uhub2: single transaction translator
uhub2: 4 ports with 4 removable, self powered
ppb2 at pci0 dev 30 function 0 "Intel 82801BA AGP" rev 0x0a
pci3 at ppb2 bus 3
em1 at pci3 dev 2 function 0 "Intel PRO/1000MT (82541EI)" rev 0x00: irq 10, address: 00:12:3f:25:49:a7
vga1 at pci3 dev 14 function 0 "ATI Rage XL" rev 0x27
wsdisplay0 at vga1: console (80x25, vt100 emulation)
wsdisplay0: screen 1-5 added (80x25, vt100 emulation)
ichpcib0 at pci0 dev 31 function 0 "Intel 6300ESB LPC" rev 0x02
pciide0 at pci0 dev 31 function 2 "Intel 6300ESB SATA" rev 0x02: DMA, channel 0 configured to compatibility, channel 1 configured to compatibility
atapiscsi0 at pciide0 channel 0 drive 0
scsibus2 at atapiscsi0: 2 targets
cd0 at scsibus2 targ 0 lun 0: <TEAC, CD-224E, K.9A> SCSI0 5/cdrom removable
cd0(pciide0:0:0): using PIO mode 4, Ultra-DMA mode 2
pciide0: channel 1 ignored (disabled)
"Intel 6300ESB SMBus" rev 0x02 at pci0 dev 31 function 3 not configured
isa0 at ichpcib0
isadma0 at isa0
pckbc0 at isa0 port 0x60/5
pckbd0 at pckbc0 (kbd slot)
pckbc0: using irq 1 for kbd slot
wskbd0 at pckbd0 (mux 1 ignored for console): console keyboard, using wsdisplay0
pmsi0 at pckbc0 (aux slot)
pckbc0: using irq 12 for aux slot
wsmouse0 at pmsi0 mux 0
pcppi0 at isa0 port 0x61
midi0 at pcppi0: <PC speaker>
sysbeep0 at pcppi0
npx0 at isa0 port 0xf0/16: using exception 16
pccom0 at isa0 port 0x3f8/8 irq 4: ns16550a, 16 byte fifo
fdc0 at isa0 port 0x3f0/6 irq 6 drq 2
biomask efe5 netmask efed ttymask ffef
pctr: user-level cycle counter enabled
Kernelized RAIDframe activated
ahc1: target 0 using 16bit transfers
ahc1: target 0 synchronous at 80.0MHz DT, offset = 0x3f
ahc1: target 1 using 16bit transfers
ahc1: target 1 synchronous at 80.0MHz DT, offset = 0x3f
cd0(atapiscsi0:0:0): Check Condition (error 0x70) on opcode 0x0
SENSE KEY: Not Ready
ASC/ASCQ: Medium Not Present
raid0 (root): (RAID Level 1) total number of sectors is 142185216 (69426 MB) as root
dkcsum: sd0 matched BIOS disk 80
dkcsum: sd1 matched BIOS disk 81
rootdev=0x1300 rrootdev=0x3600 rawdev=0x3602
raid0: Device already configured!
--
Jason Dixon
DixonGroup Consulting
http://www.dixongroup.net