We have a Dell PowerEdge 750 running OpenBSD 3.7 that has been in production for the last ~14 months without issue. Suddenly, it has frozen on each of the last two days. The only common denominator appears to be that both freezes happened while a user was transferring a 100 MB file via SFTP to their account on this server. Note that all the home directories on this server are NFS-mounted from an Apple Xserve running OS X 10.3.9.

When the freeze occurs, the server can be pinged, but no services will respond. Even the console will not respond to input. There is no apparent kernel crash, and no way to force the ddb console. The only recourse is a hard restart, at which point it has to rebuild the RAIDframe parity on raid0. The only modification to the kernel was enabling RAIDframe support.

This brings me to a possible cause. I didn't realize until these events that the swap partition (/dev/raid0b) was not being activated at boot. Based on some archive threads, it seems that OpenBSD will not use a swap partition on a RAIDframe device (without jumping through some hoops). It seems possible that memory was exhausted, leading the system to hang while attempting to swap when no swap was available. I'm not a memory expert, so I won't begin to theorize on whether this scenario is feasible or not. I've pulled the server out of production and replaced it with a new OpenBSD 3.9 server while I continue diagnosis offline. All attempts to reproduce this error condition, with swap both disabled and enabled, have been unsuccessful.
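For anyone who wants to try reproducing the write load without SFTP in the loop, here is a minimal sketch in Python. It only writes a file of a given size in chunks and syncs it to disk; it is not the transfer that triggered the freeze. The function name write_test_file, the file name nfs_write_test.bin, and the 64 KB chunk size are all made up for illustration, and the target directory defaults to the system temp directory, so it would need to be pointed at a directory on the NFS mount by hand.

```python
import os
import tempfile


def write_test_file(path, total_bytes, chunk_bytes=64 * 1024):
    """Write total_bytes of random data to path in chunks, then fsync.

    Returns the number of bytes written. The chunk size is an arbitrary
    choice for this sketch, not a value taken from sftp-server.
    """
    written = 0
    with open(path, "wb") as f:
        while written < total_bytes:
            n = min(chunk_bytes, total_bytes - written)
            f.write(os.urandom(n))
            written += n
        # Push the data out of the process and ask the kernel to commit it.
        f.flush()
        os.fsync(f.fileno())
    return written


if __name__ == "__main__":
    # Hypothetical target: replace with a directory on the NFS-mounted home.
    target_dir = tempfile.gettempdir()
    target = os.path.join(target_dir, "nfs_write_test.bin")
    n = write_test_file(target, 100 * 1024 * 1024)
    print("wrote", n, "bytes to", target)
    os.remove(target)
```

Running this in a loop while watching memory and swap usage from another terminal may help show whether the NFS write path alone is enough to exhaust memory, though that is only a guess at this point.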

Any ideas?  Dmesg below.

Thanks.

OpenBSD 3.7 (GENERIC) #0: Tue Jul 19 15:23:10 EDT 2005
    [EMAIL PROTECTED]:/usr/src/sys/arch/i386/compile/GENERIC
cpu0: Intel(R) Pentium(R) 4 CPU 2.80GHz ("GenuineIntel" 686-class) 2.80 GHz
cpu0: FPU,V86,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,SBF,PNI,MWAIT,CNXT-ID
real mem  = 536195072 (523628K)
avail mem = 481669120 (470380K)
using 4278 buffers containing 26910720 bytes (26280K) of memory
mainbus0 (root)
bios0 at mainbus0: AT/286+(00) BIOS, date 02/16/05, BIOS32 rev. 0 @ 0xffe90
pcibios0 at bios0: rev 2.1 @ 0xf0000/0x10000
pcibios0: PCI IRQ Routing Table rev 1.0 @ 0xfc570/144 (7 entries)
pcibios0: no compatible PCI ICU found: ICU vendor 0x8086 product 0x25a1
pcibios0: Warning, unable to fix up PCI interrupt routing
pcibios0: PCI bus #3 is the last bus
bios0: ROM list: 0xc0000/0x8000 0xc8000/0x1000 0xc9000/0x5600 0xce800/0x1000 0xec000/0x4000!
cpu0 at mainbus0
pci0 at mainbus0 bus 0: configuration mode 1 (no bios)
pchb0 at pci0 dev 0 function 0 "Intel 82875P Host" rev 0x02
ppb0 at pci0 dev 3 function 0 "Intel 82875P PCI-CSA" rev 0x02
pci1 at ppb0 bus 1
em0 at pci1 dev 1 function 0 "Intel PRO/1000CT (82547EI)" rev 0x00: irq 3, address: 00:12:3f:25:49:a6
ppb1 at pci0 dev 28 function 0 "Intel 6300ESB PCIX" rev 0x02
pci2 at ppb1 bus 2
ahc1 at pci2 dev 1 function 0 "Adaptec AHA-3960D U160" rev 0x01: irq 11
scsibus0 at ahc1: 16 targets
sd0 at scsibus0 targ 0 lun 0: <SEAGATE, ST373207LW, D702> SCSI3 0/direct fixed
sd0: 70007MB, 90774 cyl, 2 head, 789 sec, 512 bytes/sec, 143374650 sec total
sd1 at scsibus0 targ 1 lun 0: <SEAGATE, ST373207LW, D702> SCSI3 0/direct fixed
sd1: 70007MB, 90774 cyl, 2 head, 789 sec, 512 bytes/sec, 143374650 sec total
ahc2 at pci2 dev 1 function 1 "Adaptec AHA-3960D U160" rev 0x01: irq 11
scsibus1 at ahc2: 16 targets
uhci0 at pci0 dev 29 function 0 "Intel 6300ESB USB" rev 0x02: irq 11
usb0 at uhci0: USB revision 1.0
uhub0 at usb0
uhub0: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 2 ports with 2 removable, self powered
uhci1 at pci0 dev 29 function 1 "Intel 6300ESB USB" rev 0x02: irq 10
usb1 at uhci1: USB revision 1.0
uhub1 at usb1
uhub1: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub1: 2 ports with 2 removable, self powered
"Intel 6300ESB WDT" rev 0x02 at pci0 dev 29 function 4 not configured
"Intel 6300ESB APIC" rev 0x02 at pci0 dev 29 function 5 not configured
ehci0 at pci0 dev 29 function 7 "Intel 6300ESB USB" rev 0x02: irq 7
ehci0: EHCI version 1.0
ehci0: companion controllers, 2 ports each: uhci0 uhci1
usb2 at ehci0: USB revision 2.0
uhub2 at usb2
uhub2: Intel EHCI root hub, class 9/0, rev 2.00/1.00, addr 1
uhub2: single transaction translator
uhub2: 4 ports with 4 removable, self powered
ppb2 at pci0 dev 30 function 0 "Intel 82801BA AGP" rev 0x0a
pci3 at ppb2 bus 3
em1 at pci3 dev 2 function 0 "Intel PRO/1000MT (82541EI)" rev 0x00: irq 10, address: 00:12:3f:25:49:a7
vga1 at pci3 dev 14 function 0 "ATI Rage XL" rev 0x27
wsdisplay0 at vga1: console (80x25, vt100 emulation)
wsdisplay0: screen 1-5 added (80x25, vt100 emulation)
ichpcib0 at pci0 dev 31 function 0 "Intel 6300ESB LPC" rev 0x02
pciide0 at pci0 dev 31 function 2 "Intel 6300ESB SATA" rev 0x02: DMA, channel 0 configured to compatibility, channel 1 configured to compatibility
atapiscsi0 at pciide0 channel 0 drive 0
scsibus2 at atapiscsi0: 2 targets
cd0 at scsibus2 targ 0 lun 0: <TEAC, CD-224E, K.9A> SCSI0 5/cdrom removable
cd0(pciide0:0:0): using PIO mode 4, Ultra-DMA mode 2
pciide0: channel 1 ignored (disabled)
"Intel 6300ESB SMBus" rev 0x02 at pci0 dev 31 function 3 not configured
isa0 at ichpcib0
isadma0 at isa0
pckbc0 at isa0 port 0x60/5
pckbd0 at pckbc0 (kbd slot)
pckbc0: using irq 1 for kbd slot
wskbd0 at pckbd0 (mux 1 ignored for console): console keyboard, using wsdisplay0
pmsi0 at pckbc0 (aux slot)
pckbc0: using irq 12 for aux slot
wsmouse0 at pmsi0 mux 0
pcppi0 at isa0 port 0x61
midi0 at pcppi0: <PC speaker>
sysbeep0 at pcppi0
npx0 at isa0 port 0xf0/16: using exception 16
pccom0 at isa0 port 0x3f8/8 irq 4: ns16550a, 16 byte fifo
fdc0 at isa0 port 0x3f0/6 irq 6 drq 2
biomask efe5 netmask efed ttymask ffef
pctr: user-level cycle counter enabled
Kernelized RAIDframe activated
ahc1: target 0 using 16bit transfers
ahc1: target 0 synchronous at 80.0MHz DT, offset = 0x3f
ahc1: target 1 using 16bit transfers
ahc1: target 1 synchronous at 80.0MHz DT, offset = 0x3f
cd0(atapiscsi0:0:0): Check Condition (error 0x70) on opcode 0x0
    SENSE KEY: Not Ready
     ASC/ASCQ: Medium Not Present
raid0 (root): (RAID Level 1) total number of sectors is 142185216 (69426 MB) as root
dkcsum: sd0 matched BIOS disk 80
dkcsum: sd1 matched BIOS disk 81
rootdev=0x1300 rrootdev=0x3600 rawdev=0x3602
raid0: Device already configured!

--
Jason Dixon
DixonGroup Consulting
http://www.dixongroup.net