It also doesn't dump its core to the dump swap space, so I can't run savecore after reboot to get debugging info. I have the swap space commented out in fstab so that it won't come up at boot and I can harvest the core manually, but that just gives "savecore: no dumps found." (It doesn't happen automatically, either.)
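For reference, a dedicated dump device on FreeBSD is normally armed along the lines of the sketch below. The device name /dev/ad0s1b and the directory /var/crash are placeholders assumed for illustration, not the actual partition or path on this machine. The dump partition generally has to be at least as large as physical memory (2 GB here) for a full dump to fit.

```sh
# /etc/rc.conf fragment -- a sketch, not the configuration from this machine.
# ASSUMPTION: /dev/ad0s1b stands in for the swap partition on the spare IDE drive.
dumpdev="/dev/ad0s1b"   # partition the kernel should write a crash dump to
dumpdir="/var/crash"    # directory where savecore stores the extracted dump

# Manual equivalent, run as root:
#   dumpon -v /dev/ad0s1b               # arm the dump device on the running kernel
#   savecore -v /var/crash /dev/ad0s1b  # after the reboot, extract any dump found there
```

If the dump device is never armed with dumpon before the panic, the kernel has nowhere to write the dump, and savecore would report that no dumps were found.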
We recently thought we'd give 5.3 a go in production, and it has been too unstable. When it crashes, it doesn't reboot, so it just hangs there until someone drives in and pushes the button. Who knows, maybe Linux would be more stable at this point. Sigh.

The hardware it is running on is a Tyan s2875 with dual amd64/246 processors and 2 GB of Registered DDR RAM (Corsair). We're also running vinum for all of the filesystems, mirroring them all, including the root filesystem. The vinum is using two SATA WD Raptors. I have one older IDE drive plugged in to capture the kernel dumps.

We've tried many different memory configurations to see if we can tune it so that FreeBSD can handle it (DRAM ECC vs master ECC, bank & node interleaving turned off/on, slowing the memory down, DRAM Scrub Redirect off/on, etc.), to no avail.

It's usually pagedaemon that croaks, but it also crashes in the keyboard irq process and the serial IO irq process for some reason. I guess since it's usually the pager that dies, that's the reason why I can't get kernel dumps.

Here are some (manually copied) panics from the console.

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address = 0x88
fault code = supervisor read, page not present
instruction pointer = 0x8:0xffffffff80389aea
stack pointer = 0x10:0xffffffffb2051a60
frame pointer = 0x10:0xffffff006b12d000
code segment = base 0x0, limit 0xfffff, type 0x1b
             = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 53 (pagedaemon)
trap number = 12
panic: page fault
cpuid = 0
boot() called on cpu#0
Uptime: 10h18m49s

...

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address = 0x88
fault code = supervisor read, page not present
instruction pointer = 0x8:0xffffffff8038a10a
frame pointer = 0x10:0xffffffffb2051ab0
code segment = base 0x0, limit 0xfffff, type 0x1b
             = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 53 (pagedaemon)
trap number = 12
panic: page fault
cpuid = 0
boot() called on cpu#0
Uptime: 15h59m55s

...

             = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = resume, IOPL = 0
current process = 36 (swi5: clock sio)
trap number = 12
panic: page fault
cpuid = 1
kernel trap 12 with interrupts disabled

Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 01
fault virtual address = 0x48
fault code = supervisor read, page not present
instruction pointer = 0x8: 0xffffffff803a40d3
stack pointer = 0x10: 0xffffffffb1d63650
frame pointer = 0x10: 0xffffff007b7f3a40
code segment = base 0x0, limit 0xfffff, type 0x1b
             = DPL 0,pres 1, long 1, def32 0, gran 1
processor eflags = resume, IOPL = 0
current process = 30
trap number = 12
panic: page fault
cpuid = 1
spin lock sched lock held by 0xffffff007b8177b0 for > 5 seconds

...

What can I do to debug this more if I can't harvest the kernel dumps to report a bug? Is there anything the FreeBSD team can do? Do I need to resort to Linux for dual amd64 support for now? <cringe>

Thanks,
../troy