Over the past couple of months, I've started to wonder whether the quality of FreeBSD's -STABLE branch has been deteriorating, to the point that trusting it for any sort of "loaded server" environment is coming into question ...
I have two servers sitting at Rackspace right now, both built on Tyan L-ET motherboards ... one with 1.2GHz CPUs, and one with 1GHz CPUs ... both with 4GB of RAM ... both using AMI MegaRAID controllers ... one is running a -STABLE kernel from yesterday ... the other one can't get past a -STABLE kernel from Oct 28th ...

Both servers are *continuously* running >1300 processes, with >1500 being more usual ... one server runs 87 jail'd environments (venus) ... the other, 116 (jupiter) ... the machines are great performers *when they are running*, but keeping them running has been a nightmare ...

The kernel configs for both machines are nearly identical (config attached), with venus having an 'sa' device and a 'sym' controller extra, for a tape drive that is attached to it ... otherwise, they are identical ... venus has a bit more drive space on it, which caused problems for a while until we upgraded its power supply from 300W to 400W, which seemed to fix a lot of the problems ...

Until last night, jupiter was running a kernel from Sept 10th, and getting 20 days of uptime on her was more or less the norm ... 9hrs after upgrading to last night's sources, she partially crashed ... pingable, but nothing else ... she's back up now, and running smoothly ...

venus, if I try to boot a kernel from after Oct 28th, hangs just after "SMP: AP CPU #1 Launched!" is printed ... I've tried removing the sym/sa devices from the kernel and rebuilding, but with the same effect ... I had originally thought it was the amr device changes on the 29th, but jupiter is running the same controller, right down to the same firmware revision, and it boots fine ... yet I can't boot even an Oct 29th kernel on venus ... Oct 28th is the last day that works ...
Looking at the messages file for jupiter for this morning, there was nothing to indicate a problem, yet it died sometime after 8am:

Nov 16 08:00:04 jupiter newsyslog[62442]: logfile turned over
Nov 16 10:48:45 jupiter /kernel: Copyright (c) 1992-2000 The FreeBSD Project.
Nov 16 10:48:45 jupiter /kernel: Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994

Am I expecting too much from FreeBSD-STABLE? Would I fare better if I moved down to RELENG_4_7 and avoided -STABLE altogether? Help?
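Since the messages file shows no warning before the hang, about the only thing it can tell us is the window in which the machine went silent. A rough Python sketch for pulling that window out of a log follows; the `parse_ts` and `largest_gap` helpers are hypothetical names, and the year is assumed to be 2002 because syslog timestamps do not carry one.

```python
from datetime import datetime

def parse_ts(line, year=2002):
    # Classic syslog lines start with "Mon DD HH:MM:SS" (15 characters).
    # The year is not logged, so it is supplied as an assumption.
    return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")

def largest_gap(lines, year=2002):
    # Return the pair of consecutive timestamps that are furthest apart.
    stamps = [parse_ts(l, year) for l in lines if l.strip()]
    return max(zip(stamps, stamps[1:]), key=lambda p: p[1] - p[0])

# The two entries from jupiter's messages file quoted above.
sample = [
    "Nov 16 08:00:04 jupiter newsyslog[62442]: logfile turned over",
    "Nov 16 10:48:45 jupiter /kernel: Copyright (c) 1992-2000 The FreeBSD Project.",
]

before, after = largest_gap(sample)
gap = after - before
print(f"silent from {before} to {after} ({gap})")
```

For the excerpt above this only narrows the hang to somewhere between the 08:00:04 log rotation and the 10:48:45 reboot; it says nothing about the cause.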
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
Copyright (c) 1992-2002 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD 4.7-STABLE #19: Fri Nov 15 12:24:15 CST 2002
[EMAIL PROTECTED]:/usr/obj/usr/src/sys/kernel
Timecounter "i8254" frequency 1193182 Hz
CPU: Pentium III/Pentium III Xeon/Celeron (996.85-MHz 686-class CPU)
Origin = "GenuineIntel" Id = 0x68a Stepping = 10
Features=0x387fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,PN,MMX,FXSR,SSE>
real memory = 4227858432 (4128768K bytes)
avail memory = 4120489984 (4023916K bytes)
Programming 16 pins in IOAPIC #0
IOAPIC #0 intpin 2 -> irq 0
Programming 16 pins in IOAPIC #1
FreeBSD/SMP: Multiprocessor motherboard
cpu0 (BSP): apic id: 0, version: 0x00040011, at 0xfee00000
cpu1 (AP): apic id: 1, version: 0x00040011, at 0xfee00000
io0 (APIC): apic id: 4, version: 0x000f0011, at 0xfec00000
io1 (APIC): apic id: 5, version: 0x000f0011, at 0xfec01000
Preloaded elf kernel "kernel" at 0xc029a000.
Preloaded elf module "netdump_client.ko" at 0xc029a09c.
link_elf: symbol fxp_intr undefined
Pentium Pro MTRR support enabled
Using $PIR table, 10 entries at 0xc00f51c0
npx0: <math processor> on motherboard
npx0: INT 16 interface
pcib0: <ServerWorks NB6635 3.0LE host to PCI bridge> on motherboard
IOAPIC #1 intpin 6 -> irq 2
IOAPIC #1 intpin 4 -> irq 5
IOAPIC #1 intpin 5 -> irq 9
pci0: <PCI bus> on pcib0
pci0: <ATI Mach64-GR graphics accelerator> at 1.0 irq 2
pci0: <unknown card> (vendor=0x8086, dev=0x1229) at 4.0 irq 5
pci0: <unknown card> (vendor=0x8086, dev=0x1229) at 5.0 irq 9
isab0: <ServerWorks IB6566 PCI to ISA bridge> at device 15.0 on pci0
isa0: <ISA bus> on isab0
pci0: <Unknown PCI ATA controller> at 15.1
pci0: <OHCI USB controller> at 15.2 irq 10
pcib1: <ServerWorks NB6635 3.0LE host to PCI bridge> on motherboard
IOAPIC #1 intpin 7 -> irq 11
pci1: <PCI bus> on pcib1
amr0: <AMI MegaRAID> mem 0xfc1f0000-0xfc1fffff irq 11 at device 3.0 on pci1
amr0: <Series 475 40 Logical Drive Firmware> Firmware E161, BIOS 3.13, 32MB RAM
orm0: <Option ROMs> at iomem 0xc0000-0xc7fff,0xc9800-0xca7ff,0xca800-0xcb7ff on isa0
atkbdc0: <Keyboard controller (i8042)> at port 0x60,0x64 on isa0
atkbd0: <AT Keyboard> flags 0x1 irq 1 on atkbdc0
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
sc0: <System console> at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
APIC_IO: Testing 8254 interrupt delivery
APIC_IO: Broken MP table detected: 8254 is not connected to IOAPIC #0 intpin 2
APIC_IO: routing 8254 via 8259 and IOAPIC #0 intpin 0
IP packet filtering initialized, divert disabled, rule-based forwarding enabled, default to accept, logging disabled
amrd0: <MegaRAID logical drive> on amr0
amrd0: 8758MB (17936384 sectors) RAID 0 (optimal)
amrd1: <MegaRAID logical drive> on amr0
amrd1: 70036MB (143433728 sectors) RAID 5 (optimal)
Mounting root from ufs:/dev/amrd0s1a
SMP: AP CPU #1 Launched!
WARNING: / was not properly dismounted
machine i386
cpu I686_CPU
ident kernel
maxusers 0

options NMBCLUSTERS=15360
options NSWAPDEV=2

makeoptions DEBUG=-g #Build kernel with gdb(1) debug symbols

options INET #InterNETworking
options FFS #Berkeley Fast Filesystem
options FFS_ROOT #FFS usable as root device [keep this!]
options SOFTUPDATES #Enable FFS soft updates support
options UFS_DIRHASH #Improve performance on big directories
options COMPAT_43 #Compatible with BSD 4.3 [KEEP THIS!]
options SCSI_DELAY=15000 #Delay (in ms) before probing SCSI
options KTRACE #ktrace(1) support
options SYSVSHM
options SHMMAXPGS=199608
options SHMMAX=(SHMMAXPGS*PAGE_SIZE+1)
options SYSVSEM
options SEMMNI=4096
options SEMMNS=8192
options SYSVMSG #SYSV-style message queues
options IPFIREWALL #firewall
options IPFIREWALL_FORWARD #enable transparent proxy support
options IPFIREWALL_DEFAULT_TO_ACCEPT #allow everything by default
options P1003_1B #Posix P1003_1B real-time extensions
options _KPOSIX_PRIORITY_SCHEDULING
options ICMP_BANDLIM #Rate limit bad replies
options SMP # Symmetric MultiProcessor Kernel
options APIC_IO # Symmetric (APIC) I/O

device isa
device pci
device scbus # SCSI bus (required)
device da # Direct Access (disks)
device pass # Passthrough device (direct SCSI access)
device amr # AMI MegaRAID
device atkbdc0 at isa? port IO_KBD
device atkbd0 at atkbdc? irq 1 flags 0x1
device psm0 at atkbdc? irq 12
device vga0 at isa?
pseudo-device splash
device sc0 at isa? flags 0x100
device npx0 at nexus? port IO_NPX irq 13

pseudo-device loop # Network loopback
pseudo-device ether # Ethernet support
pseudo-device pty 256 # Pseudo-ttys (telnet etc)
pseudo-device bpf #Berkeley packet filter

options DDB
options DDB_UNATTENDED
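For anyone reading the config, the SHMMAX expression works out as follows. This is only a worked-arithmetic sketch in Python: it assumes the standard i386 PAGE_SIZE of 4096 bytes, and `shmmax_bytes` is a hypothetical helper name, not anything from the kernel sources.

```python
PAGE_SIZE = 4096  # i386 page size in bytes (assumed, not stated in the config)

def shmmax_bytes(shmmaxpgs, page_size=PAGE_SIZE):
    # Mirrors the config line: SHMMAX=(SHMMAXPGS*PAGE_SIZE+1)
    return shmmaxpgs * page_size + 1

shmmax = shmmax_bytes(199608)
print(shmmax)                               # total bytes
print(round((shmmax - 1) / (1024 * 1024)))  # approximate size in MB
```

With SHMMAXPGS=199608 that comes to 817594369 bytes, or roughly 780MB as the largest single shared memory segment, on machines with 4GB of RAM.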
Copyright (c) 1992-2002 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD 4.7-STABLE #20: Sun Nov 10 18:55:29 CST 2002
[EMAIL PROTECTED]:/usr/obj/usr/src/sys/kernel
Timecounter "i8254" frequency 1193182 Hz
CPU: Pentium III/Pentium III Xeon/Celeron (1262.67-MHz 686-class CPU)
Origin = "GenuineIntel" Id = 0x6b1 Stepping = 1
Features=0x383fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR,SSE>
real memory = 4227858432 (4128768K bytes)
avail memory = 4120436736 (4023864K bytes)
Programming 16 pins in IOAPIC #0
IOAPIC #0 intpin 2 -> irq 0
Programming 16 pins in IOAPIC #1
FreeBSD/SMP: Multiprocessor motherboard
cpu0 (BSP): apic id: 0, version: 0x00040011, at 0xfee00000
cpu1 (AP): apic id: 1, version: 0x00040011, at 0xfee00000
io0 (APIC): apic id: 4, version: 0x000f0011, at 0xfec00000
io1 (APIC): apic id: 5, version: 0x000f0011, at 0xfec01000
Preloaded elf kernel "kernel" at 0xc02a7000.
Pentium Pro MTRR support enabled
Using $PIR table, 10 entries at 0xc00f51c0
npx0: <math processor> on motherboard
npx0: INT 16 interface
pcib0: <ServerWorks NB6635 3.0LE host to PCI bridge> on motherboard
IOAPIC #1 intpin 6 -> irq 2
IOAPIC #1 intpin 4 -> irq 5
IOAPIC #1 intpin 5 -> irq 9
pci0: <PCI bus> on pcib0
pci0: <ATI Mach64-GR graphics accelerator> at 1.0 irq 2
pci0: <unknown card> (vendor=0x8086, dev=0x1229) at 4.0 irq 5
pci0: <unknown card> (vendor=0x8086, dev=0x1229) at 5.0 irq 9
isab0: <ServerWorks IB6566 PCI to ISA bridge> at device 15.0 on pci0
isa0: <ISA bus> on isab0
pci0: <Unknown PCI ATA controller> at 15.1
pci0: <OHCI USB controller> at 15.2 irq 10
pcib1: <ServerWorks NB6635 3.0LE host to PCI bridge> on motherboard
IOAPIC #1 intpin 11 -> irq 11
IOAPIC #1 intpin 7 -> irq 16
IOAPIC #1 intpin 8 -> irq 17
pci1: <PCI bus> on pcib1
amr0: <AMI MegaRAID> mem 0xfc1f0000-0xfc1fffff irq 11 at device 2.0 on pci1
amr0: <Series 475 40 Logical Drive Firmware> Firmware E161, BIOS 3.13, 32MB RAM
sym0: <896> port 0xe400-0xe4ff mem 0xfebc8000-0xfebc9fff,0xfebe0000-0xfebe03ff irq 16 at device 3.0 on pci1
sym0: Symbios NVRAM, ID 7, Fast-40, SE, parity checking
sym0: open drain IRQ line driver, using on-chip SRAM
sym0: using LOAD/STORE-based firmware.
sym0: handling phase mismatch from SCRIPTS.
sym1: <896> port 0xe800-0xe8ff mem 0xfebe8000-0xfebe9fff,0xfebf0000-0xfebf03ff irq 17 at device 3.1 on pci1
sym1: Symbios NVRAM, ID 7, Fast-40, LVD, parity checking
sym1: open drain IRQ line driver, using on-chip SRAM
sym1: using LOAD/STORE-based firmware.
sym1: handling phase mismatch from SCRIPTS.
orm0: <Option ROMs> at iomem 0xc0000-0xc7fff,0xc9800-0xca7ff,0xca800-0xcb7ff,0xcb800-0xcbfff on isa0
atkbdc0: <Keyboard controller (i8042)> at port 0x60,0x64 on isa0
atkbd0: <AT Keyboard> flags 0x1 irq 1 on atkbdc0
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
sc0: <System console> at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
APIC_IO: Testing 8254 interrupt delivery
APIC_IO: Broken MP table detected: 8254 is not connected to IOAPIC #0 intpin 2
APIC_IO: routing 8254 via 8259 and IOAPIC #0 intpin 0
IP packet filtering initialized, divert disabled, rule-based forwarding enabled, default to accept, logging disabled
Waiting 15 seconds for SCSI devices to settle
(noperiph:sym0:0:-1:-1): SCSI BUS reset delivered.
(noperiph:sym1:0:-1:-1): SCSI BUS reset delivered.
amrd0: <MegaRAID logical drive> on amr0
amrd0: 105000MB (215040000 sectors) RAID 5 (optimal)
SMP: AP CPU #1 Launched!
sa0 at sym0 bus 0 target 0 lun 0
sa0: <SONY SDX-700C 0101> Removable Sequential Access SCSI-2 device
sa0: 80.000MB/s transfers (40.000MHz, offset 31, 16bit)
Mounting root from ufs:/dev/amrd0s1a