Over the past couple of months, I've started to wonder whether the
quality of FreeBSD's -STABLE branch has been deteriorating, to the point
that trusting it in any sort of "loaded server" environment is coming
into question ...

I have two servers sitting at Rackspace right now, both running Tyan L-ET
motherboards ... one with 1.2GHz CPUs, and one with 1GHz CPUs ... both
running 4GB of RAM ... both running AMI MegaRAID controllers ... one
is running a STABLE kernel from yesterday ... the other one can't get past
a STABLE kernel from Oct 28th ... both servers are *continuously* running
>1300 processes, with >1500 being more usual ...

One server runs 87 jailed environments (venus) ... the other, 116
(jupiter) ... the machines are great performers *when they are running*,
but keeping them running has been a nightmare ...

The kernel configs for both machines are nearly identical (config attached),
with venus having an extra 'sa' device and 'sym' controller for a tape
drive that is installed there ... otherwise, they are identical ...

venus has a bit more drive space on it, which caused problems for a
while until we upgraded its power supply from 300W to 400W, which
seemed to fix a lot of the problems ...

Until last night, jupiter was running a kernel from Sept 10th, and getting
20 days of uptime on her was more or less the norm ... 9 hours after
upgrading to last night's sources, she partially crashed ... pingable,
but nothing else ... she's back up now, and running smoothly ...

If I try to boot venus onto any kernel after Oct 28th, it hangs just after
"SMP: AP CPU #1 Launched!" is printed ... I've tried removing the sym/sa
devices from the kernel and rebuilding, with the same effect ... I had
originally thought it was the amr device changes on the 29th, but jupiter
is running the same controller, right down to the same firmware revision,
and it boots fine ... yet I can't boot venus even onto an Oct 29th kernel
... Oct 28th is the last day that works ...

Looking at jupiter's messages file from this morning, there was
nothing to indicate a problem, yet she died sometime after 8am:

Nov 16 08:00:04 jupiter newsyslog[62442]: logfile turned over
Nov 16 10:48:45 jupiter /kernel: Copyright (c) 1992-2000 The FreeBSD Project.
Nov 16 10:48:45 jupiter /kernel: Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994

Am I expecting too much from FreeBSD-STABLE?  Would I fare better if I
moved down to RELENG_4_7 and avoided -STABLE altogether?

Help?

Copyright (c) 1992-2002 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
        The Regents of the University of California. All rights reserved.
FreeBSD 4.7-STABLE #19: Fri Nov 15 12:24:15 CST 2002
    [EMAIL PROTECTED]:/usr/obj/usr/src/sys/kernel
Timecounter "i8254"  frequency 1193182 Hz
CPU: Pentium III/Pentium III Xeon/Celeron (996.85-MHz 686-class CPU)
  Origin = "GenuineIntel"  Id = 0x68a  Stepping = 10
  Features=0x387fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,PN,MMX,FXSR,SSE>

real memory  = 4227858432 (4128768K bytes)
avail memory = 4120489984 (4023916K bytes)
Programming 16 pins in IOAPIC #0
IOAPIC #0 intpin 2 -> irq 0
Programming 16 pins in IOAPIC #1
FreeBSD/SMP: Multiprocessor motherboard
 cpu0 (BSP): apic id:  0, version: 0x00040011, at 0xfee00000
 cpu1 (AP):  apic id:  1, version: 0x00040011, at 0xfee00000
 io0 (APIC): apic id:  4, version: 0x000f0011, at 0xfec00000
 io1 (APIC): apic id:  5, version: 0x000f0011, at 0xfec01000
Preloaded elf kernel "kernel" at 0xc029a000.
Preloaded elf module "netdump_client.ko" at 0xc029a09c.
link_elf: symbol fxp_intr undefined
Pentium Pro MTRR support enabled
Using $PIR table, 10 entries at 0xc00f51c0
npx0: <math processor> on motherboard
npx0: INT 16 interface
pcib0: <ServerWorks NB6635 3.0LE host to PCI bridge> on motherboard
IOAPIC #1 intpin 6 -> irq 2
IOAPIC #1 intpin 4 -> irq 5
IOAPIC #1 intpin 5 -> irq 9
pci0: <PCI bus> on pcib0
pci0: <ATI Mach64-GR graphics accelerator> at 1.0 irq 2
pci0: <unknown card> (vendor=0x8086, dev=0x1229) at 4.0 irq 5
pci0: <unknown card> (vendor=0x8086, dev=0x1229) at 5.0 irq 9
isab0: <ServerWorks IB6566 PCI to ISA bridge> at device 15.0 on pci0
isa0: <ISA bus> on isab0
pci0: <Unknown PCI ATA controller> at 15.1
pci0: <OHCI USB controller> at 15.2 irq 10
pcib1: <ServerWorks NB6635 3.0LE host to PCI bridge> on motherboard
IOAPIC #1 intpin 7 -> irq 11
pci1: <PCI bus> on pcib1
amr0: <AMI MegaRAID> mem 0xfc1f0000-0xfc1fffff irq 11 at device 3.0 on pci1
amr0: <Series 475 40 Logical Drive Firmware> Firmware E161, BIOS 3.13, 32MB RAM
orm0: <Option ROMs> at iomem 0xc0000-0xc7fff,0xc9800-0xca7ff,0xca800-0xcb7ff on isa0
atkbdc0: <Keyboard controller (i8042)> at port 0x60,0x64 on isa0
atkbd0: <AT Keyboard> flags 0x1 irq 1 on atkbdc0
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
sc0: <System console> at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
APIC_IO: Testing 8254 interrupt delivery
APIC_IO: Broken MP table detected: 8254 is not connected to IOAPIC #0 intpin 2
APIC_IO: routing 8254 via 8259 and IOAPIC #0 intpin 0
IP packet filtering initialized, divert disabled, rule-based forwarding enabled, default to accept, logging disabled
amrd0: <MegaRAID logical drive> on amr0
amrd0: 8758MB (17936384 sectors) RAID 0 (optimal)
amrd1: <MegaRAID logical drive> on amr0
amrd1: 70036MB (143433728 sectors) RAID 5 (optimal)
Mounting root from ufs:/dev/amrd0s1a
SMP: AP CPU #1 Launched!
WARNING: / was not properly dismounted

machine         i386
cpu             I686_CPU
ident           kernel
maxusers        0

options         NMBCLUSTERS=15360

options         NSWAPDEV=2
makeoptions     DEBUG=-g                #Build kernel with gdb(1) debug symbols

options         INET                    #InterNETworking
options         FFS                     #Berkeley Fast Filesystem
options         FFS_ROOT                #FFS usable as root device [keep this!]
options         SOFTUPDATES             #Enable FFS soft updates support
options         UFS_DIRHASH             #Improve performance on big directories
options         COMPAT_43               #Compatible with BSD 4.3 [KEEP THIS!]
options         SCSI_DELAY=15000        #Delay (in ms) before probing SCSI
options         KTRACE                  #ktrace(1) support

options         SYSVSHM
options         SHMMAXPGS=199608
options         SHMMAX=(SHMMAXPGS*PAGE_SIZE+1)

options         SYSVSEM
options         SEMMNI=4096
options         SEMMNS=8192

options         SYSVMSG                 #SYSV-style message queues

options         IPFIREWALL                      #firewall
options         IPFIREWALL_FORWARD              #enable transparent proxy support
options         IPFIREWALL_DEFAULT_TO_ACCEPT    #allow everything by default

options         P1003_1B                #Posix P1003_1B real-time extensions
options         _KPOSIX_PRIORITY_SCHEDULING
options         ICMP_BANDLIM            #Rate limit bad replies

options         SMP                     # Symmetric MultiProcessor Kernel
options         APIC_IO                 # Symmetric (APIC) I/O

device          isa
device          pci

device          scbus           # SCSI bus (required)
device          da              # Direct Access (disks)

device          pass            # Passthrough device (direct SCSI access)

device          amr             # AMI MegaRAID

device          atkbdc0 at isa? port IO_KBD
device          atkbd0  at atkbdc? irq 1 flags 0x1
device          psm0    at atkbdc? irq 12

device          vga0    at isa?

pseudo-device   splash

device          sc0     at isa? flags 0x100

device          npx0    at nexus? port IO_NPX irq 13

pseudo-device   loop            # Network loopback
pseudo-device   ether           # Ethernet support
pseudo-device   pty     256     # Pseudo-ttys (telnet etc)

pseudo-device   bpf             #Berkeley packet filter

options    DDB
options    DDB_UNATTENDED
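For reference, the SHMMAX expression in the config above can be worked out as below. This is a minimal sketch that assumes the i386 PAGE_SIZE of 4096 bytes; PAGE_SIZE is not set anywhere in the config itself.

```python
# Evaluate the kernel config expression SHMMAX=(SHMMAXPGS*PAGE_SIZE+1).
# Assumption: PAGE_SIZE is 4096 bytes (the i386 page size); the config does not set it.
SHMMAXPGS = 199608   # from "options SHMMAXPGS=199608"
PAGE_SIZE = 4096     # assumed i386 page size

shmmax = SHMMAXPGS * PAGE_SIZE + 1
print(f"SHMMAX = {shmmax} bytes (~{shmmax / 2**20:.1f} MiB)")
```

Under that page-size assumption, this caps a single SysV shared memory segment at roughly 780MB.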

Copyright (c) 1992-2002 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
        The Regents of the University of California. All rights reserved.
FreeBSD 4.7-STABLE #20: Sun Nov 10 18:55:29 CST 2002
    [EMAIL PROTECTED]:/usr/obj/usr/src/sys/kernel
Timecounter "i8254"  frequency 1193182 Hz
CPU: Pentium III/Pentium III Xeon/Celeron (1262.67-MHz 686-class CPU)
  Origin = "GenuineIntel"  Id = 0x6b1  Stepping = 1
  Features=0x383fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR,SSE>

real memory  = 4227858432 (4128768K bytes)
avail memory = 4120436736 (4023864K bytes)
Programming 16 pins in IOAPIC #0
IOAPIC #0 intpin 2 -> irq 0
Programming 16 pins in IOAPIC #1
FreeBSD/SMP: Multiprocessor motherboard
 cpu0 (BSP): apic id:  0, version: 0x00040011, at 0xfee00000
 cpu1 (AP):  apic id:  1, version: 0x00040011, at 0xfee00000
 io0 (APIC): apic id:  4, version: 0x000f0011, at 0xfec00000
 io1 (APIC): apic id:  5, version: 0x000f0011, at 0xfec01000
Preloaded elf kernel "kernel" at 0xc02a7000.
Pentium Pro MTRR support enabled
Using $PIR table, 10 entries at 0xc00f51c0
npx0: <math processor> on motherboard
npx0: INT 16 interface
pcib0: <ServerWorks NB6635 3.0LE host to PCI bridge> on motherboard
IOAPIC #1 intpin 6 -> irq 2
IOAPIC #1 intpin 4 -> irq 5
IOAPIC #1 intpin 5 -> irq 9
pci0: <PCI bus> on pcib0
pci0: <ATI Mach64-GR graphics accelerator> at 1.0 irq 2
pci0: <unknown card> (vendor=0x8086, dev=0x1229) at 4.0 irq 5
pci0: <unknown card> (vendor=0x8086, dev=0x1229) at 5.0 irq 9
isab0: <ServerWorks IB6566 PCI to ISA bridge> at device 15.0 on pci0
isa0: <ISA bus> on isab0
pci0: <Unknown PCI ATA controller> at 15.1
pci0: <OHCI USB controller> at 15.2 irq 10
pcib1: <ServerWorks NB6635 3.0LE host to PCI bridge> on motherboard
IOAPIC #1 intpin 11 -> irq 11
IOAPIC #1 intpin 7 -> irq 16
IOAPIC #1 intpin 8 -> irq 17
pci1: <PCI bus> on pcib1
amr0: <AMI MegaRAID> mem 0xfc1f0000-0xfc1fffff irq 11 at device 2.0 on pci1
amr0: <Series 475 40 Logical Drive Firmware> Firmware E161, BIOS 3.13, 32MB RAM
sym0: <896> port 0xe400-0xe4ff mem 0xfebc8000-0xfebc9fff,0xfebe0000-0xfebe03ff irq 16 at device 3.0 on pci1
sym0: Symbios NVRAM, ID 7, Fast-40, SE, parity checking
sym0: open drain IRQ line driver, using on-chip SRAM
sym0: using LOAD/STORE-based firmware.
sym0: handling phase mismatch from SCRIPTS.
sym1: <896> port 0xe800-0xe8ff mem 0xfebe8000-0xfebe9fff,0xfebf0000-0xfebf03ff irq 17 at device 3.1 on pci1
sym1: Symbios NVRAM, ID 7, Fast-40, LVD, parity checking
sym1: open drain IRQ line driver, using on-chip SRAM
sym1: using LOAD/STORE-based firmware.
sym1: handling phase mismatch from SCRIPTS.
orm0: <Option ROMs> at iomem 0xc0000-0xc7fff,0xc9800-0xca7ff,0xca800-0xcb7ff,0xcb800-0xcbfff on isa0
atkbdc0: <Keyboard controller (i8042)> at port 0x60,0x64 on isa0
atkbd0: <AT Keyboard> flags 0x1 irq 1 on atkbdc0
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
sc0: <System console> at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
APIC_IO: Testing 8254 interrupt delivery
APIC_IO: Broken MP table detected: 8254 is not connected to IOAPIC #0 intpin 2
APIC_IO: routing 8254 via 8259 and IOAPIC #0 intpin 0
IP packet filtering initialized, divert disabled, rule-based forwarding enabled, default to accept, logging disabled
Waiting 15 seconds for SCSI devices to settle
(noperiph:sym0:0:-1:-1): SCSI BUS reset delivered.
(noperiph:sym1:0:-1:-1): SCSI BUS reset delivered.
amrd0: <MegaRAID logical drive> on amr0
amrd0: 105000MB (215040000 sectors) RAID 5 (optimal)
SMP: AP CPU #1 Launched!
sa0 at sym0 bus 0 target 0 lun 0
sa0: <SONY SDX-700C 0101> Removable Sequential Access SCSI-2 device 
sa0: 80.000MB/s transfers (40.000MHz, offset 31, 16bit)
Mounting root from ufs:/dev/amrd0s1a
