Mike H wrote:
> Hi All,
>
> I'm having a problem with ospfd on a 4.3 system (dmesg below) and I'm
> hoping someone here can suggest something to help me resolve it.
>
> The problem is that occasionally the system loses all routes learned
> via OSPF ('netstat -rn' and 'ospfctl show fib' continue to show
> connected and static routes). I have not been able to force the
> problem to occur, nor have I been able to correlate it to any other
> events. I am able to restore routing by restarting ospfd.
>
> When the problem happens, I see these entries in the messages and
> daemon logs:
>
> $ grep ospfd /var/log/messages
>
> Aug 8 15:30:08 psc-wifigw1 ospfd[20751]: recv_db_description: seq num mismatch, bad flags
> Aug 8 16:29:35 psc-wifigw1 ospfd[20751]: recv_db_description: invalid seq num, mine d5e04e66 his d5e04e65
> Aug 8 16:29:35 psc-wifigw1 ospfd[20751]: nbr_fsm: neighbor ID 10.222.16.65, event SEQ_NUM_MISMATCH not expected in state EXSTA
> Aug 8 16:29:39 psc-wifigw1 ospfd[20751]: recv_db_description: seq num mismatch, bad flags
> Aug 8 16:40:08 psc-wifigw1 ospfd[20751]: recv_db_description: seq num mismatch, bad flags
> Aug 8 16:49:44 psc-wifigw1 ospfd[20751]: recv_db_description: seq num mismatch, bad flags
> Aug 8 16:50:42 psc-wifigw1 ospfd[20751]: nbr_adj_timer: failed to form adjacency with 10.222.16.33
> Aug 8 16:51:42 psc-wifigw1 ospfd[20751]: nbr_adj_timer: failed to form adjacency with 10.222.16.33
> Aug 8 17:00:42 psc-wifigw1 ospfd[20751]: nbr_adj_timer: failed to form adjacency with 10.222.16.33
> Aug 8 17:01:42 psc-wifigw1 ospfd[20751]: nbr_adj_timer: failed to form adjacency with 10.222.16.33
> Aug 8 18:49:49 psc-wifigw1 ospfd[28802]: lost child: route decision engine exited
> Aug 8 18:49:49 psc-wifigw1 ospfd[20751]: if_leave_group: error IP_DROP_MEMBERSHIP, interface em0 address 224.0.0.6: Can't assign requested address
> Aug 8 19:43:18 psc-wifigw1 ospfd[9675]: nbr_fsm: neighbor ID 10.222.16.33, event LOADING_DONE not expected in state EXCHG
> Aug 8 19:43:22 psc-wifigw1 ospfd[9675]: recv_db_description: seq num mismatch, bad flags
>
> The "invalid seq num, mine... his..." always seems to show a
> difference of one.
>
> Here is my ospfd.conf:
>
> $ sudo ospfd -nvf /etc/ospfd.conf
>
> router-id 10.223.32.130
> fib-update yes
> rfc1583compat no
> stub router yes
> redistribute connected
> spf-delay 1
> spf-holdtime 5
>
> area 0.0.0.50 {
>     interface em1:10.223.32.1 {
>         hello-interval 10
>         metric 10
>         retransmit-interval 5
>         router-dead-time 40
>         router-priority 1
>         transmit-delay 1
>         auth-type crypt
>         auth-md-keyid 2
>         auth-md 2 XXXXXX
>     }
> }
>
> area 0.0.0.0 {
>     interface em0:10.222.0.5 {
>         hello-interval 10
>         metric 10
>         retransmit-interval 5
>         router-dead-time 40
>         router-priority 1
>         transmit-delay 1
>         auth-type none
>     }
> }
>
> Here is my dmesg output:
>
> $ dmesg
>
> OpenBSD 4.3 (GENERIC.MP) #1582: Wed Mar 12 11:16:45 MDT 2008
>     [EMAIL PROTECTED]:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> real mem = 2145873920 (2046MB)
> avail mem = 2072150016 (1976MB)
> mainbus0 at root
> bios0 at mainbus0: SMBIOS rev. 2.3 @ 0xf0000 (41 entries)
> bios0: vendor Sun Microsystems version "1.0.9" date 03/20/2006
> bios0: Sun Microsystems Sun Fire(TM) X2100
> acpi0 at bios0: rev 0
> acpi0: tables DSDT FACP SSDT SRAT MCFG APIC
> acpi0: wakeup devices HUB0(S5) XVR0(S5) XVR1(S5) XVR2(S5) XVR3(S5) USB0(S3) USB2(S3) MMAC(S5) MMCI(S5) UAR1(S5)
> acpitimer0 at acpi0: 3579545 Hz, 24 bits
> acpimadt0 at acpi0 addr 0xfee00000: PC-AT compat
> cpu0 at mainbus0: apid 0 (boot processor)
> cpu0: Dual Core AMD Opteron(tm) Processor 175, 2211.58 MHz
> cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,NXE,MMXX,FFXSR,LONG,3DNOW2,3DNOW
> cpu0: 64KB 64b/line 2-way I-cache, 64KB 64b/line 2-way D-cache, 1MB 64b/line 16-way L2 cache
> cpu0: ITLB 32 4KB entries fully associative, 8 4MB entries fully associative
> cpu0: DTLB 32 4KB entries fully associative, 8 4MB entries fully associative
> cpu0: apic clock running at 201MHz
> cpu1 at mainbus0: apid 1 (application processor)
> cpu1: Dual Core AMD Opteron(tm) Processor 175, 2211.33 MHz
> cpu1: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,NXE,MMXX,FFXSR,LONG,3DNOW2,3DNOW
> cpu1: 64KB 64b/line 2-way I-cache, 64KB 64b/line 2-way D-cache, 1MB 64b/line 16-way L2 cache
> cpu1: ITLB 32 4KB entries fully associative, 8 4MB entries fully associative
> cpu1: DTLB 32 4KB entries fully associative, 8 4MB entries fully associative
> ioapic0 at mainbus0 apid 2 pa 0xfec00000, version 11, 24 pins
> ioapic0: misconfigured as apic 0, remapped to apid 2
> acpiprt0 at acpi0: bus 0 (PCI0)
> acpiprt1 at acpi0: bus 1 (HUB0)
> acpicpu0 at acpi0: PSS
> acpicpu1 at acpi0: PSS
> acpibtn0 at acpi0: PWRB
> ipmi at mainbus0 not configured
> cpu0: Cool'n'Quiet K8 2211 MHz: speeds: 2200 2000 1800 1000 MHz
> pci0 at mainbus0 bus 0: configuration mode 1
> "NVIDIA nForce4 DDR" rev 0xa3 at pci0 dev 0 function 0 not configured
> pcib0 at pci0 dev 1 function 0 "NVIDIA nForce4 ISA" rev 0xa3
> nviic0 at pci0 dev 1 function 1 "NVIDIA nForce4 SMBus" rev 0xa2
> iic0 at nviic0
> adt0 at iic0 addr 0x2e: sch5017 rev 0x89
> spdmem0 at iic0 addr 0x50: 1GB DDR SDRAM ECC PC3200CL3.0
> spdmem1 at iic0 addr 0x51: 1GB DDR SDRAM ECC PC3200CL3.0
> iic1 at nviic0
> adt1 at iic1 addr 0x2e: sch5017 rev 0x89
> spdmem2 at iic1 addr 0x50: 1GB DDR SDRAM ECC PC3200CL3.0
> spdmem3 at iic1 addr 0x51: 1GB DDR SDRAM ECC PC3200CL3.0
> ohci0 at pci0 dev 2 function 0 "NVIDIA nForce4 USB" rev 0xa2: apic 2 int 0 (irq 5), version 1.0, legacy support
> ehci0 at pci0 dev 2 function 1 "NVIDIA nForce4 USB" rev 0xa3: apic 2 int 0 (irq 10)
> usb0 at ehci0: USB revision 2.0
> uhub0 at usb0 "NVIDIA EHCI root hub" rev 2.00/1.00 addr 1
> pciide0 at pci0 dev 6 function 0 "NVIDIA nForce4 IDE" rev 0xf2: DMA, channel 0 configured to compatibility, channel 1 configured to compatibility
> atapiscsi0 at pciide0 channel 0 drive 0
> scsibus0 at atapiscsi0: 2 targets
> cd0 at scsibus0 targ 0 lun 0: <MATSHITA, DVD-ROM SR-8178, PZ16> SCSI0 5/cdrom removable
> cd0(pciide0:0:0): using PIO mode 4, Ultra-DMA mode 4
> pciide0: channel 1 disabled (no drives)
> pciide1 at pci0 dev 7 function 0 "NVIDIA nForce4 SATA" rev 0xf3: DMA
> pciide1: using apic 2 int 0 (irq 11) for native-PCI interrupt
> wd0 at pciide1 channel 0 drive 0: <ST3250823AS>
> wd0: 16-sector PIO, LBA48, 238475MB, 488397168 sectors
> wd0(pciide1:0:0): using PIO mode 4, Ultra-DMA mode 5
> pciide2 at pci0 dev 8 function 0 "NVIDIA nForce4 SATA" rev 0xf3: DMA
> pciide2: using apic 2 int 0 (irq 10) for native-PCI interrupt
> ppb0 at pci0 dev 9 function 0 "NVIDIA nForce4 PCI-PCI" rev 0xa2
> pci1 at ppb0 bus 1
> vga1 at pci1 dev 5 function 0 "ATI Rage XL" rev 0x27
> wsdisplay0 at vga1 mux 1: console (80x25, vt100 emulation)
> wsdisplay0: screen 1-5 added (80x25, vt100 emulation)
> nfe0 at pci0 dev 10 function 0 "NVIDIA CK804 LAN" rev 0xa3: apic 2 int 0 (irq 11), address 00:e0:81:5d:bd:e3
> eephy0 at nfe0 phy 1: Marvell 88E1111 Gigabit PHY, rev. 2
> ppb1 at pci0 dev 11 function 0 "NVIDIA nForce4 PCIE" rev 0xa3
> pci2 at ppb1 bus 2
> ppb2 at pci0 dev 12 function 0 "NVIDIA nForce4 PCIE" rev 0xa3
> pci3 at ppb2 bus 3
> ppb3 at pci0 dev 13 function 0 "NVIDIA nForce4 PCIE" rev 0xa3
> pci4 at ppb3 bus 4
> bge0 at pci4 dev 0 function 0 "Broadcom BCM5721" rev 0x11, BCM5750 B1 (0x4101): apic 2 int 5 (irq 5), address 00:e0:81:5d:bd:e4
> brgphy0 at bge0 phy 1: BCM5750 10/100/1000baseT PHY, rev. 0
> ppb4 at pci0 dev 14 function 0 "NVIDIA nForce4 PCIE" rev 0xa3
> pci5 at ppb4 bus 5
> em0 at pci5 dev 0 function 0 "Intel PRO/1000 PT (82571EB)" rev 0x06: apic 2 int 7 (irq 7), address 00:15:17:0d:07:b0
> em1 at pci5 dev 0 function 1 "Intel PRO/1000 PT (82571EB)" rev 0x06: apic 2 int 5 (irq 5), address 00:15:17:0d:07:b1
> pchb0 at pci0 dev 24 function 0 "AMD AMD64 HyperTransport" rev 0x00
> pchb1 at pci0 dev 24 function 1 "AMD AMD64 Address Map" rev 0x00
> pchb2 at pci0 dev 24 function 2 "AMD AMD64 DRAM Cfg" rev 0x00
> pchb3 at pci0 dev 24 function 3 "AMD AMD64 Misc Cfg" rev 0x00
> isa0 at pcib0
> isadma0 at isa0
> com0 at isa0 port 0x3f8/8 irq 4: ns16550a, 16 byte fifo
> pckbc0 at isa0 port 0x60/5
> pckbd0 at pckbc0 (kbd slot)
> pckbc0: using irq 1 for kbd slot
> wskbd0 at pckbd0: console keyboard, using wsdisplay0
> pcppi0 at isa0 port 0x61
> midi0 at pcppi0: <PC speaker>
> spkr0 at pcppi0
> usb1 at ohci0: USB revision 1.0
> uhub1 at usb1 "NVIDIA OHCI root hub" rev 1.00/1.00 addr 1
> uhidev0 at uhub0 port 1 configuration 1 interface 0 "Avocent Virtual Media KVM Module" rev 2.00/1.00 addr 2
> uhidev0: iclass 3/1
> ukbd0 at uhidev0: 8 modifier keys, 6 key codes, country code 33
> wskbd1 at ukbd0 mux 1
> wskbd1: connecting to wsdisplay0
> uhidev1 at uhub0 port 1 configuration 1 interface 1 "Avocent Virtual Media KVM Module" rev 2.00/1.00 addr 2
> uhidev1: iclass 3/1, 3 report ids
> ums0 at uhidev1 reportid 1: 5 buttons and Z dir.
> wsmouse0 at ums0 mux 0
> uhid0 at uhidev1 reportid 2: input=2, output=0, feature=0
> uhid1 at uhidev1 reportid 3: input=1, output=0, feature=0
> umass0 at uhub0 port 1 configuration 1 interface 2 "Avocent Virtual Media KVM Module" rev 2.00/1.00 addr 2
> umass0: using SCSI over Bulk-Only
> scsibus1 at umass0: 2 targets
> cd1 at scsibus1 targ 1 lun 0: <KVM, vmDisk-CD, 0.01> SCSI0 5/cdrom removable
> softraid0 at root
> root on wd0a swap on wd0b dump on wd0b
>
> I appreciate any input you might have on how to further troubleshoot
> this problem!
>
> Thanks,
>
> -Mike

Hi Mike,
Did you ever resolve this problem? I'm getting very similar symptoms with an OpenBSD 4.2 setup. I occasionally see the following:

*firewall1*

Aug 22 10:32:36 firewall1 ospfd[15383]: nbr_fsm: neighbor ID 10.0.0.156, event LOADING_DONE not expected in state EXCHG
Aug 22 10:32:36 firewall1 ospfd[15383]: nbr_fsm: neighbor ID 10.0.0.157, event LOADING_DONE not expected in state EXCHG
Aug 22 10:33:36 firewall1 ospfd[15383]: nbr_adj_timer: failed to form adjacency with 10.0.0.159
Aug 22 10:34:36 firewall1 ospfd[15383]: nbr_adj_timer: failed to form adjacency with 10.0.0.159
Aug 22 10:36:36 firewall1 last message repeated 2 times
Aug 22 10:40:36 firewall1 last message repeated 4 times
Aug 22 10:40:41 firewall1 ospfd[16136]: nbr_fsm: neighbor ID 10.0.0.157, event BAD_LS_REQ not expected in state EXSTA
Aug 22 10:40:41 firewall1 ospfd[16136]: nbr_fsm: neighbor ID 10.0.0.156, event LOADING_DONE not expected in state EXCHG
Aug 22 10:55:07 firewall1 ospfd[16136]: nbr_adj_timer: failed to form adjacency with 10.0.0.159
Aug 22 10:56:07 firewall1 ospfd[16136]: nbr_adj_timer: failed to form adjacency with 10.0.0.159
Aug 22 10:58:07 firewall1 last message repeated 2 times
Aug 22 11:08:07 firewall1 last message repeated 10 times
Aug 22 11:18:07 firewall1 last message repeated 10 times
Aug 22 11:28:07 firewall1 last message repeated 10 times
Aug 22 11:38:07 firewall1 last message repeated 10 times

*firewall2*

Aug 22 10:32:24 firewall2 ospfd[12791]: recv_db_description: seq num mismatch, bad flags

OSPF neighbour tables:

[EMAIL PROTECTED](10.0.0.158):/root>ospfctl show neigh
ID              Pri State        DeadTime Address         Iface     Uptime
10.0.0.159      1   2-WAY/OTHER  00:00:01 10.0.0.214      em7       -
10.0.0.249      128 FULL/DR      00:00:01 10.0.0.210      em7       00:30:49
10.0.0.248      128 FULL/BCKUP   00:00:01 10.0.0.209      em7       00:30:47
*10.0.0.159     1   EXSTA/BCKUP  00:00:01 10.0.0.44       em0       -*
10.0.0.156      128 2-WAY/OTHER  00:00:01 10.0.0.41       em0       -
10.0.0.157      128 FULL/DR      00:00:01 10.0.0.42       em0       00:17:25

[EMAIL PROTECTED](10.0.0.159):/root>ospfctl show neigh
ID              Pri State        DeadTime Address         Iface     Uptime
10.0.0.158      1   2-WAY/OTHER  00:00:01 10.0.0.213      em7       -
10.0.0.248      128 FULL/BCKUP   00:00:01 10.0.0.209      em7       00:39:36
10.0.0.249      128 FULL/DR      00:00:01 10.0.0.210      em7       01:41:29
10.0.0.158      1   2-WAY/OTHER  00:00:01 10.0.0.43       em0       -
10.0.0.157      128 FULL/DR      00:00:01 10.0.0.42       em0       00:39:36
10.0.0.156      128 FULL/BCKUP   00:00:01 10.0.0.41       em0       07:48:31

At some point, one of the firewalls moves its OpenBSD peer on one interface into state EXSTA/BCKUP, and it never recovers.

Another oddity: I previously ran this setup as VMs on an ESX server (as a test; see my post "ospf unexpectedly changing to EXSTA"), and there the same problem caused all routes to be lost, as you described. Now, on physical machines, I get the same log messages and the neighbor still gets stuck in this state, but no routes are lost. I can't see what triggers it; it seems entirely random at the moment.

Any ideas, or pointers on how I can debug this?

Cheers,
Cliff.

ospfd.conf for firewall1:

router-id 10.0.0.158
hello-interval 1
metric 10
retransmit-interval 5
router-dead-time 2

# areas
area 0.0.0.1 {
    interface em0 {
    }
    interface lo1 {
        passive
    }
    interface em7 {
    }
}

Firewall2 uses the same configuration, but with metric 100 instead of 10.
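Regarding Mike's observation that the "mine"/"his" values in the "invalid seq num" messages always differ by one: the minimal Python sketch below (not part of ospfd or ospfctl) extracts both hex values from syslog lines and prints their difference, so the pattern can be checked across a whole log instead of by eye. It assumes the message text matches the format quoted in this thread and treats the values as 32-bit Database Description sequence numbers (the field is 32 bits wide in OSPFv2); the script name in the usage comment is hypothetical.

```python
import re
import sys

# Matches the ospfd message quoted in this thread, e.g.
#   recv_db_description: invalid seq num, mine d5e04e66 his d5e04e65
SEQ_RE = re.compile(r"invalid seq num, mine ([0-9a-fA-F]+) his ([0-9a-fA-F]+)")


def seq_num_deltas(lines):
    """Return a (mine, his, delta) tuple for every 'invalid seq num' line.

    The values are treated as unsigned 32-bit DD sequence numbers; delta is
    mine - his reduced modulo 2**32 and mapped to a signed value, so a
    wrap-around such as mine 00000000 / his ffffffff still reads as +1.
    """
    results = []
    for line in lines:
        match = SEQ_RE.search(line)
        if match is None:
            continue  # not an 'invalid seq num' message
        mine = int(match.group(1), 16)
        his = int(match.group(2), 16)
        delta = (mine - his) % 2**32
        if delta >= 2**31:
            delta -= 2**32  # e.g. 0xfffffffe becomes -2
        results.append((mine, his, delta))
    return results


if __name__ == "__main__":
    # Hypothetical usage: python3 seqdelta.py /var/log/messages /var/log/daemon
    for path in sys.argv[1:]:
        with open(path, errors="replace") as log:
            for mine, his, delta in seq_num_deltas(log):
                print(f"{path}: mine {mine:08x} his {his:08x} delta {delta:+d}")
```

For the 16:29:35 line in Mike's log this reports a delta of +1; a log where every match shows +1 would confirm the pattern he describes, though it says nothing by itself about the cause.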
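Most of the nbr_adj_timer failures in Cliff's firewall1 log are folded into syslogd "last message repeated N times" lines, and those lines do not contain the string "ospfd", so a plain `grep ospfd` like the one in Mike's message will not show them. A minimal Python sketch that counts adjacency failures per neighbor including those repeats, assuming (as in the excerpts above) that each "repeated" line refers to the line just before it in the same file; the script name in the usage comment is hypothetical.

```python
import re
import sys
from collections import Counter

ADJ_RE = re.compile(r"nbr_adj_timer: failed to form adjacency with (\S+)")
REPEAT_RE = re.compile(r"last message repeated (\d+) times?")


def count_adjacency_failures(lines):
    """Count 'failed to form adjacency' messages per neighbor address.

    Assumption: each syslogd 'last message repeated N times' line refers to
    the line just before it in the same file, as in the excerpts above.
    """
    counts = Counter()
    current = None  # neighbor named by the message currently being repeated
    for line in lines:
        adj = ADJ_RE.search(line)
        if adj:
            current = adj.group(1)
            counts[current] += 1
            continue
        repeat = REPEAT_RE.search(line)
        if repeat:
            if current is not None:
                counts[current] += int(repeat.group(1))
            continue  # consecutive 'repeated' lines still refer to the same message
        current = None  # any other message ends the run
    return counts


if __name__ == "__main__":
    # Hypothetical usage: python3 adjcount.py /var/log/daemon
    for path in sys.argv[1:]:
        with open(path, errors="replace") as log:
            for neighbor, n in count_adjacency_failures(log).most_common():
                print(f"{path}: {neighbor}: {n} adjacency failures")
```

Run over the firewall1 excerpt above, this attributes 52 failures to 10.0.0.159, roughly one per minute, which is far more than the four ospfd lines a grep would show.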
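To spot a stuck neighbor like the EXSTA/BCKUP entry in the tables above without reading them by eye, here is a minimal Python sketch that parses saved `ospfctl show neigh` output and reports every neighbor whose state is neither FULL nor 2-WAY. It assumes the seven-column layout shown in this thread, and it relies on the standard OSPF behaviour that two routers which are both neither DR nor BDR on a broadcast segment normally stay in 2-WAY; a 2-WAY entry towards the DR or BDR would still deserve a manual look. The `Neighbor` type, the function names and the script name are illustrative, not part of ospfctl.

```python
import sys
from typing import NamedTuple


class Neighbor(NamedTuple):
    router_id: str
    priority: int
    state: str    # adjacency state, e.g. "FULL", "2-WAY", "EXSTA" (ExStart)
    role: str     # role as printed after the slash, e.g. "DR", "BCKUP", "OTHER"
    address: str
    iface: str
    uptime: str


# States treated as settled; 2-WAY is normal between two non-DR/BDR routers.
SETTLED_STATES = {"FULL", "2-WAY"}


def parse_neighbors(text):
    """Parse the seven-column table printed by 'ospfctl show neigh'."""
    neighbors = []
    for line in text.splitlines():
        fields = line.split()
        # Skip the header, shell prompts and anything else that is not a data row.
        if len(fields) != 7 or not fields[1].isdigit():
            continue
        state, _, role = fields[2].partition("/")
        neighbors.append(Neighbor(fields[0], int(fields[1]), state, role,
                                  fields[4], fields[5], fields[6]))
    return neighbors


def stuck_neighbors(neighbors):
    """Return neighbors whose state is neither FULL nor 2-WAY."""
    return [n for n in neighbors if n.state not in SETTLED_STATES]


if __name__ == "__main__":
    # Hypothetical usage: ospfctl show neigh > neigh.txt; python3 stuck.py neigh.txt
    for path in sys.argv[1:]:
        with open(path) as f:
            for n in stuck_neighbors(parse_neighbors(f.read())):
                print(f"{path}: {n.router_id} on {n.iface} is {n.state}/{n.role}")
```

Saving the output periodically (for example from cron) and running this over each snapshot would show roughly when a neighbor enters EXSTA, which could then be lined up against the syslog timestamps.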