Mike H wrote:
> Hi All,
>
> I'm having a problem with ospfd on a 4.3 system (dmesg below) and I'm 
> hoping someone here can suggest something to help me resolve it.
>
> The problem is that occasionally the system loses all routes learned 
> via OSPF ('netstat -rn' and 'ospfctl show fib' continue to show 
> connected and static routes).  I have not been able to force the 
> problem to occur, nor have I been able to correlate it to any other 
> events.  I am able to restore routing by restarting ospfd.
>
> When the problem happens, I see these entries in the messages and 
> daemon logs:
>
>
> $ grep ospfd /var/log/messages
>
> Aug  8 15:30:08 psc-wifigw1 ospfd[20751]: recv_db_description: seq num 
> mismatch, bad flags
> Aug  8 16:29:35 psc-wifigw1 ospfd[20751]: recv_db_description: invalid 
> seq num, mine d5e04e66 his d5e04e65
> Aug  8 16:29:35 psc-wifigw1 ospfd[20751]: nbr_fsm: neighbor ID 
> 10.222.16.65, event SEQ_NUM_MISMATCH not expected in state EXSTA
> Aug  8 16:29:39 psc-wifigw1 ospfd[20751]: recv_db_description: seq num 
> mismatch, bad flags
> Aug  8 16:40:08 psc-wifigw1 ospfd[20751]: recv_db_description: seq num 
> mismatch, bad flags
> Aug  8 16:49:44 psc-wifigw1 ospfd[20751]: recv_db_description: seq num 
> mismatch, bad flags
> Aug  8 16:50:42 psc-wifigw1 ospfd[20751]: nbr_adj_timer: failed to 
> form adjacency with 10.222.16.33
> Aug  8 16:51:42 psc-wifigw1 ospfd[20751]: nbr_adj_timer: failed to 
> form adjacency with 10.222.16.33
> Aug  8 17:00:42 psc-wifigw1 ospfd[20751]: nbr_adj_timer: failed to 
> form adjacency with 10.222.16.33
> Aug  8 17:01:42 psc-wifigw1 ospfd[20751]: nbr_adj_timer: failed to 
> form adjacency with 10.222.16.33
> Aug  8 18:49:49 psc-wifigw1 ospfd[28802]: lost child: route decision 
> engine exited
> Aug  8 18:49:49 psc-wifigw1 ospfd[20751]: if_leave_group: error 
> IP_DROP_MEMBERSHIP, interface em0 address 224.0.0.6: Can't assign 
> requested address
> Aug  8 19:43:18 psc-wifigw1 ospfd[9675]: nbr_fsm: neighbor ID 
> 10.222.16.33, event LOADING_DONE not expected in state EXCHG
> Aug  8 19:43:22 psc-wifigw1 ospfd[9675]: recv_db_description: seq num 
> mismatch, bad flags
>
> The "invalid seq num, mine... his..." always seems to show a 
> difference of one.
>
>
> Here is my ospfd.conf:
>
> $ sudo ospfd -nvf /etc/ospfd.conf
>
> router-id 10.223.32.130
> fib-update yes
> rfc1583compat no
> stub router yes
> redistribute connected
> spf-delay 1
> spf-holdtime 5
>
> area 0.0.0.50 {
>         interface em1:10.223.32.1 {
>                 hello-interval 10
>                 metric 10
>                 retransmit-interval 5
>                 router-dead-time 40
>                 router-priority 1
>                 transmit-delay 1
>                 auth-type crypt
>                 auth-md-keyid 2
>                 auth-md 2 XXXXXX
>         }
> }
>
> area 0.0.0.0 {
>         interface em0:10.222.0.5 {
>                 hello-interval 10
>                 metric 10
>                 retransmit-interval 5
>                 router-dead-time 40
>                 router-priority 1
>                 transmit-delay 1
>                 auth-type none
>         }
> }
>
>
> Here is my dmesg output:
>
> $ dmesg
>
> OpenBSD 4.3 (GENERIC.MP) #1582: Wed Mar 12 11:16:45 MDT 2008
>     [EMAIL PROTECTED]:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> real mem = 2145873920 (2046MB)
> avail mem = 2072150016 (1976MB)
> mainbus0 at root
> bios0 at mainbus0: SMBIOS rev. 2.3 @ 0xf0000 (41 entries)
> bios0: vendor Sun Microsystems version "1.0.9" date 03/20/2006
> bios0: Sun Microsystems Sun Fire(TM) X2100
> acpi0 at bios0: rev 0
> acpi0: tables DSDT FACP SSDT SRAT MCFG APIC
> acpi0: wakeup devices HUB0(S5) XVR0(S5) XVR1(S5) XVR2(S5) XVR3(S5) 
> USB0(S3) USB2(S3) MMAC(S5) MMCI(S5) UAR1(S5)
> acpitimer0 at acpi0: 3579545 Hz, 24 bits
> acpimadt0 at acpi0 addr 0xfee00000: PC-AT compat
> cpu0 at mainbus0: apid 0 (boot processor)
> cpu0: Dual Core AMD Opteron(tm) Processor 175, 2211.58 MHz
> cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,NXE,MMXX,FFXSR,LONG,3DNOW2,3DNOW
> cpu0: 64KB 64b/line 2-way I-cache, 64KB 64b/line 2-way D-cache, 1MB 
> 64b/line 16-way L2 cache
> cpu0: ITLB 32 4KB entries fully associative, 8 4MB entries fully 
> associative
> cpu0: DTLB 32 4KB entries fully associative, 8 4MB entries fully 
> associative
> cpu0: apic clock running at 201MHz
> cpu1 at mainbus0: apid 1 (application processor)
> cpu1: Dual Core AMD Opteron(tm) Processor 175, 2211.33 MHz
> cpu1: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,NXE,MMXX,FFXSR,LONG,3DNOW2,3DNOW
> cpu1: 64KB 64b/line 2-way I-cache, 64KB 64b/line 2-way D-cache, 1MB 
> 64b/line 16-way L2 cache
> cpu1: ITLB 32 4KB entries fully associative, 8 4MB entries fully 
> associative
> cpu1: DTLB 32 4KB entries fully associative, 8 4MB entries fully 
> associative
> ioapic0 at mainbus0 apid 2 pa 0xfec00000, version 11, 24 pins
> ioapic0: misconfigured as apic 0, remapped to apid 2
> acpiprt0 at acpi0: bus 0 (PCI0)
> acpiprt1 at acpi0: bus 1 (HUB0)
> acpicpu0 at acpi0: PSS
> acpicpu1 at acpi0: PSS
> acpibtn0 at acpi0: PWRB
> ipmi at mainbus0 not configured
> cpu0: Cool'n'Quiet K8 2211 MHz: speeds: 2200 2000 1800 1000 MHz
> pci0 at mainbus0 bus 0: configuration mode 1
> "NVIDIA nForce4 DDR" rev 0xa3 at pci0 dev 0 function 0 not configured
> pcib0 at pci0 dev 1 function 0 "NVIDIA nForce4 ISA" rev 0xa3
> nviic0 at pci0 dev 1 function 1 "NVIDIA nForce4 SMBus" rev 0xa2
> iic0 at nviic0
> adt0 at iic0 addr 0x2e: sch5017 rev 0x89
> spdmem0 at iic0 addr 0x50: 1GB DDR SDRAM ECC PC3200CL3.0
> spdmem1 at iic0 addr 0x51: 1GB DDR SDRAM ECC PC3200CL3.0
> iic1 at nviic0
> adt1 at iic1 addr 0x2e: sch5017 rev 0x89
> spdmem2 at iic1 addr 0x50: 1GB DDR SDRAM ECC PC3200CL3.0
> spdmem3 at iic1 addr 0x51: 1GB DDR SDRAM ECC PC3200CL3.0
> ohci0 at pci0 dev 2 function 0 "NVIDIA nForce4 USB" rev 0xa2: apic 2 
> int 0 (irq 5), version 1.0, legacy support
> ehci0 at pci0 dev 2 function 1 "NVIDIA nForce4 USB" rev 0xa3: apic 2 
> int 0 (irq 10)
> usb0 at ehci0: USB revision 2.0
> uhub0 at usb0 "NVIDIA EHCI root hub" rev 2.00/1.00 addr 1
> pciide0 at pci0 dev 6 function 0 "NVIDIA nForce4 IDE" rev 0xf2: DMA, 
> channel 0 configured to compatibility, channel 1 configured to 
> compatibility
> atapiscsi0 at pciide0 channel 0 drive 0
> scsibus0 at atapiscsi0: 2 targets
> cd0 at scsibus0 targ 0 lun 0: <MATSHITA, DVD-ROM SR-8178, PZ16> SCSI0 
> 5/cdrom removable
> cd0(pciide0:0:0): using PIO mode 4, Ultra-DMA mode 4
> pciide0: channel 1 disabled (no drives)
> pciide1 at pci0 dev 7 function 0 "NVIDIA nForce4 SATA" rev 0xf3: DMA
> pciide1: using apic 2 int 0 (irq 11) for native-PCI interrupt
> wd0 at pciide1 channel 0 drive 0: <ST3250823AS>
> wd0: 16-sector PIO, LBA48, 238475MB, 488397168 sectors
> wd0(pciide1:0:0): using PIO mode 4, Ultra-DMA mode 5
> pciide2 at pci0 dev 8 function 0 "NVIDIA nForce4 SATA" rev 0xf3: DMA
> pciide2: using apic 2 int 0 (irq 10) for native-PCI interrupt
> ppb0 at pci0 dev 9 function 0 "NVIDIA nForce4 PCI-PCI" rev 0xa2
> pci1 at ppb0 bus 1
> vga1 at pci1 dev 5 function 0 "ATI Rage XL" rev 0x27
> wsdisplay0 at vga1 mux 1: console (80x25, vt100 emulation)
> wsdisplay0: screen 1-5 added (80x25, vt100 emulation)
> nfe0 at pci0 dev 10 function 0 "NVIDIA CK804 LAN" rev 0xa3: apic 2 int 
> 0 (irq 11), address 00:e0:81:5d:bd:e3
> eephy0 at nfe0 phy 1: Marvell 88E1111 Gigabit PHY, rev. 2
> ppb1 at pci0 dev 11 function 0 "NVIDIA nForce4 PCIE" rev 0xa3
> pci2 at ppb1 bus 2
> ppb2 at pci0 dev 12 function 0 "NVIDIA nForce4 PCIE" rev 0xa3
> pci3 at ppb2 bus 3
> ppb3 at pci0 dev 13 function 0 "NVIDIA nForce4 PCIE" rev 0xa3
> pci4 at ppb3 bus 4
> bge0 at pci4 dev 0 function 0 "Broadcom BCM5721" rev 0x11, BCM5750 B1 
> (0x4101): apic 2 int 5 (irq 5), address 00:e0:81:5d:bd:e4
> brgphy0 at bge0 phy 1: BCM5750 10/100/1000baseT PHY, rev. 0
> ppb4 at pci0 dev 14 function 0 "NVIDIA nForce4 PCIE" rev 0xa3
> pci5 at ppb4 bus 5
> em0 at pci5 dev 0 function 0 "Intel PRO/1000 PT (82571EB)" rev 0x06: 
> apic 2 int 7 (irq 7), address 00:15:17:0d:07:b0
> em1 at pci5 dev 0 function 1 "Intel PRO/1000 PT (82571EB)" rev 0x06: 
> apic 2 int 5 (irq 5), address 00:15:17:0d:07:b1
> pchb0 at pci0 dev 24 function 0 "AMD AMD64 HyperTransport" rev 0x00
> pchb1 at pci0 dev 24 function 1 "AMD AMD64 Address Map" rev 0x00
> pchb2 at pci0 dev 24 function 2 "AMD AMD64 DRAM Cfg" rev 0x00
> pchb3 at pci0 dev 24 function 3 "AMD AMD64 Misc Cfg" rev 0x00
> isa0 at pcib0
> isadma0 at isa0
> com0 at isa0 port 0x3f8/8 irq 4: ns16550a, 16 byte fifo
> pckbc0 at isa0 port 0x60/5
> pckbd0 at pckbc0 (kbd slot)
> pckbc0: using irq 1 for kbd slot
> wskbd0 at pckbd0: console keyboard, using wsdisplay0
> pcppi0 at isa0 port 0x61
> midi0 at pcppi0: <PC speaker>
> spkr0 at pcppi0
> usb1 at ohci0: USB revision 1.0
> uhub1 at usb1 "NVIDIA OHCI root hub" rev 1.00/1.00 addr 1
> uhidev0 at uhub0 port 1 configuration 1 interface 0 "Avocent Virtual 
> Media KVM Module" rev 2.00/1.00 addr 2
> uhidev0: iclass 3/1
> ukbd0 at uhidev0: 8 modifier keys, 6 key codes, country code 33
> wskbd1 at ukbd0 mux 1
> wskbd1: connecting to wsdisplay0
> uhidev1 at uhub0 port 1 configuration 1 interface 1 "Avocent Virtual 
> Media KVM Module" rev 2.00/1.00 addr 2
> uhidev1: iclass 3/1, 3 report ids
> ums0 at uhidev1 reportid 1: 5 buttons and Z dir.
> wsmouse0 at ums0 mux 0
> uhid0 at uhidev1 reportid 2: input=2, output=0, feature=0
> uhid1 at uhidev1 reportid 3: input=1, output=0, feature=0
> umass0 at uhub0 port 1 configuration 1 interface 2 "Avocent Virtual 
> Media KVM Module" rev 2.00/1.00 addr 2
> umass0: using SCSI over Bulk-Only
> scsibus1 at umass0: 2 targets
> cd1 at scsibus1 targ 1 lun 0: <KVM, vmDisk-CD, 0.01> SCSI0 5/cdrom 
> removable
> softraid0 at root
> root on wd0a swap on wd0b dump on wd0b
>
>
>
> I appreciate any input you might have on how to further troubleshoot 
> this problem!
>
>
> Thanks,
>
> -Mike
>
Hi Mike,

Did you ever resolve this problem? I'm seeing very similar symptoms 
on an OpenBSD 4.2 setup. I occasionally get the following:

*firewall1*

Aug 22 10:32:36 firewall1 ospfd[15383]: nbr_fsm: neighbor ID 10.0.0.156, 
event LOADING_DONE not expected in state EXCHG
Aug 22 10:32:36 firewall1 ospfd[15383]: nbr_fsm: neighbor ID 10.0.0.157, 
event LOADING_DONE not expected in state EXCHG
Aug 22 10:33:36 firewall1 ospfd[15383]: nbr_adj_timer: failed to form 
adjacency with 10.0.0.159
Aug 22 10:34:36 firewall1 ospfd[15383]: nbr_adj_timer: failed to form 
adjacency with 10.0.0.159
Aug 22 10:36:36 firewall1 last message repeated 2 times
Aug 22 10:40:36 firewall1 last message repeated 4 times
Aug 22 10:40:41 firewall1 ospfd[16136]: nbr_fsm: neighbor ID 10.0.0.157, 
event BAD_LS_REQ not expected in state EXSTA
Aug 22 10:40:41 firewall1 ospfd[16136]: nbr_fsm: neighbor ID 10.0.0.156, 
event LOADING_DONE not expected in state EXCHG
Aug 22 10:55:07 firewall1 ospfd[16136]: nbr_adj_timer: failed to form 
adjacency with 10.0.0.159
Aug 22 10:56:07 firewall1 ospfd[16136]: nbr_adj_timer: failed to form 
adjacency with 10.0.0.159
Aug 22 10:58:07 firewall1 last message repeated 2 times
Aug 22 11:08:07 firewall1 last message repeated 10 times
Aug 22 11:18:07 firewall1 last message repeated 10 times
Aug 22 11:28:07 firewall1 last message repeated 10 times
Aug 22 11:38:07 firewall1 last message repeated 10 times


*firewall2*

Aug 22 10:32:24 firewall2 ospfd[12791]: recv_db_description: seq num 
mismatch, bad flags


OSPF Neighbour table:

[EMAIL PROTECTED](10.0.0.158):/root>ospfctl show neigh
ID              Pri State        DeadTime Address         Iface     Uptime
10.0.0.159    1   2-WAY/OTHER  00:00:01 10.0.0.214    em7       -  
10.0.0.249    128 FULL/DR      00:00:01 10.0.0.210    em7       00:30:49
10.0.0.248    128 FULL/BCKUP   00:00:01 10.0.0.209    em7       00:30:47
*10.0.0.159    1   EXSTA/BCKUP  00:00:01 10.0.0.44     em0       -   *
10.0.0.156    128 2-WAY/OTHER  00:00:01 10.0.0.41     em0       -  
10.0.0.157    128 FULL/DR      00:00:01 10.0.0.42     em0       00:17:25

[EMAIL PROTECTED](10.0.0.159):/root>ospfctl show neigh
ID              Pri State        DeadTime Address         Iface     Uptime
10.0.0.158    1   2-WAY/OTHER  00:00:01 10.0.0.213    em7       -  
10.0.0.248    128 FULL/BCKUP   00:00:01 10.0.0.209    em7       00:39:36
10.0.0.249    128 FULL/DR      00:00:01 10.0.0.210    em7       01:41:29
10.0.0.158    1   2-WAY/OTHER  00:00:01 10.0.0.43     em0       -
10.0.0.157    128 FULL/DR      00:00:01 10.0.0.42     em0       00:39:36
10.0.0.156    128 FULL/BCKUP   00:00:01 10.0.0.41     em0       07:48:31


Then one of the firewalls moves the OpenBSD peer on one interface to 
state EXSTA/BCKUP and never recovers. One other oddity: I was previously 
running this as VMs on an ESX server (as a test; see the post "ospf 
unexpectedly changing to EXSTA"), and there the same problem caused all 
the routes to be lost, as you described. Now, on physical machines, I 
get the same log messages and the neighbor ends up in the same state, 
but no routes are lost. I can't see what is triggering it; it seems 
entirely random at the moment.

Any ideas, or any pointers on what I can do to debug this?

Cheers,

Cliff.

ospf config for firewall 1:

router-id 10.0.0.158
hello-interval 1
metric 10
retransmit-interval 5
router-dead-time 2

# areas
area 0.0.0.1 {
        interface em0 {
        }
        interface lo1 {
                passive
        }      
        interface em7 {
        }
}      


Firewall 2 is the same but with metric 100 instead.
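
To compare logs like the ones above between the two firewalls, a small 
script can tally the unexpected nbr_fsm events and adjacency failures per 
neighbor, and compute the gap between the "mine" and "his" sequence 
numbers. This is only a rough sketch: the function name summarize() and 
the regular expressions are assumptions based on the log excerpts in this 
thread, not on the ospfd source, and it expects each syslog entry on a 
single line (not wrapped as in this mail).

```python
import re
from collections import Counter

# Patterns are assumptions derived from the log excerpts in this thread,
# not taken from the ospfd source.
NBR_FSM = re.compile(
    r"nbr_fsm: neighbor ID (\S+), event (\S+) not expected in state (\S+)"
)
ADJ_FAIL = re.compile(r"nbr_adj_timer: failed to form adjacency with (\S+)")
SEQ_NUM = re.compile(r"invalid seq num, mine ([0-9a-f]+) his ([0-9a-f]+)")


def summarize(lines):
    """Tally FSM events and adjacency failures; collect seq num gaps.

    Returns (fsm, adj, gaps):
      fsm  -- Counter keyed by (neighbor_id, event, state)
      adj  -- Counter keyed by neighbor_id
      gaps -- list of (mine - his) for each "invalid seq num" entry
    """
    fsm = Counter()
    adj = Counter()
    gaps = []
    for line in lines:
        m = NBR_FSM.search(line)
        if m:
            fsm[m.groups()] += 1
            continue
        m = ADJ_FAIL.search(line)
        if m:
            adj[m.group(1)] += 1
            continue
        m = SEQ_NUM.search(line)
        if m:
            # Sequence numbers are logged in hex.
            mine, his = (int(x, 16) for x in m.groups())
            gaps.append(mine - his)
    return fsm, adj, gaps
```

It could be run as, for example, summarize(open("/var/log/messages")) on 
each box, and the resulting counters compared to see which neighbor and 
which state the errors cluster around.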
