Hi,

Following on from the OSPF issue we were seeing in 5.8, we have built a Vagrant lab that is a complete replica of our production network, so that we can test our config against 6.6 (latest syspatch applied) and run through a number of scenarios.
All in all everything has gone well, and other than some minor config enhancements, everything is fundamentally working.

The original issue was that routes were not being advertised beyond the DR after events such as a network blip or a restart of the ospfd process on another router/firewall. On 6.6 we have been able to recreate the same situation we had in production. We do this by running "rcctl restart ospfd" on the DR, typically a few times. Eventually other routers start logging as follows:

    May 4 15:44:19 va-l1-tun ospfd[75371]: lsa_check: bad age
    May 4 15:44:19 va-l1-tun ospfd[75371]: lsa_check: bad age
    May 4 15:44:24 va-l1-br-02 ospfd[27625]: lsa_check: bad age
    May 4 15:44:24 va-l1-br-02 ospfd[27625]: lsa_check: bad age
    May 4 15:44:24 1 va-l1-tun ospfd[75371]: lsa_check: bad age

If we capture traffic with

    tcpdump -i vio0 -s 1500 -w /tmp/ospf.pcap proto ospf

we can see the OSPF hello packets in full in Wireshark, but the LS update packets are fragmented, so we cannot see the full detail of what is being passed from the relevant neighbor.

We have tried to increase the verbosity of logging using "ospfctl log verbose", but we are still unsure which LSA update is incorrect. The only way we have found to stop these messages from appearing is to "rcctl restart ospfd" on various boxes until they stop.

What we are hoping for help with is diagnosing exactly which record triggers the "lsa_check: bad age" message, and understanding whether this should, in effect, clear itself. We have looked at the source code, but do not fully understand the flow beyond the check itself in lsa_check.

We are wondering if there is something fundamentally wrong with our config, but it is pretty simple: effectively a set of connected routers in a single area, with one of the hops having a backup across the internet over a GRE tunnel. There are never more than 3 hops between a source and a destination.
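To make clear what we mean by "which record", here is a minimal sketch of how we would like to inspect LSA headers once we can get at the raw bytes. It follows the 20-byte LSA header layout and the MaxAge value of 3600 seconds from RFC 2328. It assumes that lsa_check rejects an LSA when its age field exceeds MaxAge, which is our reading of the check and not something we have confirmed. The names parse_lsa_header and has_bad_age are ours, purely for illustration, and are not part of ospfd.

```python
import socket
import struct

MAX_AGE = 3600  # MaxAge from RFC 2328, in seconds

# 20-byte LSA header in network byte order (RFC 2328):
# age(2) options(1) type(1) link-state-id(4) adv-router(4)
# seq(4) checksum(2) length(2)
LSA_HDR = struct.Struct("!HBBIIIHH")


def _dotted(addr):
    """Render a 32-bit integer as a dotted quad."""
    return socket.inet_ntoa(struct.pack("!I", addr))


def parse_lsa_header(data):
    """Decode the first 20 bytes of an LSA into a dict (hypothetical helper)."""
    age, options, ls_type, ls_id, adv_rtr, seq, cksum, length = \
        LSA_HDR.unpack(data[:LSA_HDR.size])
    return {
        "age": age,
        "options": options,
        "type": ls_type,
        "ls_id": _dotted(ls_id),
        "adv_rtr": _dotted(adv_rtr),
        "seq": seq,
        "checksum": cksum,
        "length": length,
    }


def has_bad_age(hdr):
    """Assumed equivalent of the age test: the age field exceeds MaxAge."""
    return hdr["age"] > MAX_AGE
```

Fed with the LSA headers from a reassembled LS update, something like this would give us the LS type, link state ID and advertising router of any LSA whose age is above 3600, which is the detail we are currently missing from the logs.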
We have also on occasion seen "seq num mismatch, bad flags" messages, but these have appeared to clear themselves.

Thanks