Joe Holden(m...@m.jwh.me.uk) on 2017.03.09 13:41:26 +0000:
> On 09/03/2017 11:51, Martin Pieuchot wrote:
> >On 07/03/17(Tue) 19:38, Joe Holden wrote:
> >>On 12/12/2016 16:55, Joe Holden wrote:
> >>>On 12/12/2016 10:27, Martin Pieuchot wrote:
> >>>>On 11/12/16(Sun) 00:50, Joe Holden wrote:
> >>>>>On 10/12/2016 08:43, Mihai Popescu wrote:
> >>>>>>>>seeing some bizarre behaviour on one box, on one specific interface:
> >>>>>>
> >>>>>>Hello,
> >>>>>>
> >>>>>>This looks like some stupid TV game, where contesters are given some
> >>>>>>clues from time to time and they have to guess what is the real shit.
> >>>>>>
> >>>>>>Do post your FULL dmesg and configurations for network if you really
> >>>>>>want someone to even think at your issue. Isn't that obvious?
> >>>>>>
> >>>>>>Bye!
> >>>>>>
> >>>>>
> >>>>>Appreciate the useless response (but still better than nothing!), the
> >>>>>affected box has since been reverted to older snapshot and thus no more
> >>>>>debugging can be done - someone else will have to do it.
> >>>>
> >>>>I'd appreciate to see the output of 'netstat -rnf inet' when it is
> >>>>relevant. Without that information it's hard to understand.
> >>>>
> >>>>But there's a bug somewhere, it has to be fixed.
> >>>>
> >>>>>Not that dmesg is even relevant since it is a userland bug not a kernel
> >>>>>problem but anyway:
> >>>>
> >>>>It's a kernel problem.
> >>>>
> >>>I'll see if I can recreate it but I'm not holding my breath - it only
> >>>breaks once BGP loaded the table which leads me to thing it is actually
> >>>bgpd that is updating the llinfo with bogus info and even though I have
> >>>a feed in my lab it doesn't do the same thing.
> >>>
> >>Ok so, inadvertantly recreated this (pretty much exactly the same) issue
> >>on a lab/test setup:
> >>
> >>For the purposes of debug, ignore the fact that the interfaces are tap
> >>interfaces, they're still emulated ethernet...
> >>
> >>Wall of text incoming, various info...
> >>
> >>box#1:
> >>
> >>tap1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
> >>        lladdr fe:e1:ba:d1:be:f3
> >>        index 7 priority 0 llprio 3
> >>        groups: tap
> >>        status: active
> >>        inet 172.20.230.72 netmask 0xfffffffe
> >>
> >>box#2:
> >>
> >>tap1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
> >>        lladdr fe:e1:ba:d1:cf:92
> >>        index 7 priority 0 llprio 3
> >>        groups: tap
> >>        status: active
> >>        inet 172.20.230.73 netmask 0xfffffffe
> >>
> >>All is fine after starting ospfd, but as soon as I start bgpd, box#2 shows
> >>the following:
> >>
> >>Host             Ethernet Address    Netif  Expire  Flags
> >>172.20.230.72    00:00:00:00:20:12   ?      12m30s
> >>
> >># route -n get 172.20.230.72
> >>   route to: 172.20.230.72
> >>destination: 172.20.230.72
> >>       mask: 255.255.255.255
> >>  interface: tap1
> >> if address: 172.20.230.73
> >>   priority: 3 ()
> >>      flags: <UP,HOST,DONE,LLINFO,CLONED,CACHED>
> >>     use       mtu    expire
> >>      20         0       702
> >>
> >>flags destination        gateway          lpref   med aspath origin
> >>IS*>  172.20.230.72/31   172.20.230.64      200     0 i
> >>
> >>.64 is the loopback on one of its connected boxes that doesn't have broken
> >>entries
> >>
> >>tcpdump looks ok, afterwards:
> >>
> >>19:14:23.723876 arp who-has 172.20.230.72 tell 172.20.230.73
> >>19:14:23.901883 arp reply 172.20.230.72 is-at fe:e1:ba:d1:be:f3
> >>19:14:24.022948 arp who-has 172.20.230.72 tell 172.20.230.73
> >>19:14:24.201095 arp reply 172.20.230.72 is-at fe:e1:ba:d1:be:f3
> >>
> >>but the correct entry is never installed, after I delete the broken arp
> >>entry it never readds a new one.
> >>
> >>This only happens with redist connected as far as I can tell, but bgpd
> >>probably shouldn't be able to mangle arp entries and prevent the correct
> >>one being added.
> >
> >Here's the fix.
> >
> >Index: net/rtsock.c
> >===================================================================
> >RCS file: /cvs/src/sys/net/rtsock.c,v
> >retrieving revision 1.232
> >diff -u -p -r1.232 rtsock.c
> >--- net/rtsock.c	7 Mar 2017 09:23:27 -0000	1.232
> >+++ net/rtsock.c	8 Mar 2017 16:06:22 -0000
> >@@ -895,10 +895,22 @@ rtm_output(struct rt_msghdr *rtm, struct
> > 			}
> > 		}
> > change:
> >-		if (info->rti_info[RTAX_GATEWAY] != NULL && (error =
> >-		    rt_setgate(rt, info->rti_info[RTAX_GATEWAY],
> >-		    tableid)))
> >-			break;
> >+		if (info->rti_info[RTAX_GATEWAY] != NULL) {
> >+			/*
> >+			 * When updating the gateway, make sure it's
> >+			 * valid.
> >+			 */
> >+			if (!newgate && rt->rt_gateway->sa_family !=
> >+			    info->rti_info[RTAX_GATEWAY]->sa_family) {
> >+				error = EINVAL;
> >+				break;
> >+			}
> >+
> >+			error = rt_setgate(rt,
> >+			    info->rti_info[RTAX_GATEWAY], tableid);
> >+			if (error)
> >+				break;
> >+		}
> > #ifdef MPLS
> > 		if ((rtm->rtm_flags & RTF_MPLS) &&
> > 		    info->rti_info[RTAX_SRC] != NULL) {
> >
> Looking good - have tried to break it since and it's fine, thanks for
> your help!
>
> Will this make it into 6.1?

I'm trying to understand the bgpd side of this, but cannot reproduce it.
Can you send me (privately if you want) your configuration files, both
bgpd.conf and ospfd.conf, plus ifconfig output?

Thanks.
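
For anyone following along, the check that the quoted patch adds can be
sketched in isolation roughly as below. This is only an illustration under
stated assumptions, not the kernel code: gateway_change_ok() and its
is_new_route parameter are hypothetical names (the latter standing in for
the patch's "newgate"), and AF_INET/AF_INET6 are used in the usage note
purely because they are portable. In the reported case the existing gateway
on the ARP entry was presumably a link-layer address.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>

/*
 * Hypothetical helper, not a kernel function.
 *
 * Returns 0 if replacing old_gw with new_gw looks acceptable and
 * EINVAL if an existing route would get a gateway of a different
 * address family.  is_new_route is an assumed stand-in for the
 * patch's "newgate" flag: a freshly created route has no previous
 * gateway to compare against.
 */
static int
gateway_change_ok(const struct sockaddr *old_gw,
    const struct sockaddr *new_gw, int is_new_route)
{
	/* No gateway supplied in the message: nothing to validate. */
	if (new_gw == NULL)
		return 0;

	/* Refuse to overwrite a gateway with one of another family. */
	if (!is_new_route && old_gw != NULL &&
	    old_gw->sa_family != new_gw->sa_family)
		return EINVAL;

	return 0;
}
```

Calling it with an old gateway whose sa_family is AF_INET and a new one
whose sa_family is AF_INET6 returns EINVAL unless is_new_route is set;
matching families return 0.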