On 09/03/2017 11:51, Martin Pieuchot wrote:
On 07/03/17(Tue) 19:38, Joe Holden wrote:
On 12/12/2016 16:55, Joe Holden wrote:
On 12/12/2016 10:27, Martin Pieuchot wrote:
On 11/12/16(Sun) 00:50, Joe Holden wrote:
On 10/12/2016 08:43, Mihai Popescu wrote:
seeing some bizarre behaviour on one box, on one specific interface:

Hello,

This looks like some stupid TV game, where contestants are given some
clues from time to time and they have to guess what is the real shit.

Do post your FULL dmesg and configurations for network if you really
want someone to even think about your issue. Isn't that obvious?

Bye!


Appreciate the useless response (but still better than nothing!). The
affected box has since been reverted to an older snapshot, so no more
debugging can be done - someone else will have to do it.

I'd appreciate seeing the output of 'netstat -rnf inet' when it is
relevant.  Without that information it's hard to understand.

But there's a bug somewhere, it has to be fixed.

Not that dmesg is even relevant, since it is a userland bug, not a kernel
problem, but anyway:

It's a kernel problem.

I'll see if I can recreate it but I'm not holding my breath - it only
breaks once BGP has loaded the table, which leads me to think it is actually
bgpd that is updating the llinfo with bogus info, and even though I have
a feed in my lab it doesn't do the same thing.

Ok so, I inadvertently recreated this (pretty much exactly the same) issue on
a lab/test setup:

For the purposes of debug, ignore the fact that the interfaces are tap
interfaces, they're still emulated ethernet...

Wall of text incoming, various info...

box#1:

tap1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
        lladdr fe:e1:ba:d1:be:f3
        index 7 priority 0 llprio 3
        groups: tap
        status: active
        inet 172.20.230.72 netmask 0xfffffffe

box#2:

tap1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
        lladdr fe:e1:ba:d1:cf:92
        index 7 priority 0 llprio 3
        groups: tap
        status: active
        inet 172.20.230.73 netmask 0xfffffffe
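
For reference, netmask 0xfffffffe is a /31, so .72 and .73 are the only two
addresses on this link. A small sketch of that arithmetic follows; the helper
names (ipv4, prefix_len, same_subnet) are made up for illustration and are not
taken from any OpenBSD source.

```c
#include <stdint.h>

/* Build a host-order IPv4 address from its four octets. */
static uint32_t
ipv4(unsigned a, unsigned b, unsigned c, unsigned d)
{
	return ((uint32_t)a << 24) | ((uint32_t)b << 16) |
	    ((uint32_t)c << 8) | (uint32_t)d;
}

/* Count the set bits of a contiguous netmask, e.g. 0xfffffffe gives 31. */
static int
prefix_len(uint32_t mask)
{
	int n = 0;

	while (mask & 0x80000000u) {
		n++;
		mask <<= 1;
	}
	return n;
}

/* Two addresses share a subnet when their masked network parts match. */
static int
same_subnet(uint32_t x, uint32_t y, uint32_t mask)
{
	return (x & mask) == (y & mask);
}
```

With these, 172.20.230.72 and 172.20.230.73 fall in the same /31, while
172.20.230.64 (the loopback mentioned further down) does not.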

All is fine after starting ospfd, but as soon as I start bgpd, box#2 shows
the following:

Host                                 Ethernet Address    Netif Expire Flags
172.20.230.72                        00:00:00:00:20:12       ? 12m30s

# route -n get 172.20.230.72
   route to: 172.20.230.72
destination: 172.20.230.72
       mask: 255.255.255.255
  interface: tap1
 if address: 172.20.230.73
   priority: 3 ()
      flags: <UP,HOST,DONE,LLINFO,CLONED,CACHED>
     use       mtu    expire
      20         0       702

flags destination          gateway          lpref   med aspath origin
IS*>  172.20.230.72/31     172.20.230.64      200     0 i

.64 is the loopback on one of its connected boxes that doesn't have broken
entries

tcpdump looks ok, afterwards:

19:14:23.723876 arp who-has 172.20.230.72 tell 172.20.230.73
19:14:23.901883 arp reply 172.20.230.72 is-at fe:e1:ba:d1:be:f3
19:14:24.022948 arp who-has 172.20.230.72 tell 172.20.230.73
19:14:24.201095 arp reply 172.20.230.72 is-at fe:e1:ba:d1:be:f3

but the correct entry is never installed; after I delete the broken arp
entry, a new one is never re-added.

This only happens with redist connected as far as I can tell, but bgpd
probably shouldn't be able to mangle arp entries and prevent the correct one
from being added.

Here's the fix.

Index: net/rtsock.c
===================================================================
RCS file: /cvs/src/sys/net/rtsock.c,v
retrieving revision 1.232
diff -u -p -r1.232 rtsock.c
--- net/rtsock.c        7 Mar 2017 09:23:27 -0000       1.232
+++ net/rtsock.c        8 Mar 2017 16:06:22 -0000
@@ -895,10 +895,22 @@ rtm_output(struct rt_msghdr *rtm, struct
                                }
                        }
 change:
-                       if (info->rti_info[RTAX_GATEWAY] != NULL && (error =
-                           rt_setgate(rt, info->rti_info[RTAX_GATEWAY],
-                           tableid)))
-                               break;
+                       if (info->rti_info[RTAX_GATEWAY] != NULL) {
+                               /*
+                                * When updating the gateway, make sure it's
+                                * valid.
+                                */
+                               if (!newgate && rt->rt_gateway->sa_family !=
+                                   info->rti_info[RTAX_GATEWAY]->sa_family) {
+                                       error = EINVAL;
+                                       break;
+                               }
+
+                               error = rt_setgate(rt,
+                                   info->rti_info[RTAX_GATEWAY], tableid);
+                               if (error)
+                                       break;
+                       }
 #ifdef MPLS
                        if ((rtm->rtm_flags & RTF_MPLS) &&
                            info->rti_info[RTAX_SRC] != NULL) {
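
The check added by the diff above can be modelled in userland roughly as
follows. This is only a sketch under stated assumptions: model_sockaddr,
model_family and model_check_gateway are hypothetical stand-ins, not kernel
names, and `newgate` simply mirrors the flag of the same name in the diff,
whose exact semantics are not shown in this thread. The NULL check on the
current gateway is an addition for the model and is not part of the diff.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for address families; the values are arbitrary. */
enum model_family { MODEL_AF_INET = 2, MODEL_AF_LINK = 18 };

/* Hypothetical minimal sockaddr: only the family matters for this check. */
struct model_sockaddr {
	unsigned char sa_family;
};

/*
 * Return 0 if a gateway update may proceed, EINVAL if it must be refused.
 * cur is the route's current gateway, req the gateway from the request.
 */
static int
model_check_gateway(const struct model_sockaddr *cur,
    const struct model_sockaddr *req, int newgate)
{
	/* No gateway in the request: nothing to validate. */
	if (req == NULL)
		return 0;

	/*
	 * Refuse to overwrite a gateway of one family (e.g. a link-layer
	 * address on an ARP entry) with one of another family (e.g. an
	 * IPv4 next hop). The cur != NULL guard is an assumption of this
	 * model only.
	 */
	if (!newgate && cur != NULL && cur->sa_family != req->sa_family)
		return EINVAL;

	return 0;
}
```

In this model, a request carrying an inet gateway for an entry whose current
gateway is a link-layer address is rejected with EINVAL, which matches the
symptom going away once the patch is applied.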

Looking good - have tried to break it since and it's fine, thanks for your help!

Will this make it into 6.1?
