Looking for some help troubleshooting why OVS is not generating an ARP response 
for my internal router port on a VLAN tenant network.  I’ve dug down as far as 
I reasonably can, but need a quick boost here.

The ARP request is coming from an external system which is on the appropriate 
VLAN for the tenant network.  East/West traffic is working as expected: I am 
able to communicate successfully with another VM on that VLAN tenant network.  
It APPEARS that the flow never gets generated on br-int on the appropriate 
controller.

I am going to walk through my debugging steps starting from the NB database to 
OVS on the appropriate controller that should be generating the ARP response:

In OVN NB, I have a router with a port.  This is my internal gateway interface, 
192.168.5.1:

-----
router 21cd6ac3-4804-4c68-a683-9bba07d97967 (neutron-5d87debf-cf0b-4fac-ba49-01b7680368aa) (aka vlan_test)
    port lrp-8c020ac1-ae54-4aa7-a143-4440067e9f42
        mac: "fa:16:3e:37:43:b3"
        networks: ["192.168.5.1/24"]
    port lrp-e18d09db-19d8-4362-8252-751e6974ef5e
        mac: "fa:16:3e:37:67:a3"
        networks: ["10.27.14.50/23"]
        gateway chassis: [infra-prod-controller-02 infra-prod-controller-01 infra-prod-controller-03]
    nat 40605ce2-3f93-4877-ac26-47e4b257fa5f
        external ip: "10.27.14.50"
        logical ip: "192.168.5.0/24"
        type: "snat"
----

The logical switch for the tenant network has a localnet port with tag 1106, 
along with the external port for my baremetal device and the corresponding 
router port.

----
switch be7c870d-6d9c-471f-8996-e48a551068a0 (neutron-fcc39c5e-d3d8-4400-b5cf-b10f76d0112b) (aka vlan_test)
    port provnet-5ddf8b20-c849-484a-ad8f-a86a9856c2b2
        type: localnet
        tag: 1106
        addresses: ["unknown"]
    port 1f01c94e-f32f-4e94-b02a-813bb1ad4a47
        addresses: ["unknown"]
    port 85aa9a1e-cb84-4137-97ce-85958a948390
        addresses: ["fa:16:3e:ca:6e:3b 192.168.5.188"]
    port 8c020ac1-ae54-4aa7-a143-4440067e9f42
        type: router
        router-port: lrp-8c020ac1-ae54-4aa7-a143-4440067e9f42
    port 19279d0c-7c9a-498e-a3e5-269933c49df6
        type: localport
        addresses: ["fa:16:3e:4a:14:28 192.168.5.2"]
    port 4c9047c4-a6c7-4b27-9cfa-58b5d30ce964
        type: external
        addresses: ["90:ec:77:32:e6:6e 192.168.5.56"]
----


In OVN SB, I can see that the external port 
(4c9047c4-a6c7-4b27-9cfa-58b5d30ce964) has been scheduled on 
infra-prod-controller-02.  This is important because the ARP response would 
only get generated from a single HA Chassis.

---
Chassis infra-prod-controller-02
    hostname: infra-prod-controller-02
    Encap geneve
        ip: "10.27.12.24"
        options: {csum="true"}
    Port_Binding cr-lrp-20ba6028-7220-4c8d-a20f-9e4c416da3f7
    Port_Binding "71e436bb-7121-473e-a024-e34d4d7f4a4f"
    Port_Binding cr-lrp-c03b5dd9-92e1-4046-be1c-a953c0fab238
    Port_Binding "f8eb9e30-e65f-44c4-94b6-a67700790880"
    Port_Binding cr-lrp-ae2f5dbb-2cd0-44d0-9061-71c8186440be
    Port_Binding "4c9047c4-a6c7-4b27-9cfa-58b5d30ce964"
    Port_Binding "eb4435ad-37f2-44f9-a786-470b18bb9f0d"
    Port_Binding cr-lrp-950eec85-b785-474b-837b-4ecbbcf080c9
    Port_Binding "f3bbbe9a-a1a7-44b5-b6bc-b00a351ca1a5"
    Port_Binding "79430028-7ae3-448c-bc12-c9d7d44d218b"
---
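
The same lookup could be scripted rather than read off by eye. This is a minimal sketch, assuming the text comes from `ovn-sbctl show` in the layout pasted above; the function name is hypothetical.

```python
def chassis_for_port(show_output, port_name):
    """Return the chassis names whose Port_Binding list contains port_name.

    Assumes the layout shown above: a "Chassis <name>" line followed by
    indented "Port_Binding <name>" lines (names may or may not be quoted).
    """
    found = []
    current = None
    for line in show_output.splitlines():
        stripped = line.strip()
        if stripped.startswith("Chassis "):
            current = stripped.split(None, 1)[1].strip('"')
        elif stripped.startswith("Port_Binding ") and current is not None:
            bound = stripped.split(None, 1)[1].strip('"')
            if bound == port_name:
                found.append(current)
    return found
```

Feeding it the output above with "4c9047c4-a6c7-4b27-9cfa-58b5d30ce964" should name infra-prod-controller-02 only; more than one chassis in the result would itself be worth investigating.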

In OVN SB again, I can issue a trace command to verify that the Logical Flow 
exists to generate the ARP response:

---
# ovn-trace neutron-fcc39c5e-d3d8-4400-b5cf-b10f76d0112b 'inport == "provnet-5ddf8b20-c849-484a-ad8f-a86a9856c2b2" && eth.src == 90:ec:77:32:e6:6e && eth.dst == ff:ff:ff:ff:ff:ff && arp.tpa == 192.168.5.1 && arp.spa == 192.168.5.178 && arp.op == 1 && arp.tha == ff:ff:ff:ff:ff:ff && arp.sha == 90:ec:77:32:e6:6e'

…

        ingress(dp="vlan_test", inport="lrp-8c020a")
        --------------------------------------------
         0. lr_in_admission (northd.c:12885): eth.mcast && inport == "lrp-8c020a", priority 50, uuid faac4787
            xreg0[0..47] = fa:16:3e:37:43:b3;
            next;
         1. lr_in_lookup_neighbor (northd.c:13142): inport == "lrp-8c020a" && arp.spa == 192.168.5.0/24 && arp.tpa == 192.168.5.1 && arp.op == 1, priority 110, uuid d447a962
            reg9[2] = lookup_arp(inport, arp.spa, arp.sha);
            /* MAC binding to ff:ff:ff:ff:ff:ff found. */
            reg9[3] = 1;
            next;
         2. lr_in_learn_neighbor (northd.c:13078): reg9[2] == 1 || reg9[3] == 0, priority 100, uuid 6aad6f8d
            mac_cache_use;
            next;
         3. lr_in_ip_input (northd.c:12440): inport == "lrp-8c020a" && arp.op == 1 && arp.tpa == 192.168.5.1 && arp.spa == 192.168.5.0/24 && is_chassis_resident("cr-lrp-e18d09"), priority 90, uuid 6187e537
            eth.dst = eth.src;
            eth.src = xreg0[0..47];
            arp.op = 2;
            arp.tha = arp.sha;
            arp.sha = xreg0[0..47];
            arp.tpa <-> arp.spa;
            outport = inport;
            flags.loopback = 1;
            output;
---
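
Since the match expression is long and easy to mistype, it could be assembled from the packet fields instead. This is a minimal sketch with hypothetical helper names; it only builds the command string and does not run ovn-trace. Note that the trace above used arp.spa 192.168.5.178, while the external port is 192.168.5.56; both are inside 192.168.5.0/24, so I would expect the same ARP responder flow to match, but tracing the real address removes that doubt.

```python
import shlex

def arp_request_match(inport, sha, spa, tpa):
    """Build an OVN match expression for a broadcast ARP request."""
    clauses = [
        'inport == "{}"'.format(inport),
        "eth.src == {}".format(sha),
        "eth.dst == ff:ff:ff:ff:ff:ff",
        "arp.op == 1",
        "arp.sha == {}".format(sha),
        "arp.spa == {}".format(spa),
        "arp.tha == ff:ff:ff:ff:ff:ff",
        "arp.tpa == {}".format(tpa),
    ]
    return " && ".join(clauses)

def ovn_trace_command(datapath, match):
    """Return a shell-quoted ovn-trace command line (not executed here)."""
    return "ovn-trace {} {}".format(shlex.quote(datapath), shlex.quote(match))

if __name__ == "__main__":
    match = arp_request_match(
        "provnet-5ddf8b20-c849-484a-ad8f-a86a9856c2b2",
        "90:ec:77:32:e6:6e",
        "192.168.5.56",
        "192.168.5.1",
    )
    print(ovn_trace_command("neutron-fcc39c5e-d3d8-4400-b5cf-b10f76d0112b", match))
```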

Moving into OVS land on infra-prod-controller-02, where the external port is 
scheduled, I can see the ARP requests entering br-ex.  However, there is no 
response!
---
ovs-tcpdump -i br-ex -nn -e -v
tcpdump: listening on mibr-ex, link-type EN10MB (Ethernet), snapshot length 262144 bytes
22:05:48.581916 90:ec:77:32:e6:6e > ff:ff:ff:ff:ff:ff, ethertype 802.1Q (0x8100), length 64: vlan 1106, p 0, ethertype ARP (0x0806), Ethernet (len 6), IPv4 (len 4), Request who-has 192.168.5.1 (ff:ff:ff:ff:ff:ff) tell 192.168.5.56, length 46
---


The datapath flow for the request exists on br-ex, but there is no flow for a 
reply.

---
# ovs-appctl dpif/dump-flows --names br-ex
recirc_id(0),in_port(bond0),ct_state(-new-est-rel-rpl-inv-trk),ct_mark(0/0x1),eth(src=90:ec:77:32:e6:6e,dst=ff:ff:ff:ff:ff:ff),eth_type(0x8100),vlan(vid=1106,pcp=0),encap(eth_type(0x0806),arp(sip=192.168.5.56,tip=192.168.5.1,op=1/0xff,sha=90:ec:77:32:e6:6e)), packets:2425, bytes:155200, used:0.633s, actions:mibond0,br-ex,pop_vlan,tapfcc39c5e-d0
---
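
To watch whether this flow keeps getting hit and where it forwards the packet, the counters and action list could be pulled out of a dump line. This is a minimal sketch, assuming one flow per line as ovs-appctl prints it; the function names are hypothetical.

```python
import re

def split_top_level(actions):
    """Split an action list on commas that are not inside parentheses."""
    parts, depth, current = [], 0, ""
    for ch in actions:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append(current)
            current = ""
        else:
            current += ch
    if current:
        parts.append(current)
    return parts

def flow_stats(flow_line):
    """Extract the packet counter and the action list from one dump line."""
    packets = re.search(r"packets:(\d+)", flow_line)
    actions = re.search(r"actions:(.*)", flow_line)
    return {
        "packets": int(packets.group(1)) if packets else None,
        "actions": split_top_level(actions.group(1).strip()) if actions else [],
    }
```

Running it on two dumps taken a few seconds apart and comparing the "packets" values would show whether the requests are still arriving; the last entry in "actions" is the port the request is handed to.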



I do NOT see the flow in `br-int`.
---
(openvswitch-vswitchd)[root@infra-prod-controller-02 /]# ovs-appctl dpif/dump-flows --names br-int
recirc_id(0),tunnel(tun_id=0x0,src=10.27.12.21,dst=10.27.12.24,flags(-df+csum+key)),in_port(genev_sys_6081),eth(src=1e:4d:dd:80:c3:ed),eth_type(0x0800),ipv4(proto=17,frag=no),udp(dst=3784), packets:6468, bytes:426888, used:0.068s, actions:userspace(pid=4294967295,slow_path(bfd))
recirc_id(0),tunnel(tun_id=0x0,src=10.27.12.13,dst=10.27.12.24,flags(-df+csum+key)),in_port(genev_sys_6081),eth(src=9a:b2:a9:fd:a6:5b),eth_type(0x0800),ipv4(proto=17,frag=no),udp(dst=3784), packets:6456, bytes:426096, used:0.860s, actions:userspace(pid=4294967295,slow_path(bfd))
recirc_id(0),tunnel(tun_id=0x0,src=10.27.12.22,dst=10.27.12.24,flags(-df+csum+key)),in_port(genev_sys_6081),eth(src=66:90:cc:1e:a5:b6),eth_type(0x0800),ipv4(proto=17,frag=no),udp(dst=3784), packets:6452, bytes:425832, used:0.716s, actions:userspace(pid=4294967295,slow_path(bfd))
recirc_id(0),tunnel(tun_id=0x0,src=10.27.12.12,dst=10.27.12.24,flags(-df+csum+key)),in_port(genev_sys_6081),eth(src=96:54:aa:d3:29:b9),eth_type(0x0800),ipv4(proto=17,frag=no),udp(dst=3784), packets:6453, bytes:425898, used:0.729s, actions:userspace(pid=4294967295,slow_path(bfd))
recirc_id(0),tunnel(tun_id=0x0,src=10.27.12.15,dst=10.27.12.24,flags(-df+csum+key)),in_port(genev_sys_6081),eth(src=8e:0b:17:26:df:ef),eth_type(0x0800),ipv4(proto=17,frag=no),udp(dst=3784), packets:6469, bytes:426954, used:0.609s, actions:userspace(pid=4294967295,slow_path(bfd))
recirc_id(0),tunnel(tun_id=0x0,src=10.27.12.19,dst=10.27.12.24,flags(-df+csum+key)),in_port(genev_sys_6081),eth(src=5a:56:62:47:6f:21),eth_type(0x0800),ipv4(proto=17,frag=no),udp(dst=3784), packets:6457, bytes:426162, used:0.265s, actions:userspace(pid=4294967295,slow_path(bfd))
recirc_id(0),tunnel(tun_id=0x0,src=10.27.12.11,dst=10.27.12.24,flags(-df+csum+key)),in_port(genev_sys_6081),eth(src=ae:36:b4:02:1c:67),eth_type(0x0800),ipv4(proto=17,frag=no),udp(dst=3784), packets:6462, bytes:426492, used:0.253s, actions:userspace(pid=4294967295,slow_path(bfd))
---
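
On a busier node the dump is too long to scan by eye, so the check for an ARP flow could be automated. This is a minimal sketch, assuming one flow per line as ovs-appctl prints it and assuming an ARP megaflow would carry `eth_type(0x0806)` as in the br-ex dump; the function name is hypothetical.

```python
def arp_flows(dump_text, src_mac=None):
    """Return dump lines that describe ARP traffic, optionally for one source MAC.

    Matches on the literal text "eth_type(0x0806)", which also appears inside
    encap(...) for VLAN-tagged ARP such as the br-ex flow shown earlier.
    """
    hits = []
    for line in dump_text.splitlines():
        if "eth_type(0x0806)" not in line:
            continue
        if src_mac and "src=" + src_mac.lower() not in line.lower():
            continue
        hits.append(line.strip())
    return hits
```

With the br-int output above this returns an empty list (every flow is eth_type(0x0800) BFD), which matches what I see by eye.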

I confirmed that tapfcc39c5e-d0 is indeed in the `br-int` bridge.  It is not 
marked as type internal for some reason, and it is unclear to me whether that 
is by design.

---
# ovs-vsctl show
678a6be7-51d4-44d4-9366-67311227cdbb
    Bridge br-int
        fail_mode: secure
        datapath_type: system
…
        Port tapfcc39c5e-d0
            Interface tapfcc39c5e-d0
---

When I issue the ofproto/trace commands, it appears that OVS IS generating a 
response, but it is definitely not being sent back out of the system:

---
Trace ARP Request entering br-ex
# ovs-appctl ofproto/trace --names br-ex in_port=1,dl_vlan=1106,dl_src=90:ec:77:32:e6:6e,dl_dst=ff:ff:ff:ff:ff:ff,dl_type=0x0806,arp_spa=192.168.5.56,arp_tpa=192.168.5.1,arp_op=1,arp_sha=90:ec:77:32:e6:6e,arp_tha=ff:ff:ff:ff:ff:ff
…
Final flow: unchanged
Megaflow: recirc_id=0,ct_state=-new-est-rel-rpl-inv-trk,ct_mark=0/0x1,eth,arp,in_port=bond0,dl_vlan=1106,dl_vlan_pcp=0,dl_src=90:ec:77:32:e6:6e,dl_dst=ff:ff:ff:ff:ff:ff,arp_spa=192.168.5.56,arp_tpa=192.168.5.1,arp_op=1,arp_sha=90:ec:77:32:e6:6e
Datapath actions: mibond0,br-ex,pop_vlan,tapfcc39c5e-d0
---

---
Trace ARP Request entering br-int
# ovs-appctl ofproto/trace --names br-int in_port=tapfcc39c5e-d0,dl_src=90:ec:77:32:e6:6e,dl_dst=ff:ff:ff:ff:ff:ff,dl_type=0x0806,arp_spa=192.168.5.56,arp_tpa=192.168.5.1,arp_op=1,arp_sha=90:ec:77:32:e6:6e,arp_tha=ff:ff:ff:ff:ff:ff
…
Final flow: arp,reg0=0x300,reg10=0x401,reg11=0x2d,reg12=0x2c,reg13=0x28,reg14=0x2,reg15=0x2,metadata=0x16,in_port="tapfcc39c5e-d0",vlan_tci=0x0000,dl_src=fa:16:3e:37:43:b3,dl_dst=90:ec:77:32:e6:6e,arp_spa=192.168.5.1,arp_tpa=192.168.5.56,arp_op=2,arp_sha=fa:16:3e:37:43:b3,arp_tha=90:ec:77:32:e6:6e
Megaflow: recirc_id=0,ct_state=-new-est-rel-rpl-inv-trk,ct_mark=0/0x1,eth,arp,in_port="tapfcc39c5e-d0",dl_src=90:ec:77:32:e6:6e,dl_dst=ff:ff:ff:ff:ff:ff,arp_spa=192.168.5.56,arp_tpa=192.168.5.1,arp_op=1,arp_sha=90:ec:77:32:e6:6e,arp_tha=ff:ff:ff:ff:ff:ff
Datapath actions: set(eth(src=fa:16:3e:37:43:b3,dst=90:ec:77:32:e6:6e)),set(arp(sip=192.168.5.1,tip=192.168.5.56,op=2,sha=fa:16:3e:37:43:b3,tha=90:ec:77:32:e6:6e)),tapfcc39c5e-d0
This flow is handled by the userspace slow path because it:
  - Uses action(s) not supported by datapath.
---
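
When repeating this trace with different inputs, the two parts I care about (the final datapath actions and the slow-path note) could be extracted automatically. This is a minimal sketch that only parses the text of an ofproto/trace run; the function names are hypothetical, and it tolerates the action list being on the same line as "Datapath actions:" or wrapped onto the next one.

```python
def datapath_actions(trace_output):
    """Return the text following 'Datapath actions:', or None if absent."""
    lines = trace_output.splitlines()
    for i, line in enumerate(lines):
        if line.startswith("Datapath actions:"):
            rest = line[len("Datapath actions:"):].strip()
            if not rest and i + 1 < len(lines):
                # Mail clients may wrap the action list onto the next line.
                rest = lines[i + 1].strip()
            return rest
    return None

def uses_slow_path(trace_output):
    """True if the trace reports that the flow is handled in userspace."""
    return "handled by the userspace slow path" in trace_output
```

For the trace above, the action list ends in tapfcc39c5e-d0 (the same port the request came in on) and the slow-path check is true, which are the two details I cannot yet explain.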


At this point, I would expect to see these responses actually hitting 
tapfcc39c5e-d0, but they do not seem to be present.  It’s not yet clear whether 
the “Uses action(s) not supported by datapath” note above is a problem.

---
# ovs-tcpdump -i tapfcc39c5e-d0 -nn -v -e
tcpdump: listening on ovsmi100537, link-type EN10MB (Ethernet), snapshot length 262144 bytes
22:25:06.677218 90:ec:77:32:e6:6e > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 60: Ethernet (len 6), IPv4 (len 4), Request who-has 192.168.5.1 (ff:ff:ff:ff:ff:ff) tell 192.168.5.56, length 46
---
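
For longer captures, the requests that never get an answer could be listed automatically. This is a minimal sketch, assuming one packet per line and assuming replies would appear in tcpdump's usual "Reply <ip> is-at <mac>" form (none show up in my capture, so that part is untested against real data); the function name is hypothetical.

```python
import re

# "Request who-has <target> [(<mac>)] tell <sender>," as printed by tcpdump.
REQUEST = re.compile(r"Request who-has (\S+) .*?tell (\S+?),")
# Assumed reply format: "Reply <ip> is-at <mac>".
REPLY = re.compile(r"Reply (\S+) is-at ")

def unanswered_targets(capture_text):
    """Return the sorted target IPs that were requested but never answered."""
    requested, answered = set(), set()
    for line in capture_text.splitlines():
        req = REQUEST.search(line)
        if req:
            requested.add(req.group(1))
        rep = REPLY.search(line)
        if rep:
            answered.add(rep.group(1))
    return sorted(requested - answered)
```

On the capture above this would report 192.168.5.1 as unanswered.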


If the packets WERE to hit tapfcc39c5e-d0, it appears that the system would 
send the response out bond0 as expected:
---
# ovs-appctl ofproto/trace --names br-int in_port=tapfcc39c5e-d0,dl_type=0x0806,dl_src=fa:16:3e:37:43:b3,dl_dst=90:ec:77:32:e6:6e,arp_spa=192.168.5.1,arp_tpa=192.168.5.56,arp_op=2,arp_sha=fa:16:3e:37:43:b3,arp_tha=90:ec:77:32:e6:6e
…
Final flow: arp,reg0=0x300,reg10=0x400,reg11=0x2d,reg12=0x2c,reg13=0x1b,reg14=0x2,reg15=0x8001,metadata=0x16,in_port="tapfcc39c5e-d0",vlan_tci=0x0000,dl_src=fa:16:3e:37:43:b3,dl_dst=90:ec:77:32:e6:6e,arp_spa=192.168.5.1,arp_tpa=192.168.5.56,arp_op=2,arp_sha=fa:16:3e:37:43:b3,arp_tha=90:ec:77:32:e6:6e
Megaflow: recirc_id=0,ct_state=-new-est-rel-rpl-inv-trk,ct_mark=0/0x1,eth,arp,in_port="tapfcc39c5e-d0",dl_src=fa:16:3e:37:43:b3,dl_dst=90:ec:77:32:e6:6e,arp_tpa=192.168.5.56,arp_op=2
Datapath actions: push_vlan(vid=1106,pcp=0),bond0,mibond0
---

Any insight on what could be happening here or how to debug further would be 
GREATLY appreciated.

-Austin



_______________________________________________
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
