After investigating further, I believe I am hitting the following issue:

https://bugs.launchpad.net/neutron/+bug/1995078

Essentially, the external port and the LRP are being scheduled separately and
without coordination. Because of this, if the two ports land on different
chassis, the ARP request is dropped. I still need to build and test the fix,
and will follow up with a conclusion.
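A quick way to check for this mismatch is to compare, in `ovn-sbctl show` output, which chassis claims the external port and which claims the chassis-redirect port named in the `is_chassis_resident()` condition of the ARP-reply flow (`cr-lrp-e18d09` in the trace further down). Notably, the `infra-prod-controller-02` excerpt quoted below lists the external port but no `cr-lrp-e18d09db-19d8-4362-8252-751e6974ef5e` binding, which would be consistent with this bug. This is a minimal sketch, not an OVN tool: `check_colocation` is a hypothetical helper name, and it assumes the `show` layout seen in this thread (a `Chassis <name>` line followed by indented `Port_Binding <name>` lines, with some names double-quoted).

```shell
# check_colocation: hypothetical helper, not part of OVN.
# Reads `ovn-sbctl show` output on stdin and reports which chassis claims
# each of the two Port_Binding names given as $1 and $2.
check_colocation() {
    awk -v a="$1" -v b="$2" '
        # Remember the chassis whose stanza we are currently inside.
        $1 == "Chassis" { chassis = $2; gsub(/"/, "", chassis) }
        # Match bindings listed under that chassis (names may be quoted).
        $1 == "Port_Binding" {
            port = $2
            gsub(/"/, "", port)
            if (port == a) ca = chassis
            if (port == b) cb = chassis
        }
        END {
            if (ca == "") ca = "(not bound)"
            if (cb == "") cb = "(not bound)"
            print a " -> " ca
            print b " -> " cb
            if (ca == cb && ca != "(not bound)") print "colocated"
            else print "NOT colocated"
        }'
}

# Usage (port names taken from this thread):
#   ovn-sbctl show | check_colocation 4c9047c4-a6c7-4b27-9cfa-58b5d30ce964 \
#       cr-lrp-e18d09db-19d8-4362-8252-751e6974ef5e
```

A text filter over `show` output was chosen only because that is the view already used in this thread; it reads the same data the SB database holds, but relies on the display format staying as shown.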



From: Austin Cormier <acorm...@juniper.net>
Date: Thursday, February 1, 2024 at 5:39 PM
To: ovs-discuss@openvswitch.org <ovs-discuss@openvswitch.org>
Subject: OVN/OVS no ARP response for internal router interface from external port

Looking for some help troubleshooting why OVS is not generating an ARP response
for my internal router port on a VLAN tenant network. I've dug down as far as I
reasonably can, but need a quick boost here.

The ARP request comes from an external system that is on the correct VLAN for
the tenant network. East/West traffic works as expected: I can communicate
with another VM on that VLAN tenant network. It APPEARS that the flow never
gets generated on br-int on the appropriate controller.

I am going to walk through my debugging steps, from the OVN NB database down
to OVS on the controller that should be generating the ARP response:

In OVN NB, I have a router whose port lrp-8c020ac1-ae54-4aa7-a143-4440067e9f42
is my internal gateway interface, 192.168.5.1:

-----
router 21cd6ac3-4804-4c68-a683-9bba07d97967 (neutron-5d87debf-cf0b-4fac-ba49-01b7680368aa) (aka vlan_test)
    port lrp-8c020ac1-ae54-4aa7-a143-4440067e9f42
        mac: "fa:16:3e:37:43:b3"
        networks: ["192.168.5.1/24"]
    port lrp-e18d09db-19d8-4362-8252-751e6974ef5e
        mac: "fa:16:3e:37:67:a3"
        networks: ["10.27.14.50/23"]
        gateway chassis: [infra-prod-controller-02 infra-prod-controller-01 infra-prod-controller-03]
    nat 40605ce2-3f93-4877-ac26-47e4b257fa5f
        external ip: "10.27.14.50"
        logical ip: "192.168.5.0/24"
        type: "snat"
----

The logical switch for the tenant network is connected to a localnet port with
tag 1106, and it contains the external port for my bare-metal device and the
corresponding router port:

----
switch be7c870d-6d9c-471f-8996-e48a551068a0 (neutron-fcc39c5e-d3d8-4400-b5cf-b10f76d0112b) (aka vlan_test)
    port provnet-5ddf8b20-c849-484a-ad8f-a86a9856c2b2
        type: localnet
        tag: 1106
        addresses: ["unknown"]
    port 1f01c94e-f32f-4e94-b02a-813bb1ad4a47
        addresses: ["unknown"]
    port 85aa9a1e-cb84-4137-97ce-85958a948390
        addresses: ["fa:16:3e:ca:6e:3b 192.168.5.188"]
    port 8c020ac1-ae54-4aa7-a143-4440067e9f42
        type: router
        router-port: lrp-8c020ac1-ae54-4aa7-a143-4440067e9f42
    port 19279d0c-7c9a-498e-a3e5-269933c49df6
        type: localport
        addresses: ["fa:16:3e:4a:14:28 192.168.5.2"]
    port 4c9047c4-a6c7-4b27-9cfa-58b5d30ce964
        type: external
        addresses: ["90:ec:77:32:e6:6e 192.168.5.56"]
----


In OVN SB, I can see that the external port
(4c9047c4-a6c7-4b27-9cfa-58b5d30ce964) has been scheduled on
infra-prod-controller-02. This matters because the ARP response is only
generated on a single HA chassis.

---
Chassis infra-prod-controller-02
    hostname: infra-prod-controller-02
    Encap geneve
        ip: "10.27.12.24"
        options: {csum="true"}
    Port_Binding cr-lrp-20ba6028-7220-4c8d-a20f-9e4c416da3f7
    Port_Binding "71e436bb-7121-473e-a024-e34d4d7f4a4f"
    Port_Binding cr-lrp-c03b5dd9-92e1-4046-be1c-a953c0fab238
    Port_Binding "f8eb9e30-e65f-44c4-94b6-a67700790880"
    Port_Binding cr-lrp-ae2f5dbb-2cd0-44d0-9061-71c8186440be
    Port_Binding "4c9047c4-a6c7-4b27-9cfa-58b5d30ce964"
    Port_Binding "eb4435ad-37f2-44f9-a786-470b18bb9f0d"
    Port_Binding cr-lrp-950eec85-b785-474b-837b-4ecbbcf080c9
    Port_Binding "f3bbbe9a-a1a7-44b5-b6bc-b00a351ca1a5"
    Port_Binding "79430028-7ae3-448c-bc12-c9d7d44d218b"
---

Back in OVN SB, I can run ovn-trace to verify that the logical flow that
generates the ARP response exists:

---
# ovn-trace neutron-fcc39c5e-d3d8-4400-b5cf-b10f76d0112b 'inport == "provnet-5ddf8b20-c849-484a-ad8f-a86a9856c2b2" && eth.src == 90:ec:77:32:e6:6e && eth.dst == ff:ff:ff:ff:ff:ff && arp.tpa == 192.168.5.1 && arp.spa == 192.168.5.178 && arp.op == 1 && arp.tha == ff:ff:ff:ff:ff:ff && arp.sha == 90:ec:77:32:e6:6e'

…

        ingress(dp="vlan_test", inport="lrp-8c020a")
        --------------------------------------------
         0. lr_in_admission (northd.c:12885): eth.mcast && inport == "lrp-8c020a", priority 50, uuid faac4787
            xreg0[0..47] = fa:16:3e:37:43:b3;
            next;
         1. lr_in_lookup_neighbor (northd.c:13142): inport == "lrp-8c020a" && arp.spa == 192.168.5.0/24 && arp.tpa == 192.168.5.1 && arp.op == 1, priority 110, uuid d447a962
            reg9[2] = lookup_arp(inport, arp.spa, arp.sha);
            /* MAC binding to ff:ff:ff:ff:ff:ff found. */
            reg9[3] = 1;
            next;
         2. lr_in_learn_neighbor (northd.c:13078): reg9[2] == 1 || reg9[3] == 0, priority 100, uuid 6aad6f8d
            mac_cache_use;
            next;
         3. lr_in_ip_input (northd.c:12440): inport == "lrp-8c020a" && arp.op == 1 && arp.tpa == 192.168.5.1 && arp.spa == 192.168.5.0/24 && is_chassis_resident("cr-lrp-e18d09"), priority 90, uuid 6187e537
            eth.dst = eth.src;
            eth.src = xreg0[0..47];
            arp.op = 2;
            arp.tha = arp.sha;
            arp.sha = xreg0[0..47];
            arp.tpa <-> arp.spa;
            outport = inport;
            flags.loopback = 1;
            output;
---
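Stage 3 only matches when `is_chassis_resident()` holds for the named chassis-redirect port, so that port name is the one to compare against the chassis that claims the external port. A minimal sketch for pulling it out of saved `ovn-trace` output, assuming each logical-flow match is printed on a single line (the lines above were wrapped by the mail client); `gating_ports` is a hypothetical helper name. As in the trace above, `ovn-trace` may print abbreviated names such as `cr-lrp-e18d09`, so the result can be a prefix of the full Port_Binding name.

```shell
# gating_ports: hypothetical helper, not part of OVN.
# Prints the argument of every is_chassis_resident("...") condition found in
# ovn-trace output read on stdin, one name per line, without duplicates.
gating_ports() {
    grep -o 'is_chassis_resident("[^"]*")' |
        sed 's/^is_chassis_resident("\(.*\)")$/\1/' |
        sort -u
}

# Usage: ovn-trace <datapath> '<microflow>' | gating_ports
```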

Moving into OVS land on infra-prod-controller-02, where the external port is
scheduled, I can see the ARP requests entering br-ex. However, there is no
response!
---
ovs-tcpdump -i br-ex -nn -e -v
tcpdump: listening on mibr-ex, link-type EN10MB (Ethernet), snapshot length 262144 bytes
22:05:48.581916 90:ec:77:32:e6:6e > ff:ff:ff:ff:ff:ff, ethertype 802.1Q (0x8100), length 64: vlan 1106, p 0, ethertype ARP (0x0806), Ethernet (len 6), IPv4 (len 4), Request who-has 192.168.5.1 (ff:ff:ff:ff:ff:ff) tell 192.168.5.56, length 46
---


The flow for the request exists on br-ex, but there is no reply flow:

---
# ovs-appctl dpif/dump-flows --names br-ex
recirc_id(0),in_port(bond0),ct_state(-new-est-rel-rpl-inv-trk),ct_mark(0/0x1),eth(src=90:ec:77:32:e6:6e,dst=ff:ff:ff:ff:ff:ff),eth_type(0x8100),vlan(vid=1106,pcp=0),encap(eth_type(0x0806),arp(sip=192.168.5.56,tip=192.168.5.1,op=1/0xff,sha=90:ec:77:32:e6:6e)), packets:2425, bytes:155200, used:0.633s, actions:mibond0,br-ex,pop_vlan,tapfcc39c5e-d0
---



I do NOT see the flow in `br-int`.
---
(openvswitch-vswitchd)[root@infra-prod-controller-02 /]# ovs-appctl dpif/dump-flows --names br-int
recirc_id(0),tunnel(tun_id=0x0,src=10.27.12.21,dst=10.27.12.24,flags(-df+csum+key)),in_port(genev_sys_6081),eth(src=1e:4d:dd:80:c3:ed),eth_type(0x0800),ipv4(proto=17,frag=no),udp(dst=3784), packets:6468, bytes:426888, used:0.068s, actions:userspace(pid=4294967295,slow_path(bfd))
recirc_id(0),tunnel(tun_id=0x0,src=10.27.12.13,dst=10.27.12.24,flags(-df+csum+key)),in_port(genev_sys_6081),eth(src=9a:b2:a9:fd:a6:5b),eth_type(0x0800),ipv4(proto=17,frag=no),udp(dst=3784), packets:6456, bytes:426096, used:0.860s, actions:userspace(pid=4294967295,slow_path(bfd))
recirc_id(0),tunnel(tun_id=0x0,src=10.27.12.22,dst=10.27.12.24,flags(-df+csum+key)),in_port(genev_sys_6081),eth(src=66:90:cc:1e:a5:b6),eth_type(0x0800),ipv4(proto=17,frag=no),udp(dst=3784), packets:6452, bytes:425832, used:0.716s, actions:userspace(pid=4294967295,slow_path(bfd))
recirc_id(0),tunnel(tun_id=0x0,src=10.27.12.12,dst=10.27.12.24,flags(-df+csum+key)),in_port(genev_sys_6081),eth(src=96:54:aa:d3:29:b9),eth_type(0x0800),ipv4(proto=17,frag=no),udp(dst=3784), packets:6453, bytes:425898, used:0.729s, actions:userspace(pid=4294967295,slow_path(bfd))
recirc_id(0),tunnel(tun_id=0x0,src=10.27.12.15,dst=10.27.12.24,flags(-df+csum+key)),in_port(genev_sys_6081),eth(src=8e:0b:17:26:df:ef),eth_type(0x0800),ipv4(proto=17,frag=no),udp(dst=3784), packets:6469, bytes:426954, used:0.609s, actions:userspace(pid=4294967295,slow_path(bfd))
recirc_id(0),tunnel(tun_id=0x0,src=10.27.12.19,dst=10.27.12.24,flags(-df+csum+key)),in_port(genev_sys_6081),eth(src=5a:56:62:47:6f:21),eth_type(0x0800),ipv4(proto=17,frag=no),udp(dst=3784), packets:6457, bytes:426162, used:0.265s, actions:userspace(pid=4294967295,slow_path(bfd))
recirc_id(0),tunnel(tun_id=0x0,src=10.27.12.11,dst=10.27.12.24,flags(-df+csum+key)),in_port(genev_sys_6081),eth(src=ae:36:b4:02:1c:67),eth_type(0x0800),ipv4(proto=17,frag=no),udp(dst=3784), packets:6462, bytes:426492, used:0.253s, actions:userspace(pid=4294967295,slow_path(bfd))
---
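Instead of reading the dump by eye, the datapath flows can be filtered for the ARP target address. A minimal sketch, assuming the one-flow-per-line format that `ovs-appctl dpif/dump-flows` prints (the lines above were wrapped by the mail client); `arp_flows_for` is a hypothetical helper name. Run against the dumps in this message, it would return the request flow listed for br-ex and nothing for br-int, whose flows here are all BFD (UDP port 3784) slow-path flows.

```shell
# arp_flows_for: hypothetical helper, not part of OVS.
# Reads `ovs-appctl dpif/dump-flows` output on stdin and prints the flows
# whose ARP target IP (the tip= field) equals $1, either exactly or masked.
arp_flows_for() {
    # -F treats the patterns as fixed strings, so the dots in $1 are literal;
    # the trailing ",", ")" or "/" keeps 192.168.5.1 from matching 192.168.5.10.
    grep -F -e "tip=$1," -e "tip=$1)" -e "tip=$1/"
}

# Usage: ovs-appctl dpif/dump-flows --names br-int | arp_flows_for 192.168.5.1
```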

I confirmed that tapfcc39c5e-d0 is indeed a port on the `br-int` bridge. It is
not marked as internal, but it is unclear whether that is by design.
---
# ovs-vsctl show
678a6be7-51d4-44d4-9366-67311227cdbb
    Bridge br-int
        fail_mode: secure
        datapath_type: system
…
        Port tapfcc39c5e-d0
            Interface tapfcc39c5e-d0
---

When I run ofproto/trace, it appears that OVS IS generating a response, but it
is definitely not being sent back out of the system:

---
Trace ARP Request entering br-ex
# ovs-appctl ofproto/trace --names br-ex in_port=1,dl_vlan=1106,dl_src=90:ec:77:32:e6:6e,dl_dst=ff:ff:ff:ff:ff:ff,dl_type=0x0806,arp_spa=192.168.5.56,arp_tpa=192.168.5.1,arp_op=1,arp_sha=90:ec:77:32:e6:6e,arp_tha=ff:ff:ff:ff:ff:ff
…
Final flow: unchanged
Megaflow: recirc_id=0,ct_state=-new-est-rel-rpl-inv-trk,ct_mark=0/0x1,eth,arp,in_port=bond0,dl_vlan=1106,dl_vlan_pcp=0,dl_src=90:ec:77:32:e6:6e,dl_dst=ff:ff:ff:ff:ff:ff,arp_spa=192.168.5.56,arp_tpa=192.168.5.1,arp_op=1,arp_sha=90:ec:77:32:e6:6e
Datapath actions: mibond0,br-ex,pop_vlan,tapfcc39c5e-d0
---

---
Trace ARP Request entering br-int
# ovs-appctl ofproto/trace --names br-int in_port=tapfcc39c5e-d0,dl_src=90:ec:77:32:e6:6e,dl_dst=ff:ff:ff:ff:ff:ff,dl_type=0x0806,arp_spa=192.168.5.56,arp_tpa=192.168.5.1,arp_op=1,arp_sha=90:ec:77:32:e6:6e,arp_tha=ff:ff:ff:ff:ff:ff
…
Final flow: arp,reg0=0x300,reg10=0x401,reg11=0x2d,reg12=0x2c,reg13=0x28,reg14=0x2,reg15=0x2,metadata=0x16,in_port="tapfcc39c5e-d0",vlan_tci=0x0000,dl_src=fa:16:3e:37:43:b3,dl_dst=90:ec:77:32:e6:6e,arp_spa=192.168.5.1,arp_tpa=192.168.5.56,arp_op=2,arp_sha=fa:16:3e:37:43:b3,arp_tha=90:ec:77:32:e6:6e
Megaflow: recirc_id=0,ct_state=-new-est-rel-rpl-inv-trk,ct_mark=0/0x1,eth,arp,in_port="tapfcc39c5e-d0",dl_src=90:ec:77:32:e6:6e,dl_dst=ff:ff:ff:ff:ff:ff,arp_spa=192.168.5.56,arp_tpa=192.168.5.1,arp_op=1,arp_sha=90:ec:77:32:e6:6e,arp_tha=ff:ff:ff:ff:ff:ff
Datapath actions: set(eth(src=fa:16:3e:37:43:b3,dst=90:ec:77:32:e6:6e)),set(arp(sip=192.168.5.1,tip=192.168.5.56,op=2,sha=fa:16:3e:37:43:b3,tha=90:ec:77:32:e6:6e)),tapfcc39c5e-d0
This flow is handled by the userspace slow path because it:
  - Uses action(s) not supported by datapath.
---


At this point, I would expect to see these responses actually hitting
tapfcc39c5e-d0, but only the requests show up there. It's not yet clear whether
the “Uses action(s) not supported by datapath” note above is an issue.

---
# ovs-tcpdump -i tapfcc39c5e-d0 -nn -v -e
tcpdump: listening on ovsmi100537, link-type EN10MB (Ethernet), snapshot length 262144 bytes
22:25:06.677218 90:ec:77:32:e6:6e > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 60: Ethernet (len 6), IPv4 (len 4), Request who-has 192.168.5.1 (ff:ff:ff:ff:ff:ff) tell 192.168.5.56, length 46
---


If the replies WERE to hit tapfcc39c5e-d0, it appears that the system would
send them out bond0 as expected:
---
# ovs-appctl ofproto/trace --names br-int in_port=tapfcc39c5e-d0,dl_type=0x0806,dl_src=fa:16:3e:37:43:b3,dl_dst=90:ec:77:32:e6:6e,arp_spa=192.168.5.1,arp_tpa=192.168.5.56,arp_op=2,arp_sha=fa:16:3e:37:43:b3,arp_tha=90:ec:77:32:e6:6e
…
Final flow: arp,reg0=0x300,reg10=0x400,reg11=0x2d,reg12=0x2c,reg13=0x1b,reg14=0x2,reg15=0x8001,metadata=0x16,in_port="tapfcc39c5e-d0",vlan_tci=0x0000,dl_src=fa:16:3e:37:43:b3,dl_dst=90:ec:77:32:e6:6e,arp_spa=192.168.5.1,arp_tpa=192.168.5.56,arp_op=2,arp_sha=fa:16:3e:37:43:b3,arp_tha=90:ec:77:32:e6:6e
Megaflow: recirc_id=0,ct_state=-new-est-rel-rpl-inv-trk,ct_mark=0/0x1,eth,arp,in_port="tapfcc39c5e-d0",dl_src=fa:16:3e:37:43:b3,dl_dst=90:ec:77:32:e6:6e,arp_tpa=192.168.5.56,arp_op=2
Datapath actions: push_vlan(vid=1106,pcp=0),bond0,mibond0
---

Any insight into what could be happening here, or how to debug further, would
be GREATLY appreciated.

-Austin
