Looking for some help troubleshooting why OVS is not generating an ARP response for my internal router port on a VLAN tenant network. I've dug down as far as I am reasonably able to, but I need a quick boost here.
The ARP request is coming from an external system which is on the appropriate VLAN for the tenant network. East/West traffic is working as expected: I am able to communicate successfully with another VM on that VLAN tenant network. It APPEARS that the flow never gets generated on br-int on the appropriate controller. I am going to walk through my debugging steps, starting from the NB database and ending at OVS on the controller that should be generating the ARP response.

In OVN NB, I have a router with a port. This is my internal gateway interface of 192.168.5.1:

-----
router 21cd6ac3-4804-4c68-a683-9bba07d97967 (neutron-5d87debf-cf0b-4fac-ba49-01b7680368aa) (aka vlan_test)
    port lrp-8c020ac1-ae54-4aa7-a143-4440067e9f42
        mac: "fa:16:3e:37:43:b3"
        networks: ["192.168.5.1/24"]
    port lrp-e18d09db-19d8-4362-8252-751e6974ef5e
        mac: "fa:16:3e:37:67:a3"
        networks: ["10.27.14.50/23"]
        gateway chassis: [infra-prod-controller-02 infra-prod-controller-01 infra-prod-controller-03]
    nat 40605ce2-3f93-4877-ac26-47e4b257fa5f
        external ip: "10.27.14.50"
        logical ip: "192.168.5.0/24"
        type: "snat"
----

The logical switch for the tenant network is connected to a localnet with tag 1106 and has the external port for my baremetal device and the appropriate router port.
----
switch be7c870d-6d9c-471f-8996-e48a551068a0 (neutron-fcc39c5e-d3d8-4400-b5cf-b10f76d0112b) (aka vlan_test)
    port provnet-5ddf8b20-c849-484a-ad8f-a86a9856c2b2
        type: localnet
        tag: 1106
        addresses: ["unknown"]
    port 1f01c94e-f32f-4e94-b02a-813bb1ad4a47
        addresses: ["unknown"]
    port 85aa9a1e-cb84-4137-97ce-85958a948390
        addresses: ["fa:16:3e:ca:6e:3b 192.168.5.188"]
    port 8c020ac1-ae54-4aa7-a143-4440067e9f42
        type: router
        router-port: lrp-8c020ac1-ae54-4aa7-a143-4440067e9f42
    port 19279d0c-7c9a-498e-a3e5-269933c49df6
        type: localport
        addresses: ["fa:16:3e:4a:14:28 192.168.5.2"]
    port 4c9047c4-a6c7-4b27-9cfa-58b5d30ce964
        type: external
        addresses: ["90:ec:77:32:e6:6e 192.168.5.56"]
----

In OVN SB, I can see that the external port (4c9047c4-a6c7-4b27-9cfa-58b5d30ce964) has been scheduled on infra-prod-controller-02. This is important because the ARP response would only get generated from a single HA Chassis.

---
Chassis infra-prod-controller-02
    hostname: infra-prod-controller-02
    Encap geneve
        ip: "10.27.12.24"
        options: {csum="true"}
    Port_Binding cr-lrp-20ba6028-7220-4c8d-a20f-9e4c416da3f7
    Port_Binding "71e436bb-7121-473e-a024-e34d4d7f4a4f"
    Port_Binding cr-lrp-c03b5dd9-92e1-4046-be1c-a953c0fab238
    Port_Binding "f8eb9e30-e65f-44c4-94b6-a67700790880"
    Port_Binding cr-lrp-ae2f5dbb-2cd0-44d0-9061-71c8186440be
    Port_Binding "4c9047c4-a6c7-4b27-9cfa-58b5d30ce964"
    Port_Binding "eb4435ad-37f2-44f9-a786-470b18bb9f0d"
    Port_Binding cr-lrp-950eec85-b785-474b-837b-4ecbbcf080c9
    Port_Binding "f3bbbe9a-a1a7-44b5-b6bc-b00a351ca1a5"
    Port_Binding "79430028-7ae3-448c-bc12-c9d7d44d218b"
---

In OVN SB again, I can issue a trace command to verify that the Logical Flow exists to generate the ARP response:

---
# ovn-trace neutron-fcc39c5e-d3d8-4400-b5cf-b10f76d0112b 'inport == "provnet-5ddf8b20-c849-484a-ad8f-a86a9856c2b2" && eth.src == 90:ec:77:32:e6:6e && eth.dst == ff:ff:ff:ff:ff:ff && arp.tpa == 192.168.5.1 && arp.spa == 192.168.5.178 && arp.op == 1 && arp.tha == ff:ff:ff:ff:ff:ff && arp.sha == 90:ec:77:32:e6:6e'
…
ingress(dp="vlan_test", inport="lrp-8c020a")
--------------------------------------------
 0. lr_in_admission (northd.c:12885): eth.mcast && inport == "lrp-8c020a", priority 50, uuid faac4787
    xreg0[0..47] = fa:16:3e:37:43:b3;
    next;
 1. lr_in_lookup_neighbor (northd.c:13142): inport == "lrp-8c020a" && arp.spa == 192.168.5.0/24 && arp.tpa == 192.168.5.1 && arp.op == 1, priority 110, uuid d447a962
    reg9[2] = lookup_arp(inport, arp.spa, arp.sha);
    /* MAC binding to ff:ff:ff:ff:ff:ff found. */
    reg9[3] = 1;
    next;
 2. lr_in_learn_neighbor (northd.c:13078): reg9[2] == 1 || reg9[3] == 0, priority 100, uuid 6aad6f8d
    mac_cache_use;
    next;
 3. lr_in_ip_input (northd.c:12440): inport == "lrp-8c020a" && arp.op == 1 && arp.tpa == 192.168.5.1 && arp.spa == 192.168.5.0/24 && is_chassis_resident("cr-lrp-e18d09"), priority 90, uuid 6187e537
    eth.dst = eth.src;
    eth.src = xreg0[0..47];
    arp.op = 2;
    arp.tha = arp.sha;
    arp.sha = xreg0[0..47];
    arp.tpa <-> arp.spa;
    outport = inport;
    flags.loopback = 1;
    output;
---

Moving into OVS land on infra-prod-controller-02, where the external port is scheduled, I am able to see the ARP requests entering br-ex. However, there is no response!

---
ovs-tcpdump -i br-ex -nn -e -v
tcpdump: listening on mibr-ex, link-type EN10MB (Ethernet), snapshot length 262144 bytes
22:05:48.581916 90:ec:77:32:e6:6e > ff:ff:ff:ff:ff:ff, ethertype 802.1Q (0x8100), length 64: vlan 1106, p 0, ethertype ARP (0x0806), Ethernet (len 6), IPv4 (len 4), Request who-has 192.168.5.1 (ff:ff:ff:ff:ff:ff) tell 192.168.5.56, length 46
---

The flow exists on br-ex (with no reply).
---
# ovs-appctl dpif/dump-flows --names br-ex
recirc_id(0),in_port(bond0),ct_state(-new-est-rel-rpl-inv-trk),ct_mark(0/0x1),eth(src=90:ec:77:32:e6:6e,dst=ff:ff:ff:ff:ff:ff),eth_type(0x8100),vlan(vid=1106,pcp=0),encap(eth_type(0x0806),arp(sip=192.168.5.56,tip=192.168.5.1,op=1/0xff,sha=90:ec:77:32:e6:6e)), packets:2425, bytes:155200, used:0.633s, actions:mibond0,br-ex,pop_vlan,tapfcc39c5e-d0
---

I do NOT see the flow in `br-int`.

---
(openvswitch-vswitchd)[root@infra-prod-controller-02 /]# ovs-appctl dpif/dump-flows --names br-int
recirc_id(0),tunnel(tun_id=0x0,src=10.27.12.21,dst=10.27.12.24,flags(-df+csum+key)),in_port(genev_sys_6081),eth(src=1e:4d:dd:80:c3:ed),eth_type(0x0800),ipv4(proto=17,frag=no),udp(dst=3784), packets:6468, bytes:426888, used:0.068s, actions:userspace(pid=4294967295,slow_path(bfd))
recirc_id(0),tunnel(tun_id=0x0,src=10.27.12.13,dst=10.27.12.24,flags(-df+csum+key)),in_port(genev_sys_6081),eth(src=9a:b2:a9:fd:a6:5b),eth_type(0x0800),ipv4(proto=17,frag=no),udp(dst=3784), packets:6456, bytes:426096, used:0.860s, actions:userspace(pid=4294967295,slow_path(bfd))
recirc_id(0),tunnel(tun_id=0x0,src=10.27.12.22,dst=10.27.12.24,flags(-df+csum+key)),in_port(genev_sys_6081),eth(src=66:90:cc:1e:a5:b6),eth_type(0x0800),ipv4(proto=17,frag=no),udp(dst=3784), packets:6452, bytes:425832, used:0.716s, actions:userspace(pid=4294967295,slow_path(bfd))
recirc_id(0),tunnel(tun_id=0x0,src=10.27.12.12,dst=10.27.12.24,flags(-df+csum+key)),in_port(genev_sys_6081),eth(src=96:54:aa:d3:29:b9),eth_type(0x0800),ipv4(proto=17,frag=no),udp(dst=3784), packets:6453, bytes:425898, used:0.729s, actions:userspace(pid=4294967295,slow_path(bfd))
recirc_id(0),tunnel(tun_id=0x0,src=10.27.12.15,dst=10.27.12.24,flags(-df+csum+key)),in_port(genev_sys_6081),eth(src=8e:0b:17:26:df:ef),eth_type(0x0800),ipv4(proto=17,frag=no),udp(dst=3784), packets:6469, bytes:426954, used:0.609s, actions:userspace(pid=4294967295,slow_path(bfd))
recirc_id(0),tunnel(tun_id=0x0,src=10.27.12.19,dst=10.27.12.24,flags(-df+csum+key)),in_port(genev_sys_6081),eth(src=5a:56:62:47:6f:21),eth_type(0x0800),ipv4(proto=17,frag=no),udp(dst=3784), packets:6457, bytes:426162, used:0.265s, actions:userspace(pid=4294967295,slow_path(bfd))
recirc_id(0),tunnel(tun_id=0x0,src=10.27.12.11,dst=10.27.12.24,flags(-df+csum+key)),in_port(genev_sys_6081),eth(src=ae:36:b4:02:1c:67),eth_type(0x0800),ipv4(proto=17,frag=no),udp(dst=3784), packets:6462, bytes:426492, used:0.253s, actions:userspace(pid=4294967295,slow_path(bfd))
---

I confirmed that tapfcc39c5e-d0 is indeed in the `br-int` Bridge. It is not marked internal for some reason, but it is unclear whether that is by design.

---
# ovs-vsctl show
678a6be7-51d4-44d4-9366-67311227cdbb
    Bridge br-int
        fail_mode: secure
        datapath_type: system
        …
        Port tapfcc39c5e-d0
            Interface tapfcc39c5e-d0
---

When I issue the ofproto/trace commands, it appears that OVS IS generating a response, but it is definitely not being sent back out of the system:

---
Trace ARP Request entering br-ex
# ovs-appctl ofproto/trace --names br-ex in_port=1,dl_vlan=1106,dl_src=90:ec:77:32:e6:6e,dl_dst=ff:ff:ff:ff:ff:ff,dl_type=0x0806,arp_spa=192.168.5.56,arp_tpa=192.168.5.1,arp_op=1,arp_sha=90:ec:77:32:e6:6e,arp_tha=ff:ff:ff:ff:ff:ff
…
Final flow: unchanged
Megaflow: recirc_id=0,ct_state=-new-est-rel-rpl-inv-trk,ct_mark=0/0x1,eth,arp,in_port=bond0,dl_vlan=1106,dl_vlan_pcp=0,dl_src=90:ec:77:32:e6:6e,dl_dst=ff:ff:ff:ff:ff:ff,arp_spa=192.168.5.56,arp_tpa=192.168.5.1,arp_op=1,arp_sha=90:ec:77:32:e6:6e
Datapath actions: mibond0,br-ex,pop_vlan,tapfcc39c5e-d0
---

---
Trace ARP Request entering br-int
# ovs-appctl ofproto/trace --names br-int in_port=tapfcc39c5e-d0,dl_src=90:ec:77:32:e6:6e,dl_dst=ff:ff:ff:ff:ff:ff,dl_type=0x0806,arp_spa=192.168.5.56,arp_tpa=192.168.5.1,arp_op=1,arp_sha=90:ec:77:32:e6:6e,arp_tha=ff:ff:ff:ff:ff:ff
…
Final flow: arp,reg0=0x300,reg10=0x401,reg11=0x2d,reg12=0x2c,reg13=0x28,reg14=0x2,reg15=0x2,metadata=0x16,in_port="tapfcc39c5e-d0",vlan_tci=0x0000,dl_src=fa:16:3e:37:43:b3,dl_dst=90:ec:77:32:e6:6e,arp_spa=192.168.5.1,arp_tpa=192.168.5.56,arp_op=2,arp_sha=fa:16:3e:37:43:b3,arp_tha=90:ec:77:32:e6:6e
Megaflow: recirc_id=0,ct_state=-new-est-rel-rpl-inv-trk,ct_mark=0/0x1,eth,arp,in_port="tapfcc39c5e-d0",dl_src=90:ec:77:32:e6:6e,dl_dst=ff:ff:ff:ff:ff:ff,arp_spa=192.168.5.56,arp_tpa=192.168.5.1,arp_op=1,arp_sha=90:ec:77:32:e6:6e,arp_tha=ff:ff:ff:ff:ff:ff
Datapath actions: set(eth(src=fa:16:3e:37:43:b3,dst=90:ec:77:32:e6:6e)),set(arp(sip=192.168.5.1,tip=192.168.5.56,op=2,sha=fa:16:3e:37:43:b3,tha=90:ec:77:32:e6:6e)),tapfcc39c5e-d0
This flow is handled by the userspace slow path because it:
  - Uses action(s) not supported by datapath.
---

At this point, I would expect to see these responses actually hitting tapfcc39c5e-d0, but they do not seem to be present. It is not yet clear whether the "Uses action(s) not supported by datapath" message above is a problem.
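As an aside for reading the `Final flow` line of the br-int trace above: per ovn-architecture(7), `metadata` carries the logical datapath key, `reg14` the logical input port key, and `reg15` the logical output port key, with output keys of 32768 and above reserved for multicast groups. The following Python sketch decodes those fields from a flow string. The helper name `decode_ovn_keys` and the label names are made up for illustration, and the sample string is a shortened copy of the trace output, so treat it as a reading aid rather than a tool.

```python
import re

# Field roles as described in ovn-architecture(7). The labels on the right
# are illustrative names chosen for this sketch, not OVN terminology.
OVN_FIELDS = {
    "metadata": "datapath_key",
    "reg14": "inport_key",
    "reg15": "outport_key",
}
# Logical output keys at or above this value refer to multicast groups.
MIN_MULTICAST_KEY = 32768


def decode_ovn_keys(flow: str) -> dict:
    """Extract the OVN logical keys from an ofproto/trace flow string."""
    keys = {}
    for field, label in OVN_FIELDS.items():
        # Anchor on start-of-string or a comma so "reg14" cannot match
        # inside a longer field name.
        match = re.search(rf"(?:^|,){field}=(0x[0-9a-fA-F]+|\d+)", flow)
        if match:
            keys[label] = int(match.group(1), 0)
    if "outport_key" in keys:
        keys["outport_is_multicast"] = keys["outport_key"] >= MIN_MULTICAST_KEY
    return keys


# Shortened copy of the "Final flow" from the br-int ARP request trace.
final_flow = (
    "arp,reg0=0x300,reg10=0x401,reg11=0x2d,reg12=0x2c,reg13=0x28,"
    'reg14=0x2,reg15=0x2,metadata=0x16,in_port="tapfcc39c5e-d0",vlan_tci=0x0000'
)
print(decode_ovn_keys(final_flow))
```

For that trace, the input and output keys are both 2, which matches the `outport = inport` step in the logical flow: the reply is meant to leave through the same logical port the request arrived on.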
---
# ovs-tcpdump -i tapfcc39c5e-d0 -nn -v -e
tcpdump: listening on ovsmi100537, link-type EN10MB (Ethernet), snapshot length 262144 bytes
22:25:06.677218 90:ec:77:32:e6:6e > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 60: Ethernet (len 6), IPv4 (len 4), Request who-has 192.168.5.1 (ff:ff:ff:ff:ff:ff) tell 192.168.5.56, length 46
---

If the packets WERE to hit tapfcc39c5e-d0, it appears that the system would send the response out bond0 as expected:

---
# ovs-appctl ofproto/trace --names br-int in_port=tapfcc39c5e-d0,dl_type=0x0806,dl_src=fa:16:3e:37:43:b3,dl_dst=90:ec:77:32:e6:6e,arp_spa=192.168.5.1,arp_tpa=192.168.5.56,arp_op=2,arp_sha=fa:16:3e:37:43:b3,arp_tha=90:ec:77:32:e6:6e
…
Final flow: arp,reg0=0x300,reg10=0x400,reg11=0x2d,reg12=0x2c,reg13=0x1b,reg14=0x2,reg15=0x8001,metadata=0x16,in_port="tapfcc39c5e-d0",vlan_tci=0x0000,dl_src=fa:16:3e:37:43:b3,dl_dst=90:ec:77:32:e6:6e,arp_spa=192.168.5.1,arp_tpa=192.168.5.56,arp_op=2,arp_sha=fa:16:3e:37:43:b3,arp_tha=90:ec:77:32:e6:6e
Megaflow: recirc_id=0,ct_state=-new-est-rel-rpl-inv-trk,ct_mark=0/0x1,eth,arp,in_port="tapfcc39c5e-d0",dl_src=fa:16:3e:37:43:b3,dl_dst=90:ec:77:32:e6:6e,arp_tpa=192.168.5.56,arp_op=2
Datapath actions: push_vlan(vid=1106,pcp=0),bond0,mibond0
---

Any insight on what could be happening here or how to debug further would be GREATLY appreciated.

-Austin
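P.S. For anyone who wants to repeat the br-ex versus br-int comparison above without scanning the dumps by eye, here is a minimal Python sketch that picks out the datapath flows whose `eth(...)` match mentions a given MAC and reports their packet counts and actions. The function name `flows_for_mac` is hypothetical, the sample text is two lines copied from the dumps in this message, and in practice the input would be the saved output of `ovs-appctl dpif/dump-flows --names <bridge>`.

```python
import re


def flows_for_mac(dump: str, mac: str) -> list:
    """Return (packets, actions) for each datapath flow whose eth() match mentions mac."""
    results = []
    for line in dump.splitlines():
        # The first "eth(...)" group holds the Ethernet src/dst match.
        eth = re.search(r"eth\(([^)]*)\)", line)
        if not eth or mac.lower() not in eth.group(1).lower():
            continue
        packets = re.search(r"packets:(\d+)", line)
        actions = re.search(r"actions:(.*)$", line)
        results.append((
            int(packets.group(1)) if packets else 0,
            actions.group(1).strip() if actions else "",
        ))
    return results


# Two lines copied from the dumps above: the ARP flow seen on br-ex and one
# of the BFD flows seen on br-int.
SAMPLE_DUMP = "\n".join([
    "recirc_id(0),in_port(bond0),ct_state(-new-est-rel-rpl-inv-trk),ct_mark(0/0x1),"
    "eth(src=90:ec:77:32:e6:6e,dst=ff:ff:ff:ff:ff:ff),eth_type(0x8100),"
    "vlan(vid=1106,pcp=0),encap(eth_type(0x0806),"
    "arp(sip=192.168.5.56,tip=192.168.5.1,op=1/0xff,sha=90:ec:77:32:e6:6e)), "
    "packets:2425, bytes:155200, used:0.633s, "
    "actions:mibond0,br-ex,pop_vlan,tapfcc39c5e-d0",
    "recirc_id(0),tunnel(tun_id=0x0,src=10.27.12.21,dst=10.27.12.24,"
    "flags(-df+csum+key)),in_port(genev_sys_6081),eth(src=1e:4d:dd:80:c3:ed),"
    "eth_type(0x0800),ipv4(proto=17,frag=no),udp(dst=3784), "
    "packets:6468, bytes:426888, used:0.068s, "
    "actions:userspace(pid=4294967295,slow_path(bfd))",
])

for packets, actions in flows_for_mac(SAMPLE_DUMP, "90:ec:77:32:e6:6e"):
    print(f"{packets} packets -> {actions}")
```

Run against a dump taken from br-int alone, an empty result for 90:ec:77:32:e6:6e would correspond to what I am seeing: the request is counted on br-ex but no matching datapath flow shows up on br-int.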