Hi:

The data plane recovers after flushing the datapath flows with ovs-dpctl del-flows, and eventually all the flows catch up, since the OpenFlow flows added by OVN are intact. However, I'm not sure which flow caused this: the issue pops up on every ovs-vswitchd restart and has to be worked around with ovs-dpctl del-flows. I'm also not sure whether it's due to version compatibility between OVN 2.11 and OVS 2.16, or whether some patch in OVS/OVN already has this fix. I'll keep looking in parallel, as the workaround unblocks us for now. Any additional pointers beyond this workaround would be good too.
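To capture which datapath flows are involved before flushing them, one option is to scan the ovs-dpctl dump-flows text for flows that match an 802.1Q tag but whose actions never pop it. Below is a minimal Python sketch; find_unpopped_vlan_flows, TaggedFlow and VID_EXACT are hypothetical names, and the parsing assumes the one-flow-per-line "<key>, packets:..., actions:<actions>" layout seen in the dump quoted below.

```python
import re
from typing import List, NamedTuple

# Matches the 802.1Q part of an ovs-dpctl flow key such as "vlan(vid=20/0x14)":
# the VID in decimal, optionally followed by a hex mask when only some VID bits
# are matched (as in the dump quoted in this thread).
VLAN_RE = re.compile(r"vlan\(vid=(\d+)(?:/0x([0-9a-fA-F]+))?")

VID_EXACT = 0xFFF  # assumption: no "/0x..." suffix means the full 12-bit VID is matched


class TaggedFlow(NamedTuple):
    vid: int
    vid_mask: int
    actions: str
    line: str


def find_unpopped_vlan_flows(dump: str) -> List[TaggedFlow]:
    """Return datapath flows that match an 802.1Q tag but never pop_vlan it.

    Hypothetical helper: it only inspects text, it does not talk to OVS.
    """
    flagged: List[TaggedFlow] = []
    for raw in dump.splitlines():
        line = raw.strip()
        # Only 802.1Q-matching flows that carry an actions field are of interest.
        if "eth_type(0x8100)" not in line or "actions:" not in line:
            continue
        m = VLAN_RE.search(line)
        if m is None:
            continue
        actions = line.rsplit("actions:", 1)[1]
        if "pop_vlan" in actions:
            continue  # the tag is removed before the packet is output
        mask = int(m.group(2), 16) if m.group(2) else VID_EXACT
        flagged.append(TaggedFlow(int(m.group(1)), mask, actions, line))
    return flagged
```

Fed the dump-flows output saved before running del-flows, this would list the tagged flow quoted below (vid 20, mask 0x14, actions:5), which makes it easier to compare what the datapath had installed before and after the flush.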
Regards,
Aliasgar

On Fri, Apr 19, 2024 at 4:24 PM aginwala <aginw...@asu.edu> wrote:
> Hi All:
>
> As part of upgrading the OVN north-south gateway to the new 5.4 kernel, VM
> connectivity is lost when setting the chassis for the provider network LRP
> to this new gateway. For interconnection gateways and hypervisors it's not
> an issue.
>
> lrp
> _uuid : 387a735d-fc11-4e90-8655-07785aa024af
> chassis : b80a285b-586a-42d9-b189-69d641f143b1
> datapath : d9219b69-5961-4f24-8414-1d4054b23169
> external_ids : {}
> gateway_chassis : [728adc6d-3236-4637-86e3-0f6745cf1b50, 7a372e68-c228-400b-9a4b-439cf234ed40, 82295a9c-02aa-416b-bac3-83755c687caf, d1b42374-c475-4745-abdb-36e72140c5b5]
> logical_port : "cr-lrp-9a1d2341-efb0-4e7e-839d-99b01944ba2e"
> mac : ["74:db:d1:80:d3:af 10.169.247.140/24"]
> nat_addresses : []
> options : {distributed-port="lrp-9a1d2341-efb0-4e7e-839d-99b01944ba2e"}
> parent_port : []
> tag : []
> tunnel_key : 2
> type : chassisredirect
>
> provider network
> port provnet-f239a6e8-73a5-4f95-8410-f7b3e0befe90
> type: localnet
> tag: 20
> addresses: ["unknown"]
>
> ## encap ip for ovn is on eth0
>
> ## on the gw, bridge brens2f0 hosts the uplink for the provider network
> ovs-vsctl list-br
> br-int
> brens2f0
> ovs-vsctl list-ports brens2f0
> ens2f0
> patch-provnet-f239a6e8-73a5-4f95-8410-f7b3e0befe90-to-br-int
> ## fail mode secure
> ovs-vsctl get-fail-mode br-int
> secure
> ## set chassis
> ovn-nbctl lrp-set-gateway-chassis lrp-9a1d2341-efb0-4e7e-839d-99b01944ba2e cee81be9-f782-4c82-800e-c5c5327531e4 101
>
> ovn-controller is running as a container on the new gateway:
> ovn-controller --version
> ovn-controller (Open vSwitch) 2.11.1-13
> OpenFlow versions 0x4:0x4
>
> ## ovs on the host (5.4 kernel)
> ovs-vsctl --version
> ovs-vsctl (Open vSwitch) 2.16.0
> DB Schema 8.3.0
>
> ovs-ofctl --version
> ovs-ofctl (Open vSwitch) 2.16.0
> OpenFlow versions 0x1:0x6
>
> Digging further with tcpdump on the destination VM interface shows the VLAN
> tag still present, causing the connectivity failure, and there is
> no reply packet:
> 20:26:06.371540 74:db:d1:80:09:01 > 74:db:d1:80:0a:15, ethertype 802.1Q (0x8100), length 102: vlan 20, p 0, ethertype IPv4, (tos 0x0, ttl 56, id 53702, offset 0, flags [none], proto ICMP (1), length 84)
>     10.228.4.180 > 10.78.8.42: ICMP echo request, id 7765, seq 791, length 64
> 20:26:07.375960 74:db:d1:80:09:01 > 74:db:d1:80:0a:15, ethertype 802.1Q (0x8100), length 102: vlan 20, p 0, ethertype IPv4, (tos 0x0, ttl 56, id 36269, offset 0, flags [none], proto ICMP (1), length 84)
>     10.228.4.180 > 10.78.8.42: ICMP echo request, id 7765, seq 792, length 64
>
> The OpenFlow rules that OVN programs to strip VLAN 20 look correct on both
> the new and old gw:
> ovs-ofctl dump-flows br-int | grep strip_vlan | grep 20
>  cookie=0x0, duration=27.894s, table=65, n_packets=136, n_bytes=19198, idle_age=0, priority=100,reg15=0x1,metadata=0x1 actions=mod_vlan_vid:20,output:161,strip_vlan
>  cookie=0x0, duration=30.055s, table=0, n_packets=1592, n_bytes=130783, idle_age=0, priority=150,in_port=161,dl_vlan=20 actions=strip_vlan,load:0xe1->NXM_NX_REG13[],load:0x36->NXM_NX_REG11[],load:0xd7->NXM_NX_REG12[],load:0x1->OXM_OF_METADATA[],load:0x1->NXM_NX_REG14[],resubmit(,8)
>
> Checking the OVS datapath flows also shows the VLAN tag present:
> ovs-dpctl dump-flows | grep vlan
> recirc_id(0x422),tunnel(tun_id=0x10066000005,src=10.172.66.144,dst=10.173.84.83,flags(-df+csum+key)),in_port(1),ct_state(+new-est-rel-rpl-inv+trk),ct_label(0/0x1),eth(dst=74:db:d1:80:0a:15),eth_type(0x8100),vlan(vid=20/0x14),encap(eth_type(0x0800),ipv4(frag=no)), packets:1713, bytes:174726, used:0.145s, actions:5
>
> Couldn't find much drift with ofproto/trace, run on both the old and new gw
> (substituting the respective in_port):
> ovs-appctl ofproto/trace br-int in_port=2321,dl_vlan=20
>
> Stripping the tag on the hypervisor/compute node makes the data plane work,
> but that's not the right approach:
> ovs-ofctl add-flow br-int "priority=65535,dl_vlan=20 actions=strip_vlan,output:4597"
>
> Downgrading the kernel to 4.15 and
> pinning OVS to 2.11 restores the data plane: tcpdump on the destination
> workload tap interface then shows no VLAN/802.1Q tag.
>
> Is this a bug or a known issue in OVS versions after 2.11 when the provider
> network uses a tagged VLAN?
>
> I also tried pinning the OpenFlow version to 1.4, but it didn't help much,
> as the strip_vlan flows are fine. Any further pointers would be great as we
> continue to debug.
>
> Regards,
> Aliasgar
>
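As a sanity check on why the OpenFlow side looks fine, here is a toy Python model (hypothetical functions split_actions and simulate_vlan, not part of OVS) that walks an ovs-ofctl action list and records the VLAN tag carried by each output. It models only mod_vlan_vid, strip_vlan and output, ignores everything else (load, resubmit, ...), and assumes the OVS behaviour that mod_vlan_vid adds an 802.1Q header (priority 0) when the packet has none.

```python
from typing import List, Optional, Tuple


def split_actions(actions: str) -> List[str]:
    """Split an ovs-ofctl action list on top-level commas only, so that
    e.g. "resubmit(,8)" and "load:...->NXM_NX_REG13[]" stay intact."""
    parts: List[str] = []
    depth, start = 0, 0
    for i, ch in enumerate(actions):
        if ch in "([":
            depth += 1
        elif ch in ")]":
            depth -= 1
        elif ch == "," and depth == 0:
            parts.append(actions[start:i])
            start = i + 1
    parts.append(actions[start:])
    return [p.strip() for p in parts if p.strip()]


def simulate_vlan(actions: str, vlan: Optional[int] = None
                  ) -> Tuple[List[Tuple[int, Optional[int]]], Optional[int]]:
    """Toy model: return ([(port, vlan_at_output), ...], vlan_after_actions)."""
    outputs: List[Tuple[int, Optional[int]]] = []
    for act in split_actions(actions):
        if act.startswith("mod_vlan_vid:"):
            # Sets the VID; assumed to push a tag if the packet is untagged.
            vlan = int(act.split(":", 1)[1])
        elif act == "strip_vlan":
            vlan = None  # removes the (single) 802.1Q tag
        elif act.startswith("output:"):
            outputs.append((int(act.split(":", 1)[1]), vlan))
        # Any other action is ignored by this model.
    return outputs, vlan
```

In this model both quoted flows behave as intended: the table=65 flow sends only the copy for localnet port 161 with VID 20 and leaves the packet untagged afterwards, and the table=0 flow untags traffic arriving from port 161 before resubmit(,8). That matches the observation that the OpenFlow rules look correct while the tag shows up in the datapath flow (actions:5, no pop_vlan).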
_______________________________________________
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss