Update:

It seems that in upstream Linux 5.4, skb_vlan_pop() only clears the vlan_present flag, unlike the old 4.15 kernel:
https://github.com/torvalds/linux/blob/v5.4/net/core/skbuff.c#L5408
int skb_vlan_pop(struct sk_buff *skb)
{
  u16 vlan_tci;
  __be16 vlan_proto;
  int err;

  if (likely(skb_vlan_tag_present(skb))) {
    __vlan_hwaccel_clear_tag(skb);
  } else {
...

static inline void __vlan_hwaccel_clear_tag(struct sk_buff *skb)
{
  skb->vlan_present = 0;             /* only clears the 'present' flag */
}
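A minimal user-space model of why this matters, assuming a hypothetical `struct fake_skb` that carries only the two relevant fields (it is not the real `struct sk_buff`). The 4.15 variant reflects our understanding that the old kernel encoded "present" inside `vlan_tci` and zeroed that field on pop; giving it a separate `vlan_present` member is a simplification of this sketch. The point is that after a 5.4-style pop, the stale TCI is still sitting in `vlan_tci`.

```c
#include <stdint.h>

/* Hypothetical stand-in for the two sk_buff fields involved;
 * not the real struct sk_buff layout. */
struct fake_skb {
    uint16_t vlan_tci;     /* VID/PCP bits of the offloaded tag */
    uint8_t vlan_present;  /* separate flag, as in the 5.4 kernel */
};

/* Models 5.4's __vlan_hwaccel_clear_tag(): only the flag is cleared,
 * so the previous TCI value is left behind in vlan_tci. */
static void clear_tag_v5_4(struct fake_skb *skb)
{
    skb->vlan_present = 0;
}

/* Models our reading of the 4.15 behaviour: "present" lived inside
 * vlan_tci itself, so popping the tag zeroed the whole field. */
static void clear_tag_v4_15(struct fake_skb *skb)
{
    skb->vlan_tci = 0;
    skb->vlan_present = 0;
}
```

Starting from `{ .vlan_tci = 20, .vlan_present = 1 }`, both variants report no tag present afterwards, but only the 4.15-style one also leaves `vlan_tci == 0`; the 5.4-style one still holds 20, which any code reading `vlan_tci` without checking the flag will pick up.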


Hence, we patched STT on the OVS 2.16 branch:
## update __push_stt_header on ovs 2.16
diff --git a/datapath/linux/compat/stt.c b/datapath/linux/compat/stt.c
index 39a294764..ad1f0aa39 100644
--- a/datapath/linux/compat/stt.c
+++ b/datapath/linux/compat/stt.c
@@ -622,7 +622,9 @@ static int __push_stt_header(struct sk_buff *skb, __be64 tun_id,
                stth->flags |= STT_CSUM_VERIFIED;
        }

-       stth->vlan_tci = htons(skb->vlan_tci);
+       if (skb_vlan_tag_present(skb)) {
+               stth->vlan_tci = htons(skb->vlan_tci);
+       }
        skb->vlan_tci = 0;
        put_unaligned(tun_id, &stth->key);
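To illustrate the effect of the patch outside the kernel, here is a small sketch with hypothetical simplified types (`fake_skb`, `fake_stthdr`; neither is the real OVS or kernel structure). It contrasts the unconditional copy with the guarded one. The sketch zero-initializes the header before the copy, which is an assumption about the real code path rather than something verified here.

```c
#include <stdint.h>
#include <arpa/inet.h>  /* htons(), ntohs() */

/* Hypothetical, simplified stand-ins; not the real sk_buff / stthdr. */
struct fake_skb {
    uint16_t vlan_tci;
    uint8_t vlan_present;
};

struct fake_stthdr {
    uint16_t vlan_tci;  /* network byte order, as in the STT header */
};

/* Pre-patch behaviour: vlan_tci is copied whether or not a tag is present,
 * so a stale TCI left by a 5.4-style pop leaks into the STT header. */
static void push_vlan_unpatched(struct fake_stthdr *stth, struct fake_skb *skb)
{
    stth->vlan_tci = htons(skb->vlan_tci);
    skb->vlan_tci = 0;
}

/* Patched behaviour: vlan_tci is copied only when the present flag is set;
 * otherwise the (assumed zeroed) header field is left untouched. */
static void push_vlan_patched(struct fake_stthdr *stth, struct fake_skb *skb)
{
    if (skb->vlan_present)
        stth->vlan_tci = htons(skb->vlan_tci);
    skb->vlan_tci = 0;
}
```

With an skb whose tag was already popped (`vlan_present = 0`, stale `vlan_tci = 20`), the unpatched path writes VID 20 into the tunnel header, which the receiving side can turn back into an 802.1Q tag, while the patched path leaves the field at 0.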

This looks like part of the Linux change that was either not called out or
missed on the STT side. Please let us know if the above change needs any
further amendments; the issue is mitigated with this patch and the
workaround is no longer needed. We will do some more tests and call out any
other failures.


Regards,
Aliasgar

On Tue, Apr 23, 2024 at 10:35 AM aginwala <aginw...@asu.edu> wrote:

> Hi:
>
> The data plane recovers when cleaning up flows using ovs-dpctl del-flows,
> and eventually all the flows catch up, as the flows added by OVN are intact.
> However, I'm not sure which flow caused this, as the issue pops up on
> ovs-vswitchd restarts and needs to be worked around with dpctl del-flows.
> Not sure if it's due to version compatibility between OVN 2.11 and OVS 2.16
> or to a particular patch in OVS/OVN that already has this fix. Will keep
> looking in parallel, as the workaround unblocks this for now. Any
> additional pointers beyond this workaround would be good too.
>
> Regards,
> Aliasgar
>
>
> On Fri, Apr 19, 2024 at 4:24 PM aginwala <aginw...@asu.edu> wrote:
>
>> Hi All:
>>
>> As part of upgrading the OVN north-south gateway to the new 5.4 kernel, VM
>> connectivity is lost when setting the chassis for the provider network LRP
>> to this new gateway. For interconnection gateways and hypervisors it's not
>> an issue.
>> lrp
>> _uuid               : 387a735d-fc11-4e90-8655-07785aa024af
>> chassis             : b80a285b-586a-42d9-b189-69d641f143b1
>> datapath            : d9219b69-5961-4f24-8414-1d4054b23169
>> external_ids        : {}
>> gateway_chassis     : [728adc6d-3236-4637-86e3-0f6745cf1b50, 7a372e68-c228-400b-9a4b-439cf234ed40, 82295a9c-02aa-416b-bac3-83755c687caf, d1b42374-c475-4745-abdb-36e72140c5b5]
>> logical_port        : "cr-lrp-9a1d2341-efb0-4e7e-839d-99b01944ba2e"
>> mac                 : ["74:db:d1:80:d3:af 10.169.247.140/24"]
>> nat_addresses       : []
>> options             : {distributed-port="lrp-9a1d2341-efb0-4e7e-839d-99b01944ba2e"}
>> parent_port         : []
>> tag                 : []
>> tunnel_key          : 2
>> type                : chassisredirect
>>
>> provider network
>> port provnet-f239a6e8-73a5-4f95-8410-f7b3e0befe90
>>         type: localnet
>>         tag: 20
>>         addresses: ["unknown"]
>> ## encap ip for ovn is on eth0
>>
>> ## gw bridge brens2f0 hosts the uplink provider network
>> ovs-vsctl list-br
>> br-int
>> brens2f0
>> ovs-vsctl list-ports brens2f0
>> ens2f0
>> patch-provnet-f239a6e8-73a5-4f95-8410-f7b3e0befe90-to-br-int
>> ## fail mode secure
>> ovs-vsctl get-fail-mode br-int
>> secure
>> ## set chassis
>> ovn-nbctl lrp-set-gateway-chassis lrp-9a1d2341-efb0-4e7e-839d-99b01944ba2e cee81be9-f782-4c82-800e-c5c5327531e4 101
>>
>> ovn-controller is running as a container on the new gateway
>> ovn-controller --version
>> ovn-controller (Open vSwitch) 2.11.1-13
>> OpenFlow versions 0x4:0x4
>>
>> ## ovs on the host 5.4 kernel
>> ovs-vsctl --version
>> ovs-vsctl (Open vSwitch) 2.16.0
>> DB Schema 8.3.0
>>
>> ovs-ofctl --version
>> ovs-ofctl (Open vSwitch) 2.16.0
>> OpenFlow versions 0x1:0x6
>>
>>
>> Digging further with tcpdump on the destination VM interface shows the VLAN
>> tag still present, causing the connectivity failure (no reply packets):
>> 20:26:06.371540 74:db:d1:80:09:01 > 74:db:d1:80:0a:15, ethertype 802.1Q
>> (0x8100), length 102: vlan 20, p 0, ethertype IPv4, (tos 0x0, ttl 56, id
>> 53702, offset 0, flags [none], proto ICMP (1), length 84) 10.228.4.180 >
>> 10.78.8.42: ICMP echo request, id 7765, seq 791, length 64
>> 20:26:07.375960 74:db:d1:80:09:01 > 74:db:d1:80:0a:15, ethertype 802.1Q
>> (0x8100), length 102: vlan 20, p 0, ethertype IPv4, (tos 0x0, ttl 56, id
>> 36269, offset 0, flags [none], proto ICMP (1), length 84) 10.228.4.180 >
>> 10.78.8.42: ICMP echo request, id 7765, seq 792, length 64
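For reference, the tcpdump fields above ("ethertype 802.1Q (0x8100)", "vlan 20, p 0") map directly onto the bytes following the two MAC addresses. The helper below is a hypothetical sketch (`frame_vlan_id` is our own name) that checks a raw frame for a single 802.1Q tag and extracts the VID and PCP the same way; it deliberately ignores 802.1ad (0x88a8) and stacked tags.

```c
#include <stddef.h>
#include <stdint.h>

#define ETH_TYPE_8021Q 0x8100  /* TPID tcpdump prints as "ethertype 802.1Q" */

/* Returns the 12-bit VLAN ID if the raw Ethernet frame carries an 802.1Q
 * tag right after the destination and source MACs, or -1 if it is untagged
 * or too short. If pcp is non-NULL, stores the priority ("p" in tcpdump). */
static int frame_vlan_id(const uint8_t *frame, size_t len, int *pcp)
{
    if (len < 18)  /* 12 bytes of MACs + 4-byte tag + 2-byte inner type */
        return -1;

    uint16_t tpid = (uint16_t)((frame[12] << 8) | frame[13]);
    if (tpid != ETH_TYPE_8021Q)
        return -1;

    uint16_t tci = (uint16_t)((frame[14] << 8) | frame[15]);
    if (pcp)
        *pcp = tci >> 13;      /* top 3 bits: priority code point */
    return tci & 0x0fff;       /* low 12 bits: VLAN ID */
}
```

Fed the first 18 bytes of the captured echo request (dst 74:db:d1:80:0a:15, src 74:db:d1:80:09:01, TPID 0x8100, TCI 0x0014, inner type 0x0800), it would return VID 20 with PCP 0, matching the capture; a frame that reached the VM correctly would return -1.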
>>
>> The OpenFlow rules for stripping VLAN 20 that OVN programs on the new/old
>> gateway are correct:
>> ovs-ofctl dump-flows br-int | grep strip_vlan | grep 20
>> cookie=0x0, duration=27.894s, table=65, n_packets=136, n_bytes=19198,
>> idle_age=0, priority=100,reg15=0x1,metadata=0x1
>> actions=mod_vlan_vid:20,output:161,strip_vlan
>> cookie=0x0, duration=30.055s, table=0, n_packets=1592, n_bytes=130783,
>> idle_age=0, priority=150,in_port=161,dl_vlan=20
>> actions=strip_vlan,load:0xe1->NXM_NX_REG13[],load:0x36->NXM_NX_REG11[],load:0xd7->NXM_NX_REG12[],load:0x1->OXM_OF_METADATA[],load:0x1->NXM_NX_REG14[],resubmit(,8)
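To make explicit what the table=0 `strip_vlan` action is expected to do to the packet bytes, here is a rough user-space sketch (`frame_strip_vlan` is a hypothetical helper, not OVS code): it removes the outermost 802.1Q tag from a raw frame in place, keeping the MAC addresses and everything after the tag. Only the 0x8100 TPID is handled.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Removes the outermost 802.1Q tag from a raw Ethernet frame in place,
 * roughly what strip_vlan does to the packet. Returns the new frame
 * length, or the original length if the frame is not 802.1Q-tagged. */
static size_t frame_strip_vlan(uint8_t *frame, size_t len)
{
    if (len < 18 || frame[12] != 0x81 || frame[13] != 0x00)
        return len;

    /* The MACs (bytes 0..11) stay put; the inner EtherType and payload
     * after the 4-byte tag are shifted down over it. */
    memmove(frame + 12, frame + 16, len - 16);
    return len - 4;
}
```

In the broken case, the datapath flow below still matches `eth_type(0x8100),vlan(vid=20...)` on packets arriving from the tunnel and outputs them unchanged, i.e. nothing equivalent to this step happens before delivery to the VM.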
>>
>>
>> Checking the OVS datapath flows also shows the VLAN being present:
>> ovs-dpctl dump-flows  | grep vlan
>> recirc_id(0x422),tunnel(tun_id=0x10066000005,src=10.172.66.144,dst=10.173.84.83,flags(-df+csum+key)),in_port(1),ct_state(+new-est-rel-rpl-inv+trk),ct_label(0/0x1),eth(dst=74:db:d1:80:0a:15),eth_type(0x8100),vlan(vid=20/0x14),encap(eth_type(0x0800),ipv4(frag=no)),
>> packets:1713, bytes:174726, used:0.145s, actions:5
>>
>> Couldn't find much drift with ofproto/trace when running it on the old/new
>> gw (substitute the appropriate in_port):
>> ovs-appctl ofproto/trace br-int in_port=2321,dl_vlan=20
>>
>>
>> Tried stripping the VLAN on the hypervisor/compute node and the data plane
>> is OK, but that's not the right approach:
>> ovs-ofctl add-flow br-int "priority=65535,dl_vlan=20 actions=strip_vlan,output:4597"
>>
>> Downgrading the kernel to 4.15 and pinning OVS to 2.11 restores the data
>> plane, with no VLAN/802.1Q header in the tcpdump on the destination
>> workload tap interface.
>>
>>
>> Is this a bug or a known issue with OVS versions after 2.11 when a tagged
>> VLAN is present on the provider network?
>>
>> Also tried pinning the OpenFlow version to 1.4, but it didn't help much, as
>> the strip_vlan flows are fine. Any further pointers would be great as we
>> continue to debug.
>>
>>
>> Regards,
>> Aliasgar
>>
>>
_______________________________________________
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
