Hi Ilya Maximets, I did some further testing on my end. Just to make sure it's an address family issue, I tried to configure all VXLAN interfaces with IPv6, but I ran into an issue with the source IP address selection.
I specified the `local_ip` of the tunnel as `2403:400:31da:ffff::18:6`, which is also configured on the bridge interface, but OVS picks the link-local address `fe80::dc03:37ff:fee2:1fef` instead. Packets with that source get dropped by other devices along the way.

```
$ ovs-appctl dpctl/dump-flows -m netdev@ovs-netdev | grep 192.168.1.33
[abbreviated]
ufid:7f6b377d-8ee1-4605-91a7-34b1076068f2, skb_priority(0/0),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0),dp_hash(0/0),in_port(dpdk-vm101),packet_type(ns=0,id=0),eth(src=52:54:00:3d:cd:0c/00:00:00:00:00:00,dst=ff:ff:ff:ff:ff:ff/00:00:00:00:00:00),eth_type(0x0806),arp(sip=192.168.1.34/0.0.0.0,tip=192.168.1.33/0.0.0.0,op=1/0,sha=52:54:00:3d:cd:0c/00:00:00:00:00:00,tha=00:00:00:00:00:00/00:00:00:00:00:00), packets:1, bytes:42, used:0.063s, dp:ovs, actions:tnl_push(tnl_port(vxlan_sys_4789),header(size=70,type=4,eth(dst=90:0a:84:9e:95:70,src=de:03:37:e2:1f:ef,dl_type=0x86dd),ipv6(src=fe80::dc03:37ff:fee2:1fef,dst=2403:400:31da:ffff::18:3,label=0,proto=17,tclass=0x0,hlimit=64),udp(src=0,dst=4789,csum=0xffff),vxlan(flags=0x8000000,vni=0x1)),out_port(br-phy)),push_vlan(vid=304,pcp=0),exit_p0, dp-extra-info:miniflow_bits(4,0)
```

```
$ ovs-vsctl show
    Bridge br-int
        datapath_type: netdev
        Port dpdk-vm101
            Interface dpdk-vm101
                type: dpdkvhostuserclient
                options: {vhost-server-path="/var/run/vhost-sockets/dpdk-vm101"}
        Port vxlan0
            Interface vxlan0
                type: vxlan
                options: {dst_port="4789", key="1", local_ip="2403:400:31da:ffff::18:6", remote_ip="2403:400:31da:ffff::18:3"}
```

The bridge interface has the `local_ip` address configured.
```
$ ip a
[abbreviated]
27: br-phy: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether de:03:37:e2:1f:ef brd ff:ff:ff:ff:ff:ff
    inet 100.87.18.6/32 scope global br-phy
       valid_lft forever preferred_lft forever
    inet6 2403:400:31da:ffff::18:6/128 scope global
       valid_lft forever preferred_lft forever
    inet6 fe80::dc03:37ff:fee2:1fef/64 scope link
       valid_lft forever preferred_lft forever
```

The kernel routing table shows a `src` of `2403:400:31da:ffff::18:6`.

```
$ ip -6 route
[abbreviated]
::1 dev lo proto kernel metric 256 pref medium
2403:400:31da:ffff::18:3 nhid 93 via fe80::920a:84ff:fe9e:9570 dev br-phy proto bgp src 2403:400:31da:ffff::18:6 metric 20 pref medium
2403:400:31da:ffff::18:6 dev br-phy proto kernel metric 256 pref medium
2403:400:31da::/48 nhid 93 via fe80::920a:84ff:fe9e:9570 dev br-phy proto bgp src 2403:400:31da:ffff::18:6 metric 20 pref medium
```

However, in OVS's route table, the link-local address `fe80::dc03:37ff:fee2:1fef` is picked as the SRC instead.

```
$ ovs-appctl ovs/route/show
Route Table:
[abbreviated]
Cached: 2403:400:31da:ffff::18:3/128 dev br-phy GW fe80::920a:84ff:fe9e:9570 SRC fe80::dc03:37ff:fee2:1fef
Cached: 2403:400:31da:ffff::18:6/128 dev br-phy SRC 2403:400:31da:ffff::18:6
```

I tried to rewrite the source address on bridge br-phy, but this does not seem to have any effect (no packet hits). My idea was to match UDP packets with destination port 4789 and rewrite the IPv6 source address. Is my way of rewriting it correct?
```
$ ovs-ofctl add-flow br-phy "priority=50,dl_type=0x86dd,ipv6_src=fe80::dc03:37ff:fee2:1fef,nw_proto=17,tp_dst=4789,actions=set_field:2403:400:31da:ffff::18:6->ipv6_src,normal"
$ ovs-ofctl dump-flows br-phy
 cookie=0x0, duration=653.933s, table=0, n_packets=0, n_bytes=0, priority=50,udp6,ipv6_src=fe80::dc03:37ff:fee2:1fef,tp_dst=4789 actions=load:0x180006->NXM_NX_IPV6_SRC[0..63],load:0x2403040031daffff->NXM_NX_IPV6_SRC[64..127],NORMAL
 cookie=0x0, duration=275973.900s, table=0, n_packets=1638167, n_bytes=152347460, priority=0 actions=NORMAL
```

What did work was adding a static route for the destination with the correct source address, plus an ARP entry for the source. So if OVS picks up the source address correctly, the tunnel works.

```
$ ovs-appctl ovs/route/add 2403:400:31da:ffff::18:3 br-phy 2403:400:31da:ffff::18:6
OK
$ ovs-appctl tnl/arp/set br-phy 2403:400:31da:ffff::18:6 90:0a:84:9e:95:70
OK
$ ovs-appctl ovs/route/show
[abbreviated]
Cached: 2403:400:31da:ffff::18:3/128 dev br-phy GW fe80::920a:84ff:fe9e:9570 SRC fe80::dc03:37ff:fee2:1fef
Cached: 2403:400:31da:ffff::18:6/128 dev br-phy SRC 2403:400:31da:ffff::18:6
User: 2403:400:31da:ffff::18:3/128 dev br-phy GW 2403:400:31da:ffff::18:6 SRC 2403:400:31da:ffff::18:6
$ ovs-appctl tnl/arp/show
IP                                            MAC                 Bridge
==========================================================================
fe80::920a:84ff:fe9e:9570                     90:0a:84:9e:95:70   br-phy
2403:400:31da:ffff::18:6                      90:0a:84:9e:95:70   br-phy
$ ovs-appctl dpctl/dump-flows -m netdev@ovs-netdev | grep 192.168.1.33
ufid:7f6b377d-8ee1-4605-91a7-34b1076068f2, skb_priority(0/0),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0),dp_hash(0/0),in_port(dpdk-vm101),packet_type(ns=0,id=0),eth(src=52:54:00:3d:cd:0c/00:00:00:00:00:00,dst=ff:ff:ff:ff:ff:ff/00:00:00:00:00:00),eth_type(0x0806),arp(sip=192.168.1.34/0.0.0.0,tip=192.168.1.33/0.0.0.0,op=1/0,sha=52:54:00:3d:cd:0c/00:00:00:00:00:00,tha=00:00:00:00:00:00/00:00:00:00:00:00), packets:101, bytes:4242, used:0.455s, dp:ovs, actions:tnl_push(tnl_port(vxlan_sys_4789),header(size=70,type=4,eth(dst=90:0a:84:9e:95:70,src=de:03:37:e2:1f:ef,dl_type=0x86dd),ipv6(src=2403:400:31da:ffff::18:6,dst=2403:400:31da:ffff::18:3,label=0,proto=17,tclass=0x0,hlimit=64),udp(src=0,dst=4789,csum=0xffff),vxlan(flags=0x8000000,vni=0x1)),out_port(br-phy)),push_vlan(vid=304,pcp=0),exit_p0, dp-extra-info:miniflow_bits(4,0)
```

Is it possible that the `src` address from the kernel is not imported into the cache, and the IP address selection is done by some other means (for example, simply by next hop or by the first available IP address on the interface)?

How should I be writing the flow entry on br-phy to rewrite the source IP? I think this would be my preferred approach instead of adding static ARP/route entries, since it gives me more flexibility.

Regards,
Derrick

From: discuss <ovs-discuss-boun...@openvswitch.org> on behalf of Lim, Derrick via discuss <ovs-discuss@openvswitch.org>
Date: Monday, January 29, 2024 at 11:19
To: Ilya Maximets <i.maxim...@ovn.org>, ovs-discuss@openvswitch.org <ovs-discuss@openvswitch.org>
Cc: i.maxim...@ovn.org <i.maxim...@ovn.org>
Subject: Re: [ovs-discuss] Encapsulate VXLAN and then process other flows

[EXTERNAL] This message comes from an external organization.

Hi Ilya Maximets,

Thank you for looking into it. I'll try to take a stab at making the change. Could you please point to where I should look?

Thank you,
Derrick

From: Ilya Maximets <i.maxim...@ovn.org>
Date: Friday, January 26, 2024 at 20:13
To: Lim, Derrick | Derrick | CMD <derrick....@rakuten.com>, ovs-discuss@openvswitch.org <ovs-discuss@openvswitch.org>
Cc: i.maxim...@ovn.org <i.maxim...@ovn.org>
Subject: Re: [ovs-discuss] Encapsulate VXLAN and then process other flows

On 1/26/24 09:37, Lim, Derrick wrote:
> Hi Ilya Maximets,
>
> Thank you for explanation.
> I've create the two bridges, but the packet
> seems to be dropped looking at the `ovs-appctl dpctl/dump-flows` output.
> I didn’t receive it on the remote host either.
>
> In my setup, the two physical hosts are separated by a L3 network
> (local=100.87.18.6/32, remote=100.87.18.3/32). The routes are learnt by
> a routing agent and exported to the kernel. The kernel has the correct
> routes, but this information does not seem to be synced to OVS.
>
> $ ip route get 100.87.18.3
> 100.87.18.3 via inet6 fe80::920a:84ff:fe9e:9570 dev br-phy src 100.87.18.6
> uid 0
>     cache
>
> $ ovs-appctl ovs/route/lookup 100.87.18.3
> src 100.87.2.168
> gateway 100.87.2.129
> dev ens3f1v1
>
> Perhaps the problem is that because I'm using BGP unnumbered, so the IPv4
> destination has an IPv6 next-hop. I tried adding the route statically but
> it seems not to be accepted.
>
> $ ovs-appctl ovs/route/add 100.87.18.3/32 br-phy fe80::920a:84ff:fe9e:9570
> Invalid pkt_mark or gateway
> ovs-appctl: ovs-vswitchd: server returned an error
>
> I've included some additional outputs from my setup below if you find them
> helpful.
>
> Is the routing where I'm going wrong or do you have any other advice about my
> setup?

Hmm. Interesting.

I looked through the code and I see that OVS router module that is responsible for syncing routes from the kernel to userspace doesn't expect routes with different families. Such routes are ignored. The manual ovs/route/add command also expects the next hop to be of the same IP family, so it refuses to add an v4-via-v6 static route.

So, unfortunately, the setup would work fine with the kernel datapath, since kernel does all the routing in that case, but it will not work with userspace. We need to add support for v4 via v6 routing to ovs-vswitchd in order to make the tunnels work.

If you're interested in making the change, I could point you to the right places in the code. :) Otherwise, maybe someone else from the community will pick it up.

Best regards, Ilya Maximets.

>
> $ ovs-vsctl show
> [abbreviated]
>     Bridge br-int
>         datapath_type: netdev
>         Port vxlan0
>             Interface vxlan0
>                 type: vxlan
>                 options: {dst_port="4789", key="1", local_ip="100.87.18.6",
> remote_ip="100.87.18.3"}
>         Port dpdk-vm101
>             Interface dpdk-vm101
>                 type: dpdkvhostuserclient
>                 options:
> {vhost-server-path="/var/run/vhost-sockets/dpdk-vm101"}
>
>     Bridge br-phy
>         fail_mode: standalone
>         datapath_type: netdev
>         Port br-phy
>             tag: 304
>             Interface br-phy
>                 type: internal
>         Port exit_p0
>             Interface exit_p0
>                 type: dpdk
>                 options: {dpdk-devargs="0000:c4:01.0"}
>     ovs_version: "3.1.1"
>
> $ ip addr show
> [abbreviated]
> 27: br-phy: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500 qdisc fq_codel
> state UP group default qlen 1000
>     link/ether de:03:37:e2:1f:ef brd ff:ff:ff:ff:ff:ff
>     inet 100.87.18.6/32 scope global br-phy
>        valid_lft forever preferred_lft forever
>     inet6 fe80::dc03:37ff:fee2:1fef/64 scope link
>        valid_lft forever preferred_lft forever
>
> $ ovs-ofctl dump-flows br-int
> cookie=0x0, duration=221.063s, table=0, n_packets=24, n_bytes=1008,
> priority=50,in_port="dpdk-vm101" actions=output:vxlan0
>
> $ ovs-appctl dpctl/dump-flows -m netdev@ovs-netdev
> [abbreviated]
> ufid:7f6b377d-8ee1-4605-91a7-34b1076068f2,
> skb_priority(0/0),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0),dp_hash(0/0),in_port(dpdk-vm101),packet_type(ns=0,id=0),eth(src=52:54:00:3d:cd:0c/00:00:00:00:00:00,dst=ff:ff:ff:ff:ff:ff/00:00:00:00:00:00),eth_type(0x0806),arp(sip=192.168.1.34/0.0.0.0,tip=192.168.1.33/0.0.0.0,op=1/0,sha=52:54:00:3d:cd:0c/00:00:00:00:00:00,tha=00:00:00:00:00:00/00:00:00:00:00:00),
> packets:5, bytes:210, used:0.447s, dp:ovs, actions:drop,
> dp-extra-info:miniflow_bits(4,0)
>
> Thank you,
> Derrick
>
> *From: *Ilya Maximets <i.maxim...@ovn.org>
> *Date: *Thursday, January 25, 2024 at 20:16
> *To: *Lim, Derrick | Derrick | CMD <derrick....@rakuten.com>,
> ovs-discuss@openvswitch.org
> <ovs-discuss@openvswitch.org>
> *Cc: *i.maxim...@ovn.org <i.maxim...@ovn.org>
> *Subject: *Re: [ovs-discuss] Encapsulate VXLAN and then process other flows
>
> On 1/25/24 10:42, Lim, Derrick via discuss wrote:
>> Hey all,
>>
>> Is there a way I can encapsulate a packet with VXLAN, and then resubmit
>> it through OVS again to run other flow actions based on this encapsulated
>> packet?
>>
>> Currently, I have a OVS-DPDK setup where in the final step, before a packet
>> leaves the host, the group action is used to pick between multiple physical
>> ports, and then rewrite the mac address (mod_dl_dst) to that of the
>> destination's, as well as apply the appropriate vlan tag (mod_vlan_vid).
>>
>> I would like the encapsulate action to take place before the step mentioned
>> above. I created a tunnel port (eg. vxlan0). But if I set the action to this
>> port, the packet basically leaves OVS and I can't resubmit it.
>>
>> In the userspace tunneling example, two bridges are used so that information
>> from the kernel can be used for routing and ARP resolution. Is there a way I
>> can populate these fields through various flow actions if I already know what
>> they should be without going through the kernel? Or is going through the kernel
>> absolutely required to create the data structure for encapsulation?
>
> The output to a tunnel is an 'output' action, i.e. the packet always leaves
> the bridge. And so, it requires routing after encapsulation in order to
> identify where it should go next. For routing we need IP addresses and
> a routing table. This information is normally synced from the kernel.
> You can add static routes via ovs-appctl ovs/route/add, but you still need
> IP addresses configured on bridges.
>
> Normally the problem of applying actions after encapsulation is solved by
> having a tunnel interface in one bridge (br-int) and the egress interfaces
> in the other bridge (br-phy).
> The br-phy should have an IP address from the
> tunnel subnet, so after encapsulation the packet is getting routed to br-phy.
> In br-phy the packet can be matched with OF rules and actions can be executed
> before sending it to the egress interface.
>
> Best regards, Ilya Maximets.
>
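
The mismatch described at the top of this message (the kernel route carries `src 2403:400:31da:ffff::18:6`, while the OVS cached route carries `SRC fe80::dc03:37ff:fee2:1fef`) can be re-checked after each change with a small script. What follows is only a minimal sketch: the helper names `word_after`, `kernel_src`, `ovs_src` and `check_src` are made up for illustration and are not part of OVS or iproute2, and the parsing assumes the output formats shown in the outputs above.

```shell
#!/bin/sh
# Hypothetical helpers (not part of OVS or iproute2). Each reads command
# output on stdin and prints the word that follows a given keyword.

# Print the word following keyword $1 on the first input line that has it.
word_after() {
    awk -v key="$1" '{
        for (i = 1; i < NF; i++)
            if ($i == key) { print $(i + 1); exit }
    }'
}

# Kernel route output (`ip -6 route ...`) uses a lower-case "src" keyword.
kernel_src() { word_after src; }

# `ovs-appctl ovs/route/show` output uses an upper-case "SRC" keyword.
ovs_src() { word_after SRC; }

# Compare the two sources; print a diagnostic, return non-zero on mismatch.
check_src() {
    if [ "$1" = "$2" ]; then
        echo "match: $1"
    else
        echo "mismatch: kernel=$1 ovs=$2"
        return 1
    fi
}
```

On a live host the inputs could come from `ip -6 route get 2403:400:31da:ffff::18:3 | kernel_src` and `ovs-appctl ovs/route/show | grep 'Cached: 2403:400:31da:ffff::18:3/' | ovs_src`, with the two results then passed to `check_src`. This only reports the symptom; it does not explain how OVS chooses the source.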
_______________________________________________
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss