On 1/29/24 10:23, Lim, Derrick wrote:
> Hi Ilya Maximets,
>
> I did some further testing on my end. Just to make sure it's an address
> family issue, I tried to configure all VXLAN interfaces with IPv6, but
> I ran into an issue with the source IP address selection.
>
> I specified the `local_ip` of the tunnel as `2403:400:31da:ffff::18:6`,
> which is also added on the bridge interface, but OVS picks the link-local
> address `fe80::dc03:37ff:fee2:1fef` instead. Packets with that source
> address get dropped by other devices along the way.
>
> ```
> $ ovs-appctl dpctl/dump-flows -m netdev@ovs-netdev | grep 192.168.1.33
> [abbreviated]
> ufid:7f6b377d-8ee1-4605-91a7-34b1076068f2,
> skb_priority(0/0),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0),dp_hash(0/0),in_port(dpdk-vm101),packet_type(ns=0,id=0),eth(src=52:54:00:3d:cd:0c/00:00:00:00:00:00,dst=ff:ff:ff:ff:ff:ff/00:00:00:00:00:00),eth_type(0x0806),arp(sip=192.168.1.34/0.0.0.0,tip=192.168.1.33/0.0.0.0,op=1/0,sha=52:54:00:3d:cd:0c/00:00:00:00:00:00,tha=00:00:00:00:00:00/00:00:00:00:00:00),
> packets:1, bytes:42, used:0.063s, dp:ovs,
> actions:tnl_push(tnl_port(vxlan_sys_4789),header(size=70,type=4,eth(dst=90:0a:84:9e:95:70,src=de:03:37:e2:1f:ef,dl_type=0x86dd),ipv6(src=fe80::dc03:37ff:fee2:1fef,dst=2403:400:31da:ffff::18:3,label=0,proto=17,tclass=0x0,hlimit=64),udp(src=0,dst=4789,csum=0xffff),vxlan(flags=0x8000000,vni=0x1)),out_port(br-phy)),push_vlan(vid=304,pcp=0),exit_p0,
> dp-extra-info:miniflow_bits(4,0)
> ```
>
> ```
> $ ovs-vsctl show
>     Bridge br-int
>         datapath_type: netdev
>         Port dpdk-vm101
>             Interface dpdk-vm101
>                 type: dpdkvhostuserclient
>                 options: {vhost-server-path="/var/run/vhost-sockets/dpdk-vm101"}
>         Port vxlan0
>             Interface vxlan0
>                 type: vxlan
>                 options: {dst_port="4789", key="1", local_ip="2403:400:31da:ffff::18:6", remote_ip="2403:400:31da:ffff::18:3"}
> ```
>
> The bridge interface has the `local_ip` addresses configured.
>
> ```
> $ ip a
> [abbreviated]
> 27: br-phy: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
>     link/ether de:03:37:e2:1f:ef brd ff:ff:ff:ff:ff:ff
>     inet 100.87.18.6/32 scope global br-phy
>        valid_lft forever preferred_lft forever
>     inet6 2403:400:31da:ffff::18:6/128 scope global
>        valid_lft forever preferred_lft forever
>     inet6 fe80::dc03:37ff:fee2:1fef/64 scope link
>        valid_lft forever preferred_lft forever
> ```
>
> The kernel routing table shows a `src` of `2403:400:31da:ffff::18:6`:
>
> ```
> $ ip -6 route
> [abbreviated]
> ::1 dev lo proto kernel metric 256 pref medium
> 2403:400:31da:ffff::18:3 nhid 93 via fe80::920a:84ff:fe9e:9570 dev br-phy proto bgp src 2403:400:31da:ffff::18:6 metric 20 pref medium
> 2403:400:31da:ffff::18:6 dev br-phy proto kernel metric 256 pref medium
> 2403:400:31da::/48 nhid 93 via fe80::920a:84ff:fe9e:9570 dev br-phy proto bgp src 2403:400:31da:ffff::18:6 metric 20 pref medium
> ```
>
> However, in OVS's route table, the link-local address
> `fe80::dc03:37ff:fee2:1fef` is picked as the SRC instead.
>
> ```
> $ ovs-appctl ovs/route/show
> Route Table:
> [abbreviated]
> Cached: 2403:400:31da:ffff::18:3/128 dev br-phy GW fe80::920a:84ff:fe9e:9570 SRC fe80::dc03:37ff:fee2:1fef
> Cached: 2403:400:31da:ffff::18:6/128 dev br-phy SRC 2403:400:31da:ffff::18:6
> ```
>
> I tried to rewrite the source address on bridge br-phy, but this does not
> seem to have any effect (no packet hits). My idea was to match UDP
> packets to port 4789 and rewrite the IPv6 source address. Is my way
> of rewriting it correct?
>
> ```
> $ ovs-ofctl add-flow br-phy "priority=50,dl_type=0x86dd,ipv6_src=fe80::dc03:37ff:fee2:1fef,nw_proto=17,tp_dst=4789,actions=set_field:2403:400:31da:ffff::18:6->ipv6_src,normal"
>
> $ ovs-ofctl dump-flows br-phy
>  cookie=0x0, duration=653.933s, table=0, n_packets=0, n_bytes=0, priority=50,udp6,ipv6_src=fe80::dc03:37ff:fee2:1fef,tp_dst=4789 actions=load:0x180006->NXM_NX_IPV6_SRC[0..63],load:0x2403040031daffff->NXM_NX_IPV6_SRC[64..127],NORMAL
>  cookie=0x0, duration=275973.900s, table=0, n_packets=1638167, n_bytes=152347460, priority=0 actions=NORMAL
> ```
>
> What did work was adding a static route for the destination with the
> correct source address and adding an ARP entry for the source. So, if
> the source address is picked up correctly by OVS, the tunnel works
> correctly.
>
> ```
> $ ovs-appctl ovs/route/add 2403:400:31da:ffff::18:3 br-phy 2403:400:31da:ffff::18:6
> OK
> $ ovs-appctl tnl/arp/set br-phy 2403:400:31da:ffff::18:6 90:0a:84:9e:95:70
> OK
>
> $ ovs-appctl ovs/route/show
> [abbreviated]
> Cached: 2403:400:31da:ffff::18:3/128 dev br-phy GW fe80::920a:84ff:fe9e:9570 SRC fe80::dc03:37ff:fee2:1fef
> Cached: 2403:400:31da:ffff::18:6/128 dev br-phy SRC 2403:400:31da:ffff::18:6
> User: 2403:400:31da:ffff::18:3/128 dev br-phy GW 2403:400:31da:ffff::18:6 SRC 2403:400:31da:ffff::18:6
>
> $ ovs-appctl tnl/arp/show
> IP                                       MAC                 Bridge
> ==========================================================================
> fe80::920a:84ff:fe9e:9570                90:0a:84:9e:95:70   br-phy
> 2403:400:31da:ffff::18:6                 90:0a:84:9e:95:70   br-phy
>
> $ ovs-appctl dpctl/dump-flows -m netdev@ovs-netdev | grep 192.168.1.33
> ufid:7f6b377d-8ee1-4605-91a7-34b1076068f2,
> skb_priority(0/0),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0),dp_hash(0/0),in_port(dpdk-vm101),packet_type(ns=0,id=0),eth(src=52:54:00:3d:cd:0c/00:00:00:00:00:00,dst=ff:ff:ff:ff:ff:ff/00:00:00:00:00:00),eth_type(0x0806),arp(sip=192.168.1.34/0.0.0.0,tip=192.168.1.33/0.0.0.0,op=1/0,sha=52:54:00:3d:cd:0c/00:00:00:00:00:00,tha=00:00:00:00:00:00/00:00:00:00:00:00),
> packets:101, bytes:4242, used:0.455s, dp:ovs,
> actions:tnl_push(tnl_port(vxlan_sys_4789),header(size=70,type=4,eth(dst=90:0a:84:9e:95:70,src=de:03:37:e2:1f:ef,dl_type=0x86dd),ipv6(src=2403:400:31da:ffff::18:6,dst=2403:400:31da:ffff::18:3,label=0,proto=17,tclass=0x0,hlimit=64),udp(src=0,dst=4789,csum=0xffff),vxlan(flags=0x8000000,vni=0x1)),out_port(br-phy)),push_vlan(vid=304,pcp=0),exit_p0,
> dp-extra-info:miniflow_bits(4,0)
> ```
>
> Is it possible that the `src` address from the kernel is not imported into
> the cache, and the IP address selection is done by some other means
> (simply by next hop, or the first available IP address on the interface)?

What version of OVS are you using? The feature to learn preferred source
addresses from routes was added fairly recently, in the OVS 3.2 release, by
the following commit:
  https://github.com/openvswitch/ovs/commit/49e534cd3764e853f70f01b63196f320c9a5790e

If you are on OVS 3.2 or using the current master branch, then I'd guess it
is possible that your BGP agent is using RTA_SRC and not RTA_PREFSRC, though
it's hard to tell without looking at the actual Netlink messages that the
kernel sends to OVS. If so, we may need to add support for RTA_SRC alongside
RTA_PREFSRC.

If RTA_PREFSRC is not provided by the kernel, OVS falls back to the first IP
address from the interface:
  https://github.com/openvswitch/ovs/blob/96990ea1e4a597bff3750901ede7b92412ac443e/lib/ovs-router.c#L285C1-L290C6

> How should I be writing the flow entry on br-phy to rewrite the source IP?
> I think this would be my preferred approach instead of adding static
> ARP/route entries, since it gives me more flexibility.

The rules look mostly fine. I think the main problem you have is priority.
The default priority for OF rules (if not specified) is 32768, so your new
rules with priority 50 are too low in the priority list and are not getting
hit.

> Regards,
> Derrick
>
> *From: *discuss <ovs-discuss-boun...@openvswitch.org> on behalf of Lim, Derrick via discuss <ovs-discuss@openvswitch.org>
> *Date: *Monday, January 29, 2024 at 11:19
> *To: *Ilya Maximets <i.maxim...@ovn.org>, ovs-discuss@openvswitch.org <ovs-discuss@openvswitch.org>
> *Cc: *i.maxim...@ovn.org <i.maxim...@ovn.org>
> *Subject: *Re: [ovs-discuss] Encapsulate VXLAN and then process other flows
>
> *[EXTERNAL] *This message comes from an external organization.
>
> Hi Ilya Maximets,
>
> Thank you for looking into it.
> I'll try to take a stab at making the change. Could you please point to
> where I should look?

Sure.
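Illustration (a minimal sketch, not OVS code): the source-address fallback described earlier in this reply, i.e. use the route's RTA_PREFSRC when the kernel provides one, otherwise the first IP address of the egress interface, can be expressed in C roughly as below. The names `struct route_sketch`, `struct iface_addrs_sketch` and `pick_tunnel_src()` are hypothetical, and the order of the interface address list is an assumption; the real logic is in lib/ovs-router.c at the link above.

```c
#include <assert.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified view of a parsed route (not an OVS structure).
 * 'has_prefsrc' is true only when the kernel's route message carried
 * RTA_PREFSRC. */
struct route_sketch {
    struct in6_addr prefsrc;
    bool has_prefsrc;
};

/* Hypothetical list of the egress interface's addresses, in whatever order
 * they were enumerated (the ordering is an assumption of this sketch). */
struct iface_addrs_sketch {
    const struct in6_addr *addrs;
    size_t n_addrs;
};

/* Chooses the tunnel source address: the route's preferred source if there
 * is one, otherwise the first address of the egress interface.  Returns
 * false if neither is available. */
static bool
pick_tunnel_src(const struct route_sketch *rt,
                const struct iface_addrs_sketch *iface,
                struct in6_addr *out)
{
    assert(rt != NULL && iface != NULL && out != NULL);

    if (rt->has_prefsrc) {
        *out = rt->prefsrc;          /* Learned from RTA_PREFSRC. */
        return true;
    }
    if (iface->n_addrs > 0) {
        *out = iface->addrs[0];      /* Fallback: first interface address. */
        return true;
    }
    return false;
}
```

With a route that carries RTA_PREFSRC 2403:400:31da:ffff::18:6 the sketch returns that address; without it, it returns whichever address comes first in the interface list, which in the output above appears to have been the link-local fe80::dc03:37ff:fee2:1fef.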

So, the function that is actually parsing incoming route updates is here:
  https://github.com/openvswitch/ovs/blob/96990ea1e4a597bff3750901ede7b92412ac443e/lib/route-table.c#L196

It parses the new route and adds it to the routing table. See also the
commit I linked above for an example of how to add new fields and write a
test for the functionality. You will not need to add new fields, but you'll
likely need to significantly re-organize the route_table_parse() function,
as it is designed to stick to a specific IP family.

The appctl implementation parses the user's command here:
  https://github.com/openvswitch/ovs/blob/96990ea1e4a597bff3750901ede7b92412ac443e/lib/ovs-router.c#L393

And then it uses the same call to ovs_router_insert__() to actually add the
route. The appctl parsing also expects addresses from the same family.

Internally, all the addresses are stored as IPv6, i.e. actual IPv6 or
IPv4-mapped IPv6. So, it should not be hard to mix and match different
families, but the initial parsing is a bit tricky.

Let me know if some other parts need an explanation.

Thanks!

Best regards, Ilya Maximets.

> Thank you,
> Derrick
>
> *From: *Ilya Maximets <i.maxim...@ovn.org>
> *Date: *Friday, January 26, 2024 at 20:13
> *To: *Lim, Derrick | Derrick | CMD <derrick....@rakuten.com>, ovs-discuss@openvswitch.org <ovs-discuss@openvswitch.org>
> *Cc: *i.maxim...@ovn.org <i.maxim...@ovn.org>
> *Subject: *Re: [ovs-discuss] Encapsulate VXLAN and then process other flows
>
> [EXTERNAL] This message comes from an external organization.
>
> On 1/26/24 09:37, Lim, Derrick wrote:
>> Hi Ilya Maximets,
>>
>> Thank you for the explanation. I've created the two bridges, but the
>> packet seems to be dropped, looking at the `ovs-appctl dpctl/dump-flows`
>> output. I didn't receive it on the remote host either.
>>
>> In my setup, the two physical hosts are separated by an L3 network
>> (local=100.87.18.6/32, remote=100.87.18.3/32).
>> The routes are learnt by a routing agent and exported to the kernel.
>> The kernel has the correct routes, but this information does not seem
>> to be synced to OVS.
>>
>> $ ip route get 100.87.18.3
>> 100.87.18.3 via inet6 fe80::920a:84ff:fe9e:9570 dev br-phy src 100.87.18.6 uid 0
>>     cache
>>
>> $ ovs-appctl ovs/route/lookup 100.87.18.3
>> src 100.87.2.168
>> gateway 100.87.2.129
>> dev ens3f1v1
>>
>> Perhaps the problem is that, because I'm using BGP unnumbered, the IPv4
>> destination has an IPv6 next hop. I tried adding the route statically,
>> but it does not seem to be accepted.
>>
>> $ ovs-appctl ovs/route/add 100.87.18.3/32 br-phy fe80::920a:84ff:fe9e:9570
>> Invalid pkt_mark or gateway
>> ovs-appctl: ovs-vswitchd: server returned an error
>>
>> I've included some additional outputs from my setup below in case you
>> find them helpful.
>>
>> Is the routing where I'm going wrong, or do you have any other advice
>> about my setup?
>
> Hmm. Interesting. I looked through the code, and I see that the OVS
> router module that is responsible for syncing routes from the
> kernel to userspace doesn't expect routes with different families.
> Such routes are ignored.
>
> The manual ovs/route/add command also expects the next hop to be of
> the same IP family, so it refuses to add a v4-via-v6 static route.
>
> So, unfortunately, the setup would work fine with the kernel datapath,
> since the kernel does all the routing in that case, but it will not work
> with userspace.
>
> We need to add support for v4-via-v6 routing to ovs-vswitchd in order
> to make the tunnels work.
>
> If you're interested in making the change, I could point you to
> the right places in the code. :)
> Otherwise, maybe someone else from the community will pick it up.
>
> Best regards, Ilya Maximets.
>
>>
>> $ ovs-vsctl show
>> [abbreviated]
>>     Bridge br-int
>>         datapath_type: netdev
>>         Port vxlan0
>>             Interface vxlan0
>>                 type: vxlan
>>                 options: {dst_port="4789", key="1", local_ip="100.87.18.6", remote_ip="100.87.18.3"}
>>         Port dpdk-vm101
>>             Interface dpdk-vm101
>>                 type: dpdkvhostuserclient
>>                 options: {vhost-server-path="/var/run/vhost-sockets/dpdk-vm101"}
>>
>>     Bridge br-phy
>>         fail_mode: standalone
>>         datapath_type: netdev
>>         Port br-phy
>>             tag: 304
>>             Interface br-phy
>>                 type: internal
>>         Port exit_p0
>>             Interface exit_p0
>>                 type: dpdk
>>                 options: {dpdk-devargs="0000:c4:01.0"}
>>     ovs_version: "3.1.1"
>>
>> $ ip addr show
>> [abbreviated]
>> 27: br-phy: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
>>     link/ether de:03:37:e2:1f:ef brd ff:ff:ff:ff:ff:ff
>>     inet 100.87.18.6/32 scope global br-phy
>>        valid_lft forever preferred_lft forever
>>     inet6 fe80::dc03:37ff:fee2:1fef/64 scope link
>>        valid_lft forever preferred_lft forever
>>
>> $ ovs-ofctl dump-flows br-int
>>  cookie=0x0, duration=221.063s, table=0, n_packets=24, n_bytes=1008, priority=50,in_port="dpdk-vm101" actions=output:vxlan0
>>
>> $ ovs-appctl dpctl/dump-flows -m netdev@ovs-netdev
>> [abbreviated]
>> ufid:7f6b377d-8ee1-4605-91a7-34b1076068f2,
>> skb_priority(0/0),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0),dp_hash(0/0),in_port(dpdk-vm101),packet_type(ns=0,id=0),eth(src=52:54:00:3d:cd:0c/00:00:00:00:00:00,dst=ff:ff:ff:ff:ff:ff/00:00:00:00:00:00),eth_type(0x0806),arp(sip=192.168.1.34/0.0.0.0,tip=192.168.1.33/0.0.0.0,op=1/0,sha=52:54:00:3d:cd:0c/00:00:00:00:00:00,tha=00:00:00:00:00:00/00:00:00:00:00:00),
>> packets:5, bytes:210, used:0.447s, dp:ovs, actions:drop,
>> dp-extra-info:miniflow_bits(4,0)
>>
>> Thank you,
>> Derrick
>>
>> *From: *Ilya Maximets <i.maxim...@ovn.org>
>> *Date: *Thursday, January 25, 2024 at 20:16
>> *To: *Lim, Derrick | Derrick | CMD <derrick....@rakuten.com>,
>> ovs-discuss@openvswitch.org <ovs-discuss@openvswitch.org>
>> *Cc: *i.maxim...@ovn.org <i.maxim...@ovn.org>
>> *Subject: *Re: [ovs-discuss] Encapsulate VXLAN and then process other flows
>>
>> [EXTERNAL] This message comes from an external organization.
>>
>> On 1/25/24 10:42, Lim, Derrick via discuss wrote:
>>> Hey all,
>>>
>>> Is there a way I can encapsulate a packet with VXLAN, and then resubmit
>>> it through OVS again to run other flow actions based on this
>>> encapsulated packet?
>>>
>>> Currently, I have an OVS-DPDK setup where, in the final step before a
>>> packet leaves the host, the group action is used to pick between multiple
>>> physical ports, and then rewrite the MAC address (mod_dl_dst) to that of
>>> the destination, as well as apply the appropriate VLAN tag (mod_vlan_vid).
>>>
>>> I would like the encapsulate action to take place before the step
>>> mentioned above. I created a tunnel port (e.g. vxlan0), but if I set the
>>> action to output to this port, the packet basically leaves OVS and I
>>> can't resubmit it.
>>>
>>> In the userspace tunneling example, two bridges are used so that
>>> information from the kernel can be used for routing and ARP resolution.
>>> Is there a way I can populate these fields through various flow actions
>>> if I already know what they should be, without going through the kernel?
>>> Or is going through the kernel absolutely required to create the data
>>> structure for encapsulation?
>>
>> The output to a tunnel is an 'output' action, i.e. the packet always leaves
>> the bridge. And so, it requires routing after encapsulation in order to
>> identify where it should go next. For routing we need IP addresses and
>> a routing table. This information is normally synced from the kernel.
>> You can add static routes via ovs-appctl ovs/route/add, but you still need
>> IP addresses configured on bridges.
>>
>> Normally the problem of applying actions after encapsulation is solved by
>> having a tunnel interface in one bridge (br-int) and the egress interfaces
>> in the other bridge (br-phy). The br-phy should have an IP address from the
>> tunnel subnet, so after encapsulation the packet gets routed to br-phy.
>> In br-phy, the packet can be matched with OF rules and actions can be
>> executed before sending it to the egress interface.
>>
>> Best regards, Ilya Maximets.
>>
>
_______________________________________________
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
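Illustration (a minimal sketch, not OVS code): the thread notes that OVS stores every address internally as IPv6 (real IPv6 or IPv4-mapped IPv6), which is what would let a v4-via-v6 route such as `100.87.18.3/32 via fe80::920a:84ff:fe9e:9570` be represented once parsing accepts mixed families. The C below shows that idea under stated assumptions: `parse_addr_any_family()`, `struct mixed_route_sketch` and `build_mixed_route()` are hypothetical names, not OVS functions, and storing an IPv4 prefix length as 96 + N is this sketch's own convention for the mapped address space. Any real change would go into route_table_parse() and the appctl parsing in lib/ovs-router.c linked above.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: parses an IPv4 or IPv6 literal into one
 * struct in6_addr.  IPv4 is stored as IPv4-mapped IPv6 (::ffff:a.b.c.d).
 * Returns false if 's' is neither. */
static bool
parse_addr_any_family(const char *s, struct in6_addr *out)
{
    struct in_addr v4;

    assert(s != NULL && out != NULL);
    if (inet_pton(AF_INET6, s, out) == 1) {
        return true;
    }
    if (inet_pton(AF_INET, s, &v4) == 1) {
        memset(out, 0, sizeof *out);
        out->s6_addr[10] = 0xff;
        out->s6_addr[11] = 0xff;
        memcpy(&out->s6_addr[12], &v4.s_addr, 4);  /* Both network order. */
        return true;
    }
    return false;
}

/* Hypothetical route record: all addresses share one representation, so an
 * IPv4 destination may have an IPv6 (e.g. link-local) next hop. */
struct mixed_route_sketch {
    struct in6_addr dst;
    unsigned int plen;      /* Prefix length in the 128-bit space. */
    struct in6_addr gw;
};

/* Builds a route from string literals without requiring dst and gw to be
 * of the same family.  Returns false on a parse error or bad prefix. */
static bool
build_mixed_route(const char *dst, unsigned int plen, const char *gw,
                  struct mixed_route_sketch *rt)
{
    bool dst_is_v4;

    if (!parse_addr_any_family(dst, &rt->dst)
        || !parse_addr_any_family(gw, &rt->gw)) {
        return false;
    }
    dst_is_v4 = IN6_IS_ADDR_V4MAPPED(&rt->dst);
    if (plen > (dst_is_v4 ? 32u : 128u)) {
        return false;
    }
    /* An IPv4 /N covers the last N bits of the mapped address, so its
     * prefix length in the IPv6 space is 96 + N. */
    rt->plen = dst_is_v4 ? 96 + plen : plen;
    return true;
}
```

For the route from the thread, `build_mixed_route("100.87.18.3", 32, "fe80::920a:84ff:fe9e:9570", &rt)` yields a mapped destination with a /128 prefix in the mapped space and a link-local gateway. The sketch ignores the egress device that a link-local next hop also needs.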