Hi Numan,

Thanks a lot for the feedback.
On Thu, Dec 1, 2022 at 14:18, Numan Siddique <num...@ovn.org> wrote:
> On Thu, Dec 1, 2022 at 10:15 AM Roberto Bartzen Acosta via discuss
> <ovs-discuss@openvswitch.org> wrote:
> >
> > Hey folks,
> >
> > I would like some help understanding how DVR works in ovn/ovs. Currently, DVR support for FIP addresses works perfectly. However, I would like to understand whether it is possible to extend this behavior to IPv6 addresses. By IPv6 addresses I mean the GUA addresses that are allocated to VMs (e.g., in an OpenStack deployment).
> >
> > I understand that there are other requirements for GUA addresses to be routed to external networks by OVN (this is where the neutron-dynamic-routing project + FRR on the hypervisors/openvswitch come in). For DVR to be enabled properly, the provider networks must be stretched over the Underlay Network, and each Compute Node needs a bridge for external traffic. In an L3 Leaf-Spine Underlay network, one option for meeting this requirement is for the Underlay Network to stretch an L2 domain (VLAN) using VXLAN as the data plane and BGP EVPN as the control plane. In this solution, the Leaf switches act as HW VTEP Gateways to initiate and terminate the VXLAN tunnels, and use BGP EVPN to learn and advertise the MAC addresses from the Compute Nodes' provider network.
> >
> > The common reference architecture is detailed in [picture 1], and the design for the DVR+FRR solution (IPv4/IPv6) is detailed in [picture 2].
> >
> > What's the problem? Well, inbound traffic to a GUA address goes through the chassis where the router's external port resides.
> >
> >
> > Looking at the DVR implementation for IPv4, I see that the solution is heavily based on NAT, starting from the initial support in [1], with some fixes in [2] and [3] and scalability improvements in [4].
> >
> >
> > While debugging the pipeline, I see the logical flow entries for SNAT and DNAT (FIP = 200.201.0.226) referencing is_chassis_resident (with the cr-lrp):
> >
> > _uuid : d37d412b-28ed-4d55-8f97-533571ffb3c1
> > actions : "ct_dnat_in_czone(192.168.0.120);"
> > controller_meter : []
> > external_ids : {source="northd.c:12693", stage-hint="5e41244f", stage-name=lr_in_dnat}
> > logical_datapath : c3ea6752-5b60-40ec-9734-dcfecbb59f68
> > logical_dp_group : []
> > match : "ip && ip4.dst == 200.201.0.226 && inport == \"lrp-bcf42b8c-5ca5-44d1-8065-6f36892a1473\" && is_chassis_resident(\"cr-lrp-bcf42b8c-5ca5-44d1-8065-6f36892a1473\")"
> > pipeline : ingress
> > priority : 100
> > table_id : 6
> > tags : {}
> > hash : 0
> >
> >
> > Does distributed router support for IPv6 need some kind of NAT rule to work? Also regarding NAT, I saw that there is a test case in system-ovn.at [5] that validates NAT operation for a distributed router with IPv6 N/S traffic; is that right? For local addresses it might make some sense, but for N/S traffic to a GUA address, does this test make sense?
>
> In the case of DVR for FIPs, when a NAT entry is added for the FIP in the logical router, the CMS has to set the external_mac and logical_port fields. When these are set, the ovn-controller claiming the VM of the FIP will reply to the ARP requests for the FIP. The same applies to IPv6 FIPs. The ovn-controller also sends periodic GARPs.
>
> I think the same has to happen for the use case you mentioned, i.e., for a Neighbor Solicitation request for the GUA of the VM, the reply should be sent from the same compute node hosting the VM.
>
> Can you please try adding a NAT entry in the logical router like below?
>
I had already tried adding a NAT rule similar to your suggestion, but after including it my problem was the border leaf's BGP distribution behavior. Now, with the rules added as below, I can see the traffic distributed to each compute node.
(DVR IPv6 - ^wow^)

# compute 1
ovn-nbctl lr-nat-add 84afed40-9bb7-4579-ad9c-2392f5398635 dnat_and_snat 2001:db8:1234::b0 2001:db8:1234::b0 28593240-2962-4f6f-b218-7b761f2afa26 fa:16:3e:c4:cc:dd

# compute 2
ovn-nbctl lr-nat-add 84afed40-9bb7-4579-ad9c-2392f5398635 dnat_and_snat 2001:db8:1234::140 2001:db8:1234::140 ae528d1a-d092-4a8e-8a9a-83b849cac0a2 fa:16:3e:c4:aa:bb

# tcpdump - compute 2
19:14:33.488379 a2:8f:16:4c:6e:e9 > fa:16:3e:c4:aa:bb, ethertype IPv6 (0x86dd), length 86: fe80::a08f:16ff:fe4c:6ee9 > 2001:db8:1234::140: ICMP6, neighbor solicitation, who has 2001:db8:1234::140, length 32
19:14:33.490732 fa:16:3e:c4:aa:bb > a2:8f:16:4c:6e:e9, ethertype IPv6 (0x86dd), length 86: 2001:db8:1234::140 > fe80::a08f:16ff:fe4c:6ee9: ICMP6, neighbor advertisement, tgt is 2001:db8:1234::140, length 32

root@compute2:~# tcpdump -nni br-301 ip6 -ne
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on br-301, link-type EN10MB (Ethernet), snapshot length 262144 bytes
19:24:02.303696 a2:8f:16:4c:6e:e9 > fa:16:3e:c4:aa:bb, ethertype IPv6 (0x86dd), length 118: 2001:db8:aaa::2 > 2001:db8:1234::140: ICMP6, echo request, id 63001, seq 1, length 64
19:24:02.306429 fa:16:3e:c4:aa:bb > a2:8f:16:4c:6e:e9, ethertype IPv6 (0x86dd), length 118: 2001:db8:1234::140 > 2001:db8:aaa::2: ICMP6, echo reply, id 63001, seq 1, length 64
19:24:03.305196 a2:8f:16:4c:6e:e9 > fa:16:3e:c4:aa:bb, ethertype IPv6 (0x86dd), length 118: 2001:db8:aaa::2 > 2001:db8:1234::140: ICMP6, echo request, id 63001, seq 2, length 64
19:24:03.306775 fa:16:3e:c4:aa:bb > a2:8f:16:4c:6e:e9, ethertype IPv6 (0x86dd), length 118: 2001:db8:1234::140 > 2001:db8:aaa::2: ICMP6, echo reply, id 63001, seq 2, length 64
4 packets captured
4 packets received by filter
0 packets dropped by kernel

# tcpdump - compute 1
19:17:20.816366 a2:8f:16:4c:6e:e9 > 33:33:ff:00:00:b0, ethertype IPv6 (0x86dd), length 86: fe80::a08f:16ff:fe4c:6ee9 > ff02::1:ff00:b0: ICMP6, neighbor solicitation, who has 2001:db8:1234::b0, length 32
19:17:20.818420 fa:16:3e:c4:cc:dd > a2:8f:16:4c:6e:e9, ethertype IPv6 (0x86dd), length 86: 2001:db8:1234::b0 > fe80::a08f:16ff:fe4c:6ee9: ICMP6, neighbor advertisement, tgt is 2001:db8:1234::b0, length 32

root@compute1:~# tcpdump -nni br-301 ip6 -ne
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on br-301, link-type EN10MB (Ethernet), snapshot length 262144 bytes
19:23:32.623132 a2:8f:16:4c:6e:e9 > fa:16:3e:c4:cc:dd, ethertype IPv6 (0x86dd), length 118: 2001:db8:aaa::2 > 2001:db8:1234::b0: ICMP6, echo request, id 37989, seq 1, length 64
19:23:32.624918 fa:16:3e:c4:cc:dd > a2:8f:16:4c:6e:e9, ethertype IPv6 (0x86dd), length 118: 2001:db8:1234::b0 > 2001:db8:aaa::2: ICMP6, echo reply, id 37989, seq 1, length 64
19:23:33.624372 a2:8f:16:4c:6e:e9 > fa:16:3e:c4:cc:dd, ethertype IPv6 (0x86dd), length 118: 2001:db8:aaa::2 > 2001:db8:1234::b0: ICMP6, echo request, id 37989, seq 2, length 64
19:23:33.625346 fa:16:3e:c4:cc:dd > a2:8f:16:4c:6e:e9, ethertype IPv6 (0x86dd), length 118: 2001:db8:1234::b0 > 2001:db8:aaa::2: ICMP6, echo reply, id 37989, seq 2, length 64
4 packets captured
4 packets received by filter
0 packets dropped by kernel
root@compute1:~#

Apparently, neutron-dynamic-routing BGP cannot be used in this case, because it advertises the route with the router's external interface IP as the next hop (this does not work with DVR).

# neutron-dynamic-routing - BGP route
2001:db8:1234::/64 nhid 82 via 2001:db8:4321:42::316 dev br-301 proto bgp metric 20 pref medium

# border leaf -> new static route to the subnet pool prefix - it works!
root@border:~# ip -6 route add 2001:db8:1234::/48 dev br-301

*Important point:* Regarding the NAT rule for the IPv6 GUA (a "fake" NAT), any thoughts on the scalability of this design? It means an additional rule for each VM. Is there any resource-allocation projection model for the scalability of OpenFlow flow entries that I can use to estimate the impact of this?
>
> # ovn-nbctl --help | grep lr-nat-ad
> lr-nat-add ROUTER TYPE EXTERNAL_IP LOGICAL_IP [LOGICAL_PORT EXTERNAL_MAC]
>
> # ovn-nbctl lr-nat-add <logical_router> dnat_and_snat <VM GUA> <VM GUA> <VM logical port> <SOME CHOSEN RANDOM MAC>
>
> I've not tried this. Notice that EXTERNAL_IP and LOGICAL_IP are the same (i.e., the VM GUA). If you add an entry like this, OVN should add logical flows to respond to IPv6 NS requests for the VM GUA.
>
> If this doesn't work, then we need to think of a solution in OVN.
>
> Thanks
> Numan
>
> >
> > However, it is clear that when the router's external port is on another chassis, the incoming traffic passes through the router's external port [tcpdump 1]. The traffic is then redirected via Geneve to the chassis where the GUA VM resides [dpcl 1], and the return traffic leaves that chassis directly toward the external destination via VXLAN [tcpdump 2], without going back through Geneve to the router.
> >
> >
> > Any ideas on how to program OVN so that the chassis with the GUA address can receive and send traffic directly (without redirection to the centralized router)?
> >
> > Regards,
> > Roberto
> >
> > [picture 1] - https://drive.google.com/file/d/1oaGmKbFGHqMwBxKVxsT4I-7rFt2pXSJW/view?usp=sharing
> > [picture 2] - https://drive.google.com/file/d/1E-MRe9WJubPz5ZP836bNPFPRMuPCt4_s/view?usp=sharing
> >
> >
> > [1] https://github.com/ovn-org/ovn/commit/ceacd9d49316d16b9273151bc1ecae9a2b2beeb8
> > [2] https://github.com/ovn-org/ovn/commit/551e3d989557bd2249d5bbe0978b44b775c5e619
> > [3] https://github.com/ovn-org/ovn/commit/8244c6b6bd8802a018e4ec3d3665510ebb16a9c7
> > [4] https://github.com/ovn-org/ovn/commit/2dc7869436de32205f60128172196b3a207ab265
> > [5] https://github.com/ovn-org/ovn/blob/main/tests/system-ovn.at#L3628
> >
> >
> > [tcpdump 1] - router external port chassis
> >
> > root@compute1:~# tcpdump -nni br-301 not port 22 -ne
> > tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
> > listening on br-301, link-type EN10MB (Ethernet), snapshot length 262144 bytes
> > 14:54:23.442467 fa:16:3e:7b:d4:cf > a2:8f:16:4c:6e:e9, ethertype IPv4 (0x0800), length 74: 200.201.0.110.57524 > 10.0.3.1.53: Flags [S], seq 2063839790, win 64492, options [mss 1402,sackOK,TS val 3936281288 ecr 0,nop,wscale 6], length 0
> > 14:54:24.142076 a2:8f:16:4c:6e:e9 > fa:16:3e:9d:67:1f, ethertype IPv6 (0x86dd), length 118: 2001:db8:aaa::2 > 2001:db8:1234::140: ICMP6, echo request, id 9014, seq 5, length 64
> > 14:54:24.473037 fa:16:3e:7b:d4:cf > a2:8f:16:4c:6e:e9, ethertype IPv4 (0x0800), length 74: 200.201.0.110.57524 > 10.0.3.1.53: Flags [S], seq 2063839790, win 64492, options [mss 1402,sackOK,TS val 3936282318 ecr 0,nop,wscale 6], length 0
> > 14:54:25.143193 a2:8f:16:4c:6e:e9 > fa:16:3e:9d:67:1f, ethertype IPv6 (0x86dd), length 118: 2001:db8:aaa::2 > 2001:db8:1234::140: ICMP6, echo request, id 9014, seq 6, length 64
> > 14:54:25.166861 a2:8f:16:4c:6e:e9 > fa:16:3e:9d:67:1f, ethertype IPv6 (0x86dd), length 86: fe80::a08f:16ff:fe4c:6ee9 > 2001:db8:4321:42::316: ICMP6, neighbor solicitation, who has 2001:db8:4321:42::316, length 32
> > 14:54:25.168384 fa:16:3e:9d:67:1f > a2:8f:16:4c:6e:e9, ethertype IPv6 (0x86dd), length 86: 2001:db8:4321:42::316 > fe80::a08f:16ff:fe4c:6ee9: ICMP6, neighbor advertisement, tgt is 2001:db8:4321:42::316, length 32
> > 14:54:26.145280 a2:8f:16:4c:6e:e9 > fa:16:3e:9d:67:1f, ethertype IPv6 (0x86dd), length 118: 2001:db8:aaa::2 > 2001:db8:1234::140: ICMP6, echo request, id 9014, seq 7, length 64
> > 14:54:26.489182 fa:16:3e:7b:d4:cf > a2:8f:16:4c:6e:e9, ethertype IPv4 (0x0800), length 74: 200.201.0.110.57524 > 10.0.3.1.53: Flags [S], seq 2063839790, win 64492, options [mss 1402,sackOK,TS val 3936284334 ecr 0,nop,wscale 6], length 0
> > 14:54:27.146833 a2:8f:16:4c:6e:e9 > fa:16:3e:9d:67:1f, ethertype IPv6 (0x86dd), length 118: 2001:db8:aaa::2 > 2001:db8:1234::140: ICMP6, echo request, id 9014, seq 8, length 64
> > ^C
> > 9 packets captured
> > 9 packets received by filter
> > 0 packets dropped by kernel
> >
> >
> > [dpcl 1] - router external port chassis
> >
> > root@compute1:~# ovs-dpctl dump-flows
> > recirc_id(0),tunnel(tun_id=0x0,src=192.168.200.30,dst=192.168.200.10,flags(-df+csum+key)),in_port(1),eth(),eth_type(0x0800),ipv4(proto=17,frag=no),udp(dst=3784), packets:183319, bytes:12099054, used:0.228s, actions:userspace(pid=4294967295,slow_path(bfd))
> > recirc_id(0),in_port(5),eth(src=fa:16:3e:cc:79:0a,dst=fa:16:3e:2d:98:92),eth_type(0x0800),ipv4(src=192.168.0.199,dst=0.0.0.0/128.0.0.0,proto=6,frag=no), packets:50, bytes:3700, used:2.976s, flags:S, actions:ct(zone=7),recirc(0x4dc)
> > recirc_id(0x4fa),in_port(5),eth(src=fa:16:3e:7b:d4:cf),eth_type(0x0800),ipv4(src=192.168.0.199,frag=no), packets:64306, bytes:4759173, used:2.976s, flags:S, actions:ct(commit,zone=1,nat(src=200.201.0.110)),recirc(0x4e2)
> > recirc_id(0x629),in_port(4),ct_state(+new-est-rel-rpl-inv+trk),ct_label(0/0x1),eth(src=a2:8f:16:4c:6e:e9,dst=fa:16:3e:9d:67:1f),eth_type(0x86dd),ipv6(src=2000::/ffc0::,dst=2001:db8:1234::140,proto=58,tclass=0/0x3,hlimit=63,frag=no), packets:19, bytes:2242, used:0.284s, actions:ct_clear,set(tunnel(tun_id=0x2,dst=192.168.200.11,ttl=64,tp_dst=6081,geneve({class=0x102,type=0x80,len=4,0x60008}),flags(df|csum|key))),set(eth(src=fa:16:3e:26:0b:f7,dst=fa:16:3e:c4:15:40)),set(ipv6(hlimit=62)),1
> > recirc_id(0x4e2),in_port(5),ct_state(+new-est-rel-rpl-inv+trk),ct_label(0/0x1),eth(src=fa:16:3e:7b:d4:cf,dst=a2:8f:16:4c:6e:e9),eth_type(0x0800),ipv4(dst=0.0.0.0/128.0.0.0,frag=no), packets:64306, bytes:4759173, used:2.976s, flags:S, actions:ct_clear,4
> > recirc_id(0x4dc),in_port(5),ct_state(+new-est-rel-rpl-inv+trk),ct_label(0/0x1),eth(src=fa:16:3e:cc:79:0a,dst=fa:16:3e:2d:98:92),eth_type(0x0800),ipv4(src=192.168.0.199,dst=8.0.0.0/248.0.0.0,proto=6,ttl=64,frag=no), packets:50, bytes:3700, used:2.976s, flags:S, actions:ct(commit,zone=7,label=0/0x1,nat(src)),set(eth(src=fa:16:3e:7b:d4:cf,dst=a2:8f:16:4c:6e:e9)),set(ipv4(ttl=63)),ct(zone=1,nat),recirc(0x4fa)
> > recirc_id(0),tunnel(tun_id=0x0,src=192.168.200.11,dst=192.168.200.10,flags(-df+csum+key)),in_port(1),eth(),eth_type(0x0800),ipv4(proto=17,frag=no),udp(dst=3784), packets:183257, bytes:12094962, used:0.404s, actions:userspace(pid=4294967295,slow_path(bfd))
> > recirc_id(0),in_port(4),eth(src=a2:8f:16:4c:6e:e9,dst=fa:16:3e:9d:67:1f),eth_type(0x86dd),ipv6(src=2001:db8:aaa::/ffff:ffff:ffff:ffff::,dst=2001:db8:1234::140,proto=58,hlimit=63,frag=no), packets:19, bytes:2242, used:0.284s, actions:ct(zone=1,nat),recirc(0x628)
> > recirc_id(0x628),in_port(4),eth(),eth_type(0x86dd),ipv6(dst=2001:db8:1234::140,frag=no), packets:19, bytes:2242, used:0.284s, actions:ct(commit,zone=1,nat(dst=2001:db8:1234::140)),recirc(0x629)
> >
> >
> > [tcpdump 2] - GUA address chassis
> >
> > root@compute2:~# tcpdump -nni br-301 not port 22 -ne
> > tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
> > listening on br-301, link-type EN10MB (Ethernet), snapshot length 262144 bytes
> > 14:54:30.146532 fa:16:3e:0f:d9:1a > a2:8f:16:4c:6e:e9, ethertype IPv6 (0x86dd), length 118: 2001:db8:1234::140 > 2001:db8:aaa::2: ICMP6, echo reply, id 9014, seq 11, length 64
> > 14:54:31.149147 fa:16:3e:0f:d9:1a > a2:8f:16:4c:6e:e9, ethertype IPv6 (0x86dd), length 118: 2001:db8:1234::140 > 2001:db8:aaa::2: ICMP6, echo reply, id 9014, seq 12, length 64
> > ^C
> > 2 packets captured
> > 2 packets received by filter
> > 0 packets dropped by kernel
> >
> >
> > ‘This message is intended only for the addresses listed in the initial header. If you are not listed among the addresses in the header, please completely disregard the content of this message; any copying, forwarding and/or execution of the actions mentioned in it are immediately void and prohibited’.
> >
> > ‘Although Magazine Luiza takes all reasonable precautions to ensure that no virus is present in this e-mail, the company cannot accept responsibility for any loss or damage caused by this e-mail or its attachments’.
> >
> > _______________________________________________
> > discuss mailing list
> > disc...@openvswitch.org
> > https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
>