The extport chassis is not the one hosting the SR-IOV VM. Also, only one such host has the cms option set. What is weird is that each time I run the commands, at intervals of 2 or 3 seconds, they show different chassis IDs.
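To record the flipping rather than eyeball it, here is a minimal sketch that parses saved `ovn-nbctl list ha_chassis_group` records and counts how often the `ha_chassis` membership changes between consecutive samples. The two samples are copied from the outputs quoted below; the function names are my own and hypothetical. Note that, as far as I understand the northbound schema, the `ha_chassis` column holds HA_Chassis row UUIDs (the chassis name lives in that row's `chassis_name` column), so a changing UUID may mean the row is being deleted and recreated rather than a different chassis being picked.

```python
import re

# Samples copied from the `ovn-nbctl list ha_chassis_group` outputs in this thread.
SAMPLE_A = """\
_uuid : 411ce494-b5aa-4d74-a544-f7dfb9c048cc
external_ids : {"neutron:availability_zone_hints"=""}
ha_chassis : []
name : neutron-extport-c27d408a-a926-4509-b707-39bc43732c05
"""

SAMPLE_B = """\
_uuid : 411ce494-b5aa-4d74-a544-f7dfb9c048cc
external_ids : {"neutron:availability_zone_hints"=""}
ha_chassis : [a667f120-013c-4715-b049-0f53903ad9d9]
name : neutron-extport-c27d408a-a926-4509-b707-39bc43732c05
"""


def ha_chassis_members(record_text):
    """Return the set of UUIDs listed in the ha_chassis column of one record."""
    match = re.search(r"^ha_chassis\s*:\s*\[(.*?)\]\s*$", record_text, re.MULTILINE)
    if match is None:
        raise ValueError("no ha_chassis column found in record")
    return {item.strip() for item in match.group(1).split(",") if item.strip()}


def count_flips(samples):
    """Count consecutive sample pairs whose ha_chassis membership differs."""
    members = [ha_chassis_members(sample) for sample in samples]
    return sum(1 for prev, cur in zip(members, members[1:]) if prev != cur)


if __name__ == "__main__":
    # Three samples taken one after another: empty, populated, empty again.
    print(count_flips([SAMPLE_A, SAMPLE_B, SAMPLE_A]))
```

In practice the samples would come from running the command in a loop and saving each output with a timestamp, so the flips can be lined up against the neutron-server logs.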
On Fri, 6 Jun 2025, 20:20 Ihar Hrachyshka, <ihrac...@redhat.com> wrote:

> The flipping could explain connectivity issues for metadata. AFAIR the HA
> group membership is managed by Neutron itself, so if you see the group's
> `ha_chassis` list flipping, the problem is probably on the neutron side.
> Anything interesting in neutron logs?
>
> You haven't confirmed that you have other compute nodes to host the
> external port. What is the a667f120-013c-4715-b049-0f53903ad9d9 chassis in
> relation to the sr-iov port? Is it the same chassis or a different chassis?
> Note that the external port chassis has to be a chassis that is NOT the one
> hosting the sr-iov port. I see in the original post that you set the cms
> options for the sr-iov hosting chassis. This is wrong, unless you have more
> compute nodes with the same setting in the cluster.
>
> Ihar
>
> On Sun, Jun 1, 2025 at 2:48 PM engineer2024 <engineerlinux2...@gmail.com> wrote:
>
>> also, the ovn-nb commands are showing fluctuating outputs, each time I
>> run these commands. these commands are run within a gap of 1 second
>>
>> --------
>> # ovn-nbctl list ha_chassis_group | grep -A 4 411ce494-b5aa-4d74-a544-f7dfb9c048cc
>> _uuid : 411ce494-b5aa-4d74-a544-f7dfb9c048cc
>> external_ids : {"neutron:availability_zone_hints"=""}
>> ha_chassis : []
>> name : neutron-extport-c27d408a-a926-4509-b707-39bc43732c05
>>
>> # ovn-nbctl list ha_chassis_group | grep -A 4 411ce494-b5aa-4d74-a544-f7dfb9c048cc
>> _uuid : 411ce494-b5aa-4d74-a544-f7dfb9c048cc
>> external_ids : {"neutron:availability_zone_hints"=""}
>> ha_chassis : [a667f120-013c-4715-b049-0f53903ad9d9]
>> name : neutron-extport-c27d408a-a926-4509-b707-39bc43732c05
>>
>> ---------------
>>
>> On Mon, Jun 2, 2025 at 12:04 AM engineer2024 <engineerlinux2...@gmail.com> wrote:
>>
>>> Thanks for the reply....
>>>
>>> I have tried it and it worked for the dhcp ip lease for the sriov
>>> external ovn port.
>>>
>>> But the metadata requests from the VM are not getting any replies. I
>>> have pasted the br-int flows of the extport chassis.
>>>
>>> ------
>>> grep 'fa:16:3e:d2:18:19' flows
>>>
>>> cookie=0xb4c75793, duration=0.526s, table=30, n_packets=0, n_bytes=0, priority=100,conj_id=1857670037,udp,reg14=0x1,metadata=0x1,dl_src=fa:16:3e:d2:18:19,tp_src=68,tp_dst=67 actions=controller(userdata=00.00.00.02.00.00.00.00.00.01.de.10.00.00.00.63.0a.25.34.a5.79.13.20.a9.fe.a9.fe.0a.25.34.0b.00.0a.25.34.01.00.0a.25.34.01.06.08.0a.25.e7.eb.0a.e3.64.11.33.04.00.00.a8.c0.1a.02.05.dc.01.04.ff.ff.fc.00.03.04.0a.25.34.01.36.04.0a.25.34.01,pause),resubmit(,31)
>>> cookie=0x0, duration=0.526s, table=30, n_packets=0, n_bytes=0, priority=100,udp,reg14=0x1,metadata=0x1,dl_src=fa:16:3e:d2:18:19,nw_dst=10.37.52.1,tp_src=68,tp_dst=67 actions=conjunction(1857670037,1/2)
>>> cookie=0x0, duration=0.526s, table=30, n_packets=0, n_bytes=0, priority=100,udp,reg14=0x1,metadata=0x1,dl_src=fa:16:3e:d2:18:19,nw_dst=255.255.255.255,tp_src=68,tp_dst=67 actions=conjunction(1857670037,1/2)
>>> cookie=0x0, duration=0.526s, table=30, n_packets=0, n_bytes=0, priority=100,udp,reg14=0x1,metadata=0x1,dl_src=fa:16:3e:d2:18:19,nw_src=0.0.0.0,tp_src=68,tp_dst=67 actions=conjunction(1857670037,2/2)
>>> cookie=0x0, duration=0.526s, table=30, n_packets=0, n_bytes=0, priority=100,udp,reg14=0x1,metadata=0x1,dl_src=fa:16:3e:d2:18:19,nw_src=10.37.52.165,tp_src=68,tp_dst=67 actions=conjunction(1857670037,2/2)
>>> cookie=0xd7d9689d, duration=0.526s, table=31, n_packets=0, n_bytes=0, priority=100,udp,reg0=0x8/0x8,reg14=0x1,metadata=0x1,dl_src=fa:16:3e:d2:18:19,tp_src=68,tp_dst=67 actions=move:NXM_OF_ETH_SRC[]->NXM_OF_ETH_DST[],set_field:fa:16:3e:d7:50:0e->eth_src,set_field:10.37.52.1->ip_src,set_field:67->udp_src,set_field:68->udp_dst,move:NXM_NX_REG14[]->NXM_NX_REG15[],load:0x1->NXM_NX_REG10[0],resubmit(,37)
>>> cookie=0x3ca10142, duration=0.526s, table=41, n_packets=0, n_bytes=0, priority=170,reg10=0x400/0x400,reg15=0x1,metadata=0x1,dl_dst=fa:16:3e:d2:18:19 actions=set_field:0->reg0,set_field:0->reg1,set_field:0->reg2,set_field:0->reg3,set_field:0->reg4,set_field:0->reg5,set_field:0->reg6,set_field:0->reg7,set_field:0->reg8,set_field:0->reg9,resubmit(,42)
>>> ---------------
>>>
>>> ----------
>>> grep 10.72.46.3 flows
>>>
>>> cookie=0x0, duration=0.526s, table=30, n_packets=0, n_bytes=0, priority=100,udp,reg14=0x1,metadata=0x1,dl_src=fa:16:3e:d3:67:02,nw_src=10.72.46.3,tp_src=68,tp_dst=67 actions=conjunction(1857670037,2/2)
>>> cookie=0x0, duration=2330.818s, table=46, n_packets=0, n_bytes=0, priority=2002,ip,reg0=0x80/0x80,metadata=0x1,nw_src=10.72.46.3 actions=conjunction(2270989993,1/2)
>>> cookie=0x0, duration=2330.817s, table=46, n_packets=0, n_bytes=0, priority=2002,ip,reg0=0x100/0x100,metadata=0x1,nw_src=10.72.46.3 actions=conjunction(2898587360,1/2)
>>> -----------
>>>
>>> The sriov vm's ip is 10.72.46.3 and its mac addr is
>>> 'fa:16:3e:d2:18:19'.
>>>
>>> Also when pinging the metadata ip 169.254.169.254 from the vm, it is
>>> getting a single reply out of 30 reqs or so as shown below
>>>
>>> ------
>>> # ip netns exec ovnmeta-8f126b23-b062-4021-9245-41d91bdf97d9 tcpdump -l -i tapsd99jef-88 icmp
>>> tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
>>> listening on tapsd99jef-88, link-type EN10MB (Ethernet), snapshot length 262144 bytes
>>> 18:13:39.038132 IP 10.72.46.3 > 169.254.169.254: ICMP echo request, id 15, seq 1, length 64
>>> 18:13:40.061961 IP 10.72.46.3 > 169.254.169.254: ICMP echo request, id 15, seq 2, length 64
>>> 18:13:41.085949 IP 10.72.46.3 > 169.254.169.254: ICMP echo request, id 15, seq 3, length 64
>>> 18:13:42.109964 IP 10.72.46.3 > 169.254.169.254: ICMP echo request, id 15, seq 4, length 64
>>> 18:13:43.133974 IP 10.72.46.3 > 169.254.169.254: ICMP echo request, id 15, seq 5, length 64
>>> 18:13:44.157950 IP 10.72.46.3 > 169.254.169.254: ICMP echo request, id 15, seq 6, length 64
>>> 18:13:44.286197 IP 169.254.169.254 > 10.72.46.3: ICMP echo reply, id 15, seq 6, length 64
>>> 18:13:45.182021 IP 10.72.46.3 > 169.254.169.254: ICMP echo request, id 15, seq 7, length 64
>>> 18:13:46.205956 IP 10.72.46.3 > 169.254.169.254: ICMP echo request, id 15, seq 8, length 64
>>> 18:13:47.229978 IP 10.72.46.3 > 169.254.169.254: ICMP echo request, id 15, seq 9, length 64
>>> 18:13:48.253948 IP 10.72.46.3 > 169.254.169.254: ICMP echo request, id 15, seq 10, length 64
>>> 18:13:49.277955 IP 10.72.46.3 > 169.254.169.254: ICMP echo request, id 15, seq 11, length 64
>>> 18:13:50.301977 IP 10.72.46.3 > 169.254.169.254: ICMP echo request, id 15, seq 12, length 64
>>> 18:13:51.325966 IP 10.72.46.3 > 169.254.169.254: ICMP echo request, id 15, seq 13, length 64
>>> 18:13:52.349957 IP 10.72.46.3 > 169.254.169.254: ICMP echo request, id 15, seq 14, length 64
>>> 18:13:53.373976 IP 10.72.46.3 > 169.254.169.254: ICMP echo request, id 15, seq 15, length 64
>>> 18:13:54.397962 IP 10.72.46.3 > 169.254.169.254: ICMP echo request, id 15, seq 16, length 64
>>> 18:13:55.421947 IP 10.72.46.3 > 169.254.169.254: ICMP echo request, id 15, seq 17, length 64
>>> 18:13:56.445936 IP 10.72.46.3 > 169.254.169.254: ICMP echo request, id 15, seq 18, length 64
>>> ----------
>>>
>>> Can you point out why this metadata is failing ?
>>>
>>> Appreciate your time....
>>>
>>> Have a nice day !!!
>>>
>>> On Thu, May 29, 2025 at 10:02 PM Ihar Hrachyshka <ihrac...@redhat.com> wrote:
>>>
>>>> On Thu, May 29, 2025 at 11:53 AM engineer2024 via discuss <ovs-discuss@openvswitch.org> wrote:
>>>>
>>>>> Thanks for the response.
>>>>>
>>>>> This is not what I exactly asked. This scenario is specifically for
>>>>> the sriov ports. In this case, how the pkt from the physical nic go back
>>>>> to the same node ? Two questions.
>>>>>
>>>>> 1. For sriov vm ports , where does the DHCP responses come from ?
>>>>> Where is this maintained in the OVN ? I know for non sriov ports or non
>>>>> direct vNIC types, the ovn controller on the compute node intercepts it
>>>>> and responds. So it never comes out of the compute node.
>>>>>
>>>> Responses, if they do, come from the fabric, probably served from
>>>> *another* chassis that is in the HA Group list for the external port (do
>>>> you have other computes with the same cms-options setting?). In the
>>>> OpenStack group scheduler for external ports, there's an explicit check
>>>> against landing external port for a SR-IOV port on the same chassis as the
>>>> SR-IOV port itself. You can check sync_ha_chassis_group_network function
>>>> in neutron/common/ovn/utils.py to confirm it.
>>>>
>>>>> 2. How to provide metadata service for sriov ports ? For non sriov
>>>>> ports the ovn metadata namespace does.
>>>>>
>>>> Same as with non-SRIOV ports, metadata will be served by the
>>>> ovn-metadata-agent. But it will be served from the host that owns the
>>>> external port (through localport). Which is - by design - a different host
>>>> from the one that hosts the SR-IOV port.
>>>>
>>>>> On Thu, 29 May 2025, 21:09 Daniel Alvarez Sanchez, <dalva...@redhat.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> On Thu, May 29, 2025 at 4:50 PM engineer2024 via discuss <ovs-discuss@openvswitch.org> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> i have an openstack ovn setup. For sriov neutron ports, when the
>>>>>>> cms option is set on the compute node hosting the sriov nic, as shown
>>>>>>> below
>>>>>>>
>>>>>>> "ovs-vsctl set Open_vSwitch . external-ids:ovn-cms-options=\"enable-chassis-as-extport-host\""
>>>>>>>
>>>>>>> the port is getting the dhcp ip. Now, my question is, from where is
>>>>>>> the OVN responding to this external port's DHCP reqs ? I know for a
>>>>>>> normal tap port , it goes through br-int and then the ovn-controller
>>>>>>> gives the response. but the sriov port by passes the whole ovs and the
>>>>>>> host's kernel network stack. But where does it go to after it exits the
>>>>>>> physical VF interface, and how does OVN answer it ? How and where does
>>>>>>> OVN's inbuilt dhcp service maintained ?
>>>>>>>
>>>>>> DHCP will be answered wherever your external port is scheduled.
>>>>>> I recommend reading this blogpost I wrote some time back:
>>>>>> https://dani.foroselectronica.es/ovn-external-ports-604/
>>>>>>
>>>>>> If you're seeing this behavior and you are 100% sure that the same
>>>>>> compute node that has the SRIOV port is serving the DHCP requests to that
>>>>>> instance then it means that the broadcast request is coming out from the
>>>>>> SRIOV port and back in from the same switch presumably to the compute
>>>>>> node through a different NIC and from there to br-ex (or similar?) ->
>>>>>> br-int -> external-port. I'm not entirely sure about the return path
>>>>>> though but you can possibly check with tcpdump :)
>>>>>>
>>>>>>> Next, for sriov ports, the nova metadata service is also
>>>>>>> unreachable, as, it bypasses the ovn-meta namespace on the compute host
>>>>>>> connected to the br-int via veth cables. So injecting user data like
>>>>>>> ssh keys is not possible and failing...
>>>>>>>
>>>>>> Same... you should have the ovn-metadata-agent running where your
>>>>>> external port is and this one will proxy the metadata request to nova and
>>>>>> serve it back to your sriov instance wherever it is.
>>>>>>
>>>>>>> Thanks
>>>>>>> elinux
>>>>>>> _______________________________________________
>>>>>>> discuss mailing list
>>>>>>> disc...@openvswitch.org
>>>>>>> https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
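
To put a number on the ping loss in the tcpdump capture quoted above, here is a rough sketch that counts ICMP echo requests and how many of them got a matching reply (same id and seq). It assumes each packet sits on one unwrapped line, as tcpdump prints it; the sample lines are copied from the capture and the function name is hypothetical.

```python
import re

# Matches the ICMP part of a tcpdump line, e.g. "ICMP echo request, id 15, seq 6".
ICMP_LINE = re.compile(r"ICMP echo (request|reply), id (\d+), seq (\d+)")

# A few lines copied from the capture in this thread (unwrapped).
SAMPLE_CAPTURE = """\
18:13:43.133974 IP 10.72.46.3 > 169.254.169.254: ICMP echo request, id 15, seq 5, length 64
18:13:44.157950 IP 10.72.46.3 > 169.254.169.254: ICMP echo request, id 15, seq 6, length 64
18:13:44.286197 IP 169.254.169.254 > 10.72.46.3: ICMP echo reply, id 15, seq 6, length 64
18:13:45.182021 IP 10.72.46.3 > 169.254.169.254: ICMP echo request, id 15, seq 7, length 64
"""


def summarize_icmp(capture_text):
    """Return (requests_seen, requests_answered) for a tcpdump ICMP capture."""
    requests, replies = set(), set()
    for match in ICMP_LINE.finditer(capture_text):
        key = (int(match.group(2)), int(match.group(3)))  # (id, seq)
        if match.group(1) == "request":
            requests.add(key)
        else:
            replies.add(key)
    # A request counts as answered only if a reply with the same id/seq was seen.
    return len(requests), len(requests & replies)


if __name__ == "__main__":
    seen, answered = summarize_icmp(SAMPLE_CAPTURE)
    print(f"{answered} of {seen} echo requests answered")
```

Running the same count on captures taken while the `ha_chassis` list is empty and while it is populated might show whether the replies only come back during the windows when the external port is bound.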