Public bug reported:

Sometimes the external_mac is missing in NAT entries in ovn-nb while it's supposed to be there.

In my case a VM port (vnic-type=remote-managed) is created and, shortly after that, a new floating IP is created and assigned to this port.

Following that, ovn-controller reports that the port is operationally up as it plugs a VF representor into the OVS bridge and the status propagates from ovn-controller -> ovn-sb -> ovn-northd -> ovn-nb -> Neutron. Eventually the OVN NB notification hits Neutron which calls `set_port_status_up` and `_update_dnat_entry_if_needed`.

https://github.com/openstack/neutron/blob/0d4ef860549a552db3a9d04934154ad256662bbb/neutron/plugins/ml2/drivers/ovn/mech_driver/mech_driver.py#L1165-L1171

But the debug message is never logged in _update_dnat_entry_if_needed (debug=True is set, ovn_distributed_floating_ip is enabled) as the NAT entry has not been committed yet:

https://github.com/openstack/neutron/blob/0d4ef860549a552db3a9d04934154ad256662bbb/neutron/plugins/ml2/drivers/ovn/mech_driver/mech_driver.py#L1143-L1145

    if mac and nat['external_mac'] != mac:
        LOG.debug("Setting external_mac of port %s to %s",
                  port_id, mac)

https://github.com/openstack/neutron/blob/0d4ef860549a552db3a9d04934154ad256662bbb/neutron/plugins/ml2/drivers/ovn/mech_driver/mech_driver.py#L1127-L1128

When looking at the transaction logs for the NAT table in `ovsdb-tool -mm show-log /var/lib/ovn/ovn_nb.db` I can see that the external-id `neutron:fip_external_mac` is present but not the `external_mac`.

The NAT entry is committed at FIP creation time and the presence of `external_mac` is conditional on LSP for the VM port being UP already.
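The timing described so far can be illustrated with a small, self-contained simulation. This is only a sketch under stated assumptions: `FakeNB`, `fip_create_check`, `fip_create_commit`, and `on_port_up` are hypothetical stand-ins, not Neutron or OVN APIs, and the dict-based NAT row only mimics the fields named in this report.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

MAC = "fa:16:3e:00:00:01"   # made-up MAC address for the example
PORT = "port-1"             # made-up port ID for the example


@dataclass
class FakeNB:
    """Hypothetical in-memory stand-in for the OVN NB database."""
    lsp_up: Dict[str, bool] = field(default_factory=dict)
    nat: Dict[str, dict] = field(default_factory=dict)  # NAT rows keyed by port


def fip_create_check(nb: FakeNB, port_id: str, mac: str) -> dict:
    """Build the NAT row from the LSP state visible at this moment."""
    # The external-id is always written; external_mac only if the LSP is up.
    row = {"external_ids": {"neutron:fip_external_mac": mac}}
    if nb.lsp_up.get(port_id):
        row["external_mac"] = mac
    return row


def fip_create_commit(nb: FakeNB, port_id: str, row: dict) -> None:
    """Commit the previously built NAT row."""
    nb.nat[port_id] = row


def on_port_up(nb: FakeNB, port_id: str, mac: str) -> bool:
    """Port-up handler: can only fix NAT rows that are already committed."""
    nb.lsp_up[port_id] = True
    nat = nb.nat.get(port_id)
    if nat is None:
        return False  # no NAT row visible yet, nothing to update
    if nat.get("external_mac") != mac:
        nat["external_mac"] = mac
        return True
    return False


def racy_interleaving() -> Tuple[dict, bool]:
    """Port-up notification lands between the LSP check and the NAT commit."""
    nb = FakeNB(lsp_up={PORT: False})
    row = fip_create_check(nb, PORT, MAC)   # LSP is still down here
    updated = on_port_up(nb, PORT, MAC)     # NAT row not committed yet
    fip_create_commit(nb, PORT, row)        # committed without external_mac
    return nb.nat[PORT], updated


def ordered_interleaving() -> Tuple[dict, bool]:
    """Port-up notification lands after the NAT commit."""
    nb = FakeNB(lsp_up={PORT: False})
    row = fip_create_check(nb, PORT, MAC)
    fip_create_commit(nb, PORT, row)
    updated = on_port_up(nb, PORT, MAC)     # handler sees the row and fixes it
    return nb.nat[PORT], updated
```

In this toy model, `racy_interleaving()` leaves a NAT row that carries the external-id but no `external_mac`, while `ordered_interleaving()` ends with `external_mac` set by the port-up handler.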
The `neutron:fip_external_mac` external-id, unlike `external_mac`, is committed unconditionally per the code:

https://github.com/openstack/neutron/blob/e3a56a401f8fc94accb543d0f87a5b9fc2654d3d/neutron/plugins/ml2/drivers/ovn/mech_driver/ovsdb/ovn_client.py#L907 (unconditional for neutron:fip_external_mac)
https://github.com/openstack/neutron/blob/e3a56a401f8fc94accb543d0f87a5b9fc2654d3d/neutron/plugins/ml2/drivers/ovn/mech_driver/ovsdb/ovn_client.py#L923-L925 (only sets external_mac if distributed FIPs are enabled and the port's LSP is UP).

So if the LSP is not UP at the time of the check in `_create_or_update_floatingip`, the NAT entry is created without the external_mac. However, `set_port_status_up`, which runs in parallel but before the NAT entry is committed, simply does not see the NAT record yet, so `external_mac` never gets updated by either of the functions.

The outcome is that the VM is not reachable due to the lack of the external_mac.

To fix that, Neutron could also check the LSP status after committing the NAT entry and update the external_mac accordingly.

Discovered in Neutron 2024.1 but affects the current versions as well.

** Affects: neutron
     Importance: Undecided
         Status: In Progress

** Description changed:

+ Discovered in Neutron 2024.1 but affects the current versions as well.

** Description changed:

- `neutron:fip_mac_address` is present but not the `external_mac`.
+ `neutron:fip_external_mac` is present but not the `external_mac`.

- `neutron:fip_mac_address`, in contrast, is committed unconditionally per
- the code:
+ `neutron:fip_external_mac`, in contrast, is committed unconditionally
+ per the code:

- https://github.com/openstack/neutron/blob/e3a56a401f8fc94accb543d0f87a5b9fc2654d3d/neutron/plugins/ml2/drivers/ovn/mech_driver/ovsdb/ovn_client.py#L907 (unconditional for neutron:fip_mac_address)
+ https://github.com/openstack/neutron/blob/e3a56a401f8fc94accb543d0f87a5b9fc2654d3d/neutron/plugins/ml2/drivers/ovn/mech_driver/ovsdb/ovn_client.py#L907 (unconditional for neutron:fip_external_mac)

-- 
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/2111593

Title:
  [ovn] Race between FIP NAT entry creation and OVN port status update

Status in neutron:
  In Progress