Public bug reported:

Sometimes the external_mac is missing in NAT entries in ovn-nb while
it's supposed to be there.

In my case a VM port (vnic-type=remote-managed) is created and, shortly
after that, a new floating IP is created and assigned to this port.
Following that, ovn-controller reports that the port is operationally up
as it plugs a VF representor into the OVS bridge, and the status
propagates from ovn-controller -> ovn-sb -> ovn-northd -> ovn-nb ->
Neutron. Eventually the OVN NB notification reaches Neutron, which calls
`set_port_status_up` and `_update_dnat_entry_if_needed`.
https://github.com/openstack/neutron/blob/0d4ef860549a552db3a9d04934154ad256662bbb/neutron/plugins/ml2/drivers/ovn/mech_driver/mech_driver.py#L1165-L1171

But the debug message in `_update_dnat_entry_if_needed` is never logged
(debug=True is set and ovn_distributed_floating_ip is enabled) because
the NAT entry has not been committed yet:
https://github.com/openstack/neutron/blob/0d4ef860549a552db3a9d04934154ad256662bbb/neutron/plugins/ml2/drivers/ovn/mech_driver/mech_driver.py#L1143-L1145
                if mac and nat['external_mac'] != mac:
                    LOG.debug("Setting external_mac of port %s to %s",
                              port_id, mac)
https://github.com/openstack/neutron/blob/0d4ef860549a552db3a9d04934154ad256662bbb/neutron/plugins/ml2/drivers/ovn/mech_driver/mech_driver.py#L1127-L1128

When looking at the transaction logs for the NAT table in `ovsdb-tool
-mm show-log /var/lib/ovn/ovn_nb.db` I can see that the external-id
`neutron:fip_external_mac` is present but not the `external_mac`.
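
The symptom described above (external-id present, column empty) can be checked mechanically once the NAT rows are available as dictionaries. The following is only a sketch: the row shape (`external_ids` as a dict, `external_mac` as a possibly empty string) and the helper name are assumptions made for illustration, and how the rows are obtained is left out. A match is only a candidate, since a missing `external_mac` is legitimate while the LSP is still down.

```python
FIP_MAC_KEY = "neutron:fip_external_mac"


def nat_rows_missing_external_mac(rows):
    """Return NAT rows that carry the external-id but no external_mac.

    `rows` is assumed to be an iterable of dicts shaped like
    {"external_ids": {...}, "external_mac": "..."}; this shape is an
    assumption for the sketch, not a Neutron or OVN API.
    """
    suspects = []
    for row in rows:
        recorded_mac = row.get("external_ids", {}).get(FIP_MAC_KEY)
        # Candidate: the MAC was recorded in external_ids, but the
        # external_mac column itself is absent or empty.
        if recorded_mac and not row.get("external_mac"):
            suspects.append(row)
    return suspects
```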

The NAT entry is committed at FIP creation time, and the presence of
`external_mac` is conditional on the LSP for the VM port already being
UP. `neutron:fip_external_mac`, in contrast, is committed
unconditionally per the code:

https://github.com/openstack/neutron/blob/e3a56a401f8fc94accb543d0f87a5b9fc2654d3d/neutron/plugins/ml2/drivers/ovn/mech_driver/ovsdb/ovn_client.py#L907
 (unconditional for neutron:fip_external_mac)
https://github.com/openstack/neutron/blob/e3a56a401f8fc94accb543d0f87a5b9fc2654d3d/neutron/plugins/ml2/drivers/ovn/mech_driver/ovsdb/ovn_client.py#L923-L925
 (only sets external_mac if distributed FIPs are enabled and the port's
 LSP is UP).

So if the LSP is not UP at the time of the check in
`_create_or_update_floatingip`, the NAT entry is created without the
external_mac. Meanwhile `set_port_status_up`, which runs in parallel
but before the NAT entry is committed, does not see the NAT record
yet, so `external_mac` is never set by either function. The outcome is
that the VM is not reachable due to the missing external_mac.
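
The interleaving can be replayed deterministically with a toy model. This is a sketch only: `FakeNB` and the three functions are hypothetical stand-ins for the two code paths described above, not Neutron or OVN APIs, and the port ID and MAC are made-up values.

```python
FIP_MAC_KEY = "neutron:fip_external_mac"


class FakeNB:
    """In-memory stand-in for the OVN NB database (hypothetical)."""

    def __init__(self):
        self.lsp_up = False
        self.nat = {}  # port_id -> NAT row (dict)


def build_nat_row(nb, mac):
    """FIP path, step 1: build the row from the LSP state seen right now."""
    row = {"external_ids": {FIP_MAC_KEY: mac}}  # written unconditionally
    if nb.lsp_up:
        row["external_mac"] = mac  # only when the LSP is already up
    return row


def commit_nat_row(nb, port_id, row):
    """FIP path, step 2: the row becomes visible to the other code path."""
    nb.nat[port_id] = row


def on_port_up(nb, port_id, mac):
    """Port-status path: mark the LSP up and patch an existing NAT row."""
    nb.lsp_up = True
    nat = nb.nat.get(port_id)
    if nat is not None and nat.get("external_mac") != mac:
        nat["external_mac"] = mac


def run_racy_interleaving():
    nb = FakeNB()
    port_id, mac = "port-1", "fa:16:3e:00:00:01"
    row = build_nat_row(nb, mac)      # LSP still down: no external_mac
    on_port_up(nb, port_id, mac)      # no NAT row committed yet: nothing to patch
    commit_nat_row(nb, port_id, row)  # row lands without external_mac
    return nb.nat[port_id]
```

In this ordering the committed row keeps the external-id but never gains `external_mac`, which matches what the transaction log shows.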

To fix that, Neutron could also check the LSP status after committing
the NAT entry and update the external_mac accordingly.
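
A minimal sketch of that idea, using a toy in-memory model rather than the real Neutron code: all names here are hypothetical, `nb` is a plain dict standing in for the NB database, and the `before_commit` hook exists only so the racing port-up event can be injected in a test. It is not the actual patch, which would also have to deal with real OVSDB transactions.

```python
FIP_MAC_KEY = "neutron:fip_external_mac"


def port_up(nb, port_id, mac):
    """Port-status path: mark the LSP up and patch an existing NAT row."""
    nb["lsp_up"] = True
    nat = nb["nat"].get(port_id)
    if nat is not None and nat.get("external_mac") != mac:
        nat["external_mac"] = mac


def create_fip_nat(nb, port_id, mac, before_commit=None):
    """FIP path with an extra LSP re-check after the commit."""
    row = {"external_ids": {FIP_MAC_KEY: mac}}
    if nb["lsp_up"]:
        row["external_mac"] = mac
    if before_commit is not None:
        before_commit()  # test hook: simulates an event racing with the commit
    nb["nat"][port_id] = row  # "commit"
    # Proposed extra step: look at the LSP state again now that the row is
    # visible, and fill in external_mac if the port came up in the meantime.
    if nb["lsp_up"] and row.get("external_mac") != mac:
        row["external_mac"] = mac
    return row
```

With the re-check in place, a port-up event that arrives between the first LSP check and the commit no longer leaves the row without `external_mac`; a port-up event that arrives after the commit is still handled by the port-status path as before.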

Discovered in Neutron 2024.1 but affects the current versions as well.

** Affects: neutron
     Importance: Undecided
         Status: In Progress

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/2111593

Title:
  [ovn] Race between FIP NAT entry creation and OVN port status update

Status in neutron:
  In Progress

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/2111593/+subscriptions

