Public bug reported:

I'm seeing this failure in the gate, in this run:
https://zuul.opendev.org/t/openstack/build/8987272b4be843ea9dffb266ca559006/logs

The symptom of the test failure is that the VM is not built. This happens
because the port is not UP (Nova hasn't received the vif event for the state
change); the expected chain on the Neutron side is sketched after the log
snippet below. When we check the ovn-controller logs, we see the relevant
port being set to up:

```
2025-01-13T22:30:10.704Z|00067|binding|INFO|Claiming lport 75ffc908-b9c5-421f-b1b9-117b158e860d for this chassis.
2025-01-13T22:30:10.704Z|00068|binding|INFO|75ffc908-b9c5-421f-b1b9-117b158e860d: Claiming fa:16:3e:11:92:5d 10.1.0.13
2025-01-13T22:30:10.704Z|00069|binding|INFO|Claiming lport c469b2c4-6961-4953-a2c6-0c106c80b5c8 for this chassis.
2025-01-13T22:30:10.704Z|00070|binding|INFO|c469b2c4-6961-4953-a2c6-0c106c80b5c8: Claiming fa:16:3e:d5:d9:4e 10.1.0.7
2025-01-13T22:30:10.730Z|00071|binding|INFO|Setting lport 75ffc908-b9c5-421f-b1b9-117b158e860d ovn-installed in OVS
2025-01-13T22:30:10.730Z|00072|binding|INFO|Setting lport 75ffc908-b9c5-421f-b1b9-117b158e860d up in Southbound
2025-01-13T22:30:10.730Z|00073|binding|INFO|Setting lport c469b2c4-6961-4953-a2c6-0c106c80b5c8 ovn-installed in OVS
2025-01-13T22:30:10.730Z|00074|binding|INFO|Setting lport c469b2c4-6961-4953-a2c6-0c106c80b5c8 up in Southbound
```

The port in question is 75ffc908-b9c5-421f-b1b9-117b158e860d.
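
For context, this is roughly the chain I expect on the Neutron side once that
'up' flip reaches the driver, as a minimal sketch built on ovsdbapp's RowEvent
interface. The class name and the set_port_status_up() call are illustrative
placeholders, not necessarily Neutron's actual code: match SB Port_Binding
updates where 'up' becomes true, mark the Neutron port ACTIVE, and that is
what makes Neutron emit the network-vif-plugged notification Nova waits for.

```python
from ovsdbapp.backend.ovs_idl import event as row_event


class PortBindingUpEventSketch(row_event.RowEvent):
    """Illustrative only: react to a SB Port_Binding row going up."""

    def __init__(self, driver):
        self.driver = driver
        table = 'Port_Binding'
        events = (self.ROW_UPDATE,)
        # Only match update events where the 'up' column is now true.
        super().__init__(events, table, (('up', '=', True),))

    def run(self, event, row, old):
        # Placeholder call: mark the Neutron port ACTIVE so that Neutron
        # sends the network-vif-plugged event Nova is waiting for.
        self.driver.set_port_status_up(row.logical_port)
```

If this event never fires for 75ffc908-b9c5-421f-b1b9-117b158e860d, the port
stays DOWN, Nova never gets its vif event and the build times out, which
matches the failure.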

In the neutron-api log, ovsdb-monitor receives the updates to the SB
Port_Binding table:

Jan 13 22:30:10.766618 np0039557494 devstack@neutron-api.service[62570]: DEBUG neutron.plugins.ml2.drivers.ovn.mech_driver.ovsdb.ovsdb_monitor [None req-0d900a20-58b1-4662-9217-856061d78672 None None] Hash Ring: Node d6133697-439f-4059-b5e7-152917a32dd1 (host: np0039557494) handling event "update" for row cfeb99a7-95f6-4f8c-bb64-cd628734b141 (table: Port_Binding) {{(pid=62570) notify /opt/stack/neutron/neutron/plugins/ml2/drivers/ovn/mech_driver/ovsdb/ovsdb_monitor.py:852}}
Jan 13 22:30:10.832940 np0039557494 devstack@neutron-api.service[62570]: DEBUG neutron.plugins.ml2.drivers.ovn.mech_driver.ovsdb.ovsdb_monitor [None req-0d900a20-58b1-4662-9217-856061d78672 None None] Hash Ring: Node d6133697-439f-4059-b5e7-152917a32dd1 (host: np0039557494) handling event "update" for row cfeb99a7-95f6-4f8c-bb64-cd628734b141 (table: Port_Binding) {{(pid=62570) notify /opt/stack/neutron/neutron/plugins/ml2/drivers/ovn/mech_driver/ovsdb/ovsdb_monitor.py:852}}

But note that both events refer to the same row ID, whereas the ovn-controller
log snippet shows two different ports being claimed and set to up at the same
time.

So either ovn-controller failed to update the 'up' field in the SB database
for one of the ports; or ovsdb-server incorrectly sent duplicate updates for
the same row to its watchers; or the Idl watcher somehow mixed up the row IDs.
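
One way to tell these apart is to watch the SB Port_Binding table
independently of Neutron. Below is a minimal standalone monitor sketch,
assuming python-ovs is available on the node; the socket and schema paths are
guesses for a devstack-style deployment and would need adjusting. It overrides
the same Idl.notify() hook that Neutron's ovsdb_monitor implements and prints
exactly which row UUIDs, logical ports and 'up' values ovsdb-server delivers;
the initial dump (rows arrive as "create" events) would also show whether
ovn-controller really flipped 'up' for the affected port.

```python
import ovs.db.idl
import ovs.poller

# Both paths are assumptions (devstack-style defaults); adjust for the node.
SB_REMOTE = 'unix:/var/run/ovn/ovnsb_db.sock'
SB_SCHEMA = '/usr/share/ovn/ovn-sb.ovsschema'


class LoggingSbIdl(ovs.db.idl.Idl):
    """Print every Port_Binding notification exactly as it is delivered."""

    def notify(self, event, row, updates=None):
        # Same hook Neutron's ovsdb_monitor overrides.  'up' is an optional
        # boolean column, so the IDL may render it as a one-element list
        # when set and an empty list when unset.
        if row._table.name == 'Port_Binding':
            print(event, row.uuid,
                  getattr(row, 'logical_port', None),
                  getattr(row, 'up', None))


helper = ovs.db.idl.SchemaHelper(SB_SCHEMA)
helper.register_columns('Port_Binding', ['logical_port', 'up'])
idl = LoggingSbIdl(SB_REMOTE, helper)

while True:
    idl.run()
    poller = ovs.poller.Poller()
    idl.wait(poller)
    poller.block()
```

If this shows a single update per port, each with its own row UUID, then the
duplicate-looking notifications above point at the Neutron/Idl side rather
than at ovsdb-server or ovn-controller.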

---

In the gate, OVN is quite old (22.03), and the OVS components are at 2.17.9
(also very old). I wonder if there are race conditions here that have since
been fixed in a later python-ovs, ovsdb-server or ovn-controller...

** Affects: neutron
     Importance: Undecided
         Status: New


** Tags: gate-failure ovn

** Tags added: gate-failure ovn

https://bugs.launchpad.net/bugs/2094840

Title:
  OVN driver doesn't receive port-binding UP update from OVN controller;
  Nova eventually times out while building VM

Status in neutron:
  New


