Public bug reported: I'm seeing this in gate in this run: https://zuul.opendev.org/t/openstack/build/8987272b4be843ea9dffb266ca559006/logs
The symptom of the test failure is that the VM is not built. This happens because the port is not UP (nova hasn't received the vif event for the state change). When we check the ovn-controller logs, we see the relevant port being set to up:

```
2025-01-13T22:30:10.704Z|00067|binding|INFO|Claiming lport 75ffc908-b9c5-421f-b1b9-117b158e860d for this chassis.
2025-01-13T22:30:10.704Z|00068|binding|INFO|75ffc908-b9c5-421f-b1b9-117b158e860d: Claiming fa:16:3e:11:92:5d 10.1.0.13
2025-01-13T22:30:10.704Z|00069|binding|INFO|Claiming lport c469b2c4-6961-4953-a2c6-0c106c80b5c8 for this chassis.
2025-01-13T22:30:10.704Z|00070|binding|INFO|c469b2c4-6961-4953-a2c6-0c106c80b5c8: Claiming fa:16:3e:d5:d9:4e 10.1.0.7
2025-01-13T22:30:10.730Z|00071|binding|INFO|Setting lport 75ffc908-b9c5-421f-b1b9-117b158e860d ovn-installed in OVS
2025-01-13T22:30:10.730Z|00072|binding|INFO|Setting lport 75ffc908-b9c5-421f-b1b9-117b158e860d up in Southbound
2025-01-13T22:30:10.730Z|00073|binding|INFO|Setting lport c469b2c4-6961-4953-a2c6-0c106c80b5c8 ovn-installed in OVS
2025-01-13T22:30:10.730Z|00074|binding|INFO|Setting lport c469b2c4-6961-4953-a2c6-0c106c80b5c8 up in Southbound
```

The port in question is 75ffc908-b9c5-421f-b1b9-117b158e860d.
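When triaging runs like this, it helps to cross-check which lports ovn-controller claimed against which it marked up in the Southbound DB. A minimal sketch of that check (the regexes are mine, written for the log format shown above; this is not part of any OVN tooling):

```python
import re

# Patterns for the ovn-controller "binding" module messages shown above.
# Illustrative only, matched against the standard ovn-controller log format.
CLAIM_RE = re.compile(r"\|binding\|INFO\|Claiming lport (\S+) for this chassis")
UP_RE = re.compile(r"\|binding\|INFO\|Setting lport (\S+) up in Southbound")


def binding_summary(log_lines):
    """Return (claimed, set_up) sets of lport UUIDs seen in the log."""
    claimed, set_up = set(), set()
    for line in log_lines:
        m = CLAIM_RE.search(line)
        if m:
            claimed.add(m.group(1))
            continue
        m = UP_RE.search(line)
        if m:
            set_up.add(m.group(1))
    return claimed, set_up


log = [
    "2025-01-13T22:30:10.704Z|00067|binding|INFO|Claiming lport "
    "75ffc908-b9c5-421f-b1b9-117b158e860d for this chassis.",
    "2025-01-13T22:30:10.730Z|00072|binding|INFO|Setting lport "
    "75ffc908-b9c5-421f-b1b9-117b158e860d up in Southbound",
]
claimed, up = binding_summary(log)
# Lports claimed but never set up would point at an ovn-controller problem;
# here the claim and the "up in Southbound" both appear, so the set is empty.
print(claimed - up)
```

In this failure that difference is empty too: ovn-controller believes it set the port up, which is what pushes the suspicion further downstream.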
In the neutron-api log, ovsdb-monitor receives the updates to the SB Port_Binding table:

```
Jan 13 22:30:10.766618 np0039557494 devstack@neutron-api.service[62570]: DEBUG neutron.plugins.ml2.drivers.ovn.mech_driver.ovsdb.ovsdb_monitor [None req-0d900a20-58b1-4662-9217-856061d78672 None None] Hash Ring: Node d6133697-439f-4059-b5e7-152917a32dd1 (host: np0039557494) handling event "update" for row cfeb99a7-95f6-4f8c-bb64-cd628734b141 (table: Port_Binding) {{(pid=62570) notify /opt/stack/neutron/neutron/plugins/ml2/drivers/ovn/mech_driver/ovsdb/ovsdb_monitor.py:852}}
Jan 13 22:30:10.832940 np0039557494 devstack@neutron-api.service[62570]: DEBUG neutron.plugins.ml2.drivers.ovn.mech_driver.ovsdb.ovsdb_monitor [None req-0d900a20-58b1-4662-9217-856061d78672 None None] Hash Ring: Node d6133697-439f-4059-b5e7-152917a32dd1 (host: np0039557494) handling event "update" for row cfeb99a7-95f6-4f8c-bb64-cd628734b141 (table: Port_Binding) {{(pid=62570) notify /opt/stack/neutron/neutron/plugins/ml2/drivers/ovn/mech_driver/ovsdb/ovsdb_monitor.py:852}}
```

But note that both events refer to the same row id, while in the ovn-controller log snippet two different ports are being claimed / set to up at the same time. So either ovn-controller failed to update the 'up' field in the SB database; or ovsdb-server incorrectly sent duplicate updates to its watchers; or the Idl watcher somehow mixed up the row IDs.

---

In the gate, OVN is quite old (22.03) and the OVS components are at 2.17.9 (also very old). I wonder whether there is a race condition here that has since been fixed in a later python-ovs, ovsdb-server, or ovn-controller release...

** Affects: neutron
     Importance: Undecided
         Status: New

** Tags: gate-failure ovn

** Tags added: gate-failure ovn

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
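The duplicate-row observation above can be sanity-checked mechanically: given the (event, row-uuid) pairs delivered to the Idl notify hook, count how many ports got an update versus how many events arrived. A hypothetical sketch, not neutron's actual ovsdb_monitor code:

```python
from collections import Counter


def duplicate_update_rows(events):
    """Given (event_type, row_uuid) pairs as seen by an Idl notify hook,
    return the row UUIDs that were reported more than once.
    Illustrative only; the real handler lives in ovsdb_monitor.py."""
    counts = Counter(uuid for etype, uuid in events if etype == "update")
    return {uuid for uuid, n in counts.items() if n > 1}


# The two gate log entries above: both "update" events carry the same
# Port_Binding row id, even though ovn-controller claimed two different
# lports in the same instant.
events = [
    ("update", "cfeb99a7-95f6-4f8c-bb64-cd628734b141"),
    ("update", "cfeb99a7-95f6-4f8c-bb64-cd628734b141"),
]
print(duplicate_update_rows(events))
# Two distinct lports going up should instead have produced two distinct
# row ids here, i.e. an empty duplicates set.
```

Seeing one row id twice where two distinct rows were expected is what narrows the fault to ovsdb-server's update stream or the Idl watcher, rather than to neutron's event handling itself.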
https://bugs.launchpad.net/bugs/2094840

Title:
  OVN driver doesn't receive port-binding UP update from OVN
  controller; Nova eventually times out while building VM

Status in neutron:
  New