** Changed in: cloud-archive/dalmation
       Status: New => Fix Released
** Changed in: cloud-archive/caracal
       Status: New => Fix Released

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/2017748

Title:
  [SRU] OVN: ovnmeta namespaces missing during scalability test causing
  DHCP issues

Status in Ubuntu Cloud Archive:
  Fix Released
Status in Ubuntu Cloud Archive antelope series:
  New
Status in Ubuntu Cloud Archive bobcat series:
  New
Status in Ubuntu Cloud Archive caracal series:
  Fix Released
Status in Ubuntu Cloud Archive dalmation series:
  Fix Released
Status in Ubuntu Cloud Archive epoxy series:
  Fix Released
Status in Ubuntu Cloud Archive yoga series:
  New
Status in Ubuntu Cloud Archive zed series:
  New
Status in neutron:
  New
Status in neutron ussuri series:
  Fix Released
Status in neutron victoria series:
  New
Status in neutron wallaby series:
  New
Status in neutron xena series:
  New
Status in neutron package in Ubuntu:
  Fix Released
Status in neutron source package in Focal:
  New
Status in neutron source package in Jammy:
  New
Status in neutron source package in Noble:
  New
Status in neutron source package in Oracular:
  New

Bug description:
  [Impact]
  ovnmeta- namespaces are intermittently missing, and the affected VMs
  then cannot be reached.

  [Test Case]
  This is not easy to reproduce, so I ran charmed-openstack-tester. The
  result is below:

  ======
  Totals
  ======
  Ran: 469 tests in 4273.6309 sec.
   - Passed: 398
   - Skipped: 69
   - Expected Fail: 0
   - Unexpected Success: 0
   - Failed: 2
  Sum of execute time for each test: 4387.2727 sec.

  The 2 failed tests
  (tempest.api.object_storage.test_account_quotas.AccountQuotasTest and
  octavia_tempest_plugin.tests.scenario.v2.test_traffic_ops.TrafficOperationsScenarioTest)
  are not related to the fix.

  [Where problems could occur]
  These patches are related to the OVN metadata agent on compute nodes.
  VM connectivity can possibly be affected by this patch when OVN is
  used. Binding a port to a datapath could be affected.
  [Others]

  == ORIGINAL DESCRIPTION ==

  Reported at: https://bugzilla.redhat.com/show_bug.cgi?id=2187650

  During a scalability test it was noted that a few VMs were having
  issues being pinged (2 out of ~5000 VMs in the test conducted). After
  some investigation it was found that the VMs in question did not
  receive a DHCP lease:

  udhcpc: no lease, failing
  FAIL
  checking http://169.254.169.254/2009-04-04/instance-id
  failed 1/20: up 181.90. request failed

  The ovnmeta- namespaces for the networks that the VMs were booting
  from were also missing. Looking into the ovn-metadata-agent.log:

  2023-04-18 06:56:09.864 353474 DEBUG neutron.agent.ovn.metadata.agent
  [-] There is no metadata port for network
  9029c393-5c40-4bf2-beec-27413417eafa or it has no MAC or IP addresses
  configured, tearing the namespace down if needed _get_provision_params
  /usr/lib/python3.9/site-packages/neutron/agent/ovn/metadata/agent.py:495

  Apparently, when the system is under stress (scalability tests), there
  are some edge cases where the metadata port information has not yet
  been propagated by OVN to the Southbound database. When the
  PortBindingChassisEvent event is then handled, it tries to find either
  the metadata port or the IP information on it (which is updated by
  ML2/OVN during subnet creation); neither can be found, and the handler
  fails silently with the error shown above.

  Note that running the same tests with less concurrency did not trigger
  this issue, so it only happens when the system is overloaded.

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/2017748/+subscriptions


--
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp
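
The ordering problem in the original description can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions: it is not neutron's code and not the released fix. `FakeSouthbound`, `MetadataPort`, `get_provision_params` and `on_port_bound` are hypothetical names chosen for illustration, and the MAC and IP values are made up. The sketch shows a binding event handled before the metadata port has its MAC and IP information, so that no ovnmeta- namespace is provisioned, and the same handler succeeding once the data has arrived.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class MetadataPort:
    # Hypothetical stand-in for the metadata port record seen by the agent.
    mac: Optional[str] = None
    ips: List[str] = field(default_factory=list)


@dataclass
class FakeSouthbound:
    # Hypothetical stand-in for the Southbound view: network id -> metadata port.
    metadata_ports: Dict[str, MetadataPort] = field(default_factory=dict)


def get_provision_params(
    sb: FakeSouthbound, network_id: str
) -> Optional[Tuple[str, List[str]]]:
    """Return (mac, ips) for the metadata port, or None if it is unusable."""
    port = sb.metadata_ports.get(network_id)
    if port is None or not port.mac or not port.ips:
        # Mirrors the condition in the log message: no metadata port,
        # or no MAC or IP addresses configured on it.
        return None
    return port.mac, list(port.ips)


def on_port_bound(sb: FakeSouthbound, namespaces: Set[str], network_id: str) -> bool:
    """Model of handling a port binding event for a VM on network_id."""
    name = "ovnmeta-" + network_id
    if get_provision_params(sb, network_id) is None:
        # Nothing is raised here: the namespace is removed if present
        # and the caller only sees False.
        namespaces.discard(name)
        return False
    namespaces.add(name)
    return True


if __name__ == "__main__":
    sb = FakeSouthbound()
    namespaces: Set[str] = set()
    net = "9029c393-5c40-4bf2-beec-27413417eafa"

    # The event is handled before the metadata port data has arrived.
    print(on_port_bound(sb, namespaces, net), sorted(namespaces))

    # The data arrives later; in this model nothing re-runs the handler
    # unless it is called again explicitly.
    sb.metadata_ports[net] = MetadataPort(mac="fa:16:3e:00:00:01", ips=["10.0.0.2"])
    print(on_port_bound(sb, namespaces, net), sorted(namespaces))
```

In this model the first call leaves the namespace set empty and the second call provisions the namespace, which is why the outcome depends on event ordering rather than on the final database state.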