Public bug reported:

Description
-----------
When the DvrLocalRouter object is instantiated, it calls the _load_used_fip_information() function. In some cases, this function tries to add ip rules in a specific network namespace, but that namespace may not exist yet. This results in neutron.privileged.agent.linux.ip_lib.NetworkNamespaceNotFound being raised.

Pre-conditions
--------------
- DVR is in use and the created router is distributed and HA
- The state file 'fip-priorities' is missing some entries, which results in https://opendev.org/openstack/neutron/src/commit/0c5d4b872899497437d1399c845be756103a46d3/neutron/agent/l3/dvr_local_router.py#L76 being skipped
- The qrouter network namespace does not exist (possibly due to a reboot of the host or something similar)

Step-by-step reproduction steps
-------------------------------
- Set up OpenStack with DVR enabled
- Create an HA router with an external subnet attached so that its IPs can be used as FIPs
- Create a VM with a FIP attached from the aforementioned router
- SSH to the host running the aforementioned VM and:
  - Delete the qrouter namespace associated with this router
  - Remove the entry for the FIP from the fip-priorities state file in the Neutron state directory
  - Restart the Neutron L3 agent

Expected output
---------------
The Neutron L3 agent should restart without any errors.

Actual output
-------------
The Neutron L3 agent throws a NetworkNamespaceNotFound exception for each missing FIP in the fip-priorities state file, fails to set up the router and then retries.

Note that if there are more than 5 missing FIP entries in the fip-priorities file, the router setup fails completely because it hits the retry limit specified in https://opendev.org/openstack/neutron/src/commit/0c5d4b872899497437d1399c845be756103a46d3/neutron/agent/l3/agent.py#L730-L733. This leaves the router broken and not set up on the node, resulting in broken networking for all VMs using that router on that host.
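The retry arithmetic in the note above can be illustrated with a small, self-contained model. It rests on two assumptions that are an interpretation of the report, not something confirmed against the Neutron source: each attempt persists a priority for the first FIP missing from the state file before failing on the missing namespace (so every retry gets one FIP further), and the agent allows 5 retries before giving up. All names below, including `process_router` and the namespace name, are hypothetical stand-ins, not Neutron APIs.

```python
"""Hypothetical, simplified model of the failure reported above.

Nothing here is Neutron code. The names only mirror the report: a load step
run while the router object is created, and a retry loop with a budget of 5.
"""


class NetworkNamespaceNotFound(Exception):
    """Stand-in for the real NetworkNamespaceNotFound exception."""


def load_used_fip_information(fips, priorities, namespaces, ns_name):
    """Model of the load step.

    fips: floating IPs of the router.
    priorities: dict standing in for the 'fip-priorities' state file.
    namespaces: set of network namespaces that currently exist on the host.
    """
    for fip in fips:
        if fip in priorities:
            # Known FIP: the ip rule code path is skipped.
            continue
        # Assumption: the new priority is persisted before the rule is added,
        # so the next attempt treats this FIP as known. Placeholder value.
        priorities[fip] = len(priorities) + 1
        if ns_name not in namespaces:
            raise NetworkNamespaceNotFound(ns_name)
        # The real code would add the ip rule inside ns_name here.


def process_router(fips, priorities, namespaces, ns_name, tries=5):
    """Retry setup; give up once the retry budget drops below zero.

    Returns (succeeded, number_of_failed_attempts).
    """
    failures = 0
    while True:
        try:
            load_used_fip_information(fips, priorities, namespaces, ns_name)
        except NetworkNamespaceNotFound:
            failures += 1
            tries -= 1
            if tries < 0:
                return False, failures
            continue
        # Assumption: later steps of router setup create the namespace.
        namespaces.add(ns_name)
        return True, failures


if __name__ == "__main__":
    ns = "qrouter-example"  # hypothetical namespace name
    five = [f"203.0.113.{i}" for i in range(5)]
    six = [f"203.0.113.{i}" for i in range(6)]
    print(process_router(five, {}, set(), ns))  # (True, 5)
    print(process_router(six, {}, set(), ns))   # (False, 6)
```

Under these assumptions, a host with up to 5 missing entries recovers after that many failed attempts, while 6 or more exhausts the retry budget, which matches the behaviour described in the report.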
Version
-------
- OpenStack version: master/zed
- Linux distro: AlmaLinux9
- Deployed via Kolla Ansible

** Affects: neutron
   Importance: Undecided
   Status: New

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/2025129

Title:
  DvrLocalRouter init references namespace before it is created