Public bug reported:

Hi,
We are deploying OpenStack 2024.2 with kolla on Ubuntu Noble, using OVN as the network overlay. We have an issue where, as soon as we enable QoS on routers and networks, the openvswitch_vswitchd processes start hanging. We haven't tried with just one or the other, but it shouldn't be possible to bring down a whole cluster with a bit of configuration. This occurred with OpenStack 2023.2 on Jammy in the past as well, so that would have been an older version of Open vSwitch, and I have even tried with Open vSwitch 3.5.0.

We are only using simple ingress/egress limits of 1000/1000 for a single network and 500/500 for a single router.
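For reference, the setup is roughly the following (a minimal sketch using openstacksdk; the cloud, network, and router names are illustrative rather than our exact ones, and attaching the policy to the router gateway assumes the qos-gateway-ip extension):

    # Rough sketch of the QoS configuration described above (openstacksdk).
    # Names ("mycloud", "private-net", "router1") are illustrative only.
    import openstack

    conn = openstack.connect(cloud="mycloud")

    # 1000/1000 kbps ingress/egress bandwidth limit, attached to one network.
    net_policy = conn.network.create_qos_policy(name="net-limit-1000")
    for direction in ("ingress", "egress"):
        conn.network.create_qos_bandwidth_limit_rule(
            net_policy, max_kbps=1000, direction=direction)
    network = conn.network.find_network("private-net")
    conn.network.update_network(network, qos_policy_id=net_policy.id)

    # 500/500 kbps limit, attached to a single router's gateway
    # (assumes the qos-gateway-ip API extension is available).
    router_policy = conn.network.create_qos_policy(name="router-limit-500")
    for direction in ("ingress", "egress"):
        conn.network.create_qos_bandwidth_limit_rule(
            router_policy, max_kbps=500, direction=direction)
    router = conn.network.find_router("router1")
    gateway = dict(router.external_gateway_info or {})
    gateway["qos_policy_id"] = router_policy.id
    conn.network.update_router(router, external_gateway_info=gateway)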
Here are the logs from ovs-vswitchd:

2025-03-19T09:37:24.752Z|409501|connmgr|INFO|br-int<->unix#1: 8 flow_mods 43 s ago (6 adds, 2 deletes)
2025-03-19T09:38:39.945Z|410047|connmgr|INFO|br-int<->unix#1: 10 flow_mods in the 2 s starting 10 s ago (2 adds, 8 deletes)
2025-03-19T09:44:19.786Z|412166|connmgr|INFO|br-int<->unix#1: 4 flow_mods 10 s ago (2 adds, 2 deletes)
2025-03-19T09:45:19.786Z|412576|connmgr|INFO|br-int<->unix#1: 8 flow_mods in the 6 s starting 33 s ago (6 adds, 2 deletes)
2025-03-19T09:54:07.996Z|415871|connmgr|INFO|br-int<->unix#1: 8 flow_mods in the 1 s starting 10 s ago (2 adds, 6 deletes)
2025-03-19T09:54:52.517Z|416385|bridge|INFO|bridge br-int: deleted interface tap66d9c2a6-95 on port 101
2025-03-19T09:55:07.996Z|416743|connmgr|INFO|br-int<->unix#1: 331 flow_mods in the 8 s starting 23 s ago (21 adds, 310 deletes)
2025-03-19T09:56:07.996Z|417114|connmgr|INFO|br-int<->unix#1: 1 flow_mods 56 s ago (1 adds)
2025-03-19T09:56:54.831Z|417448|bridge|INFO|bridge br-int: added interface tapc19e70a1-68 on port 102
2025-03-19T09:56:54.860Z|417540|netdev_linux|WARN|tapc19e70a1-68: removing policing failed: No such device
2025-03-19T09:57:07.996Z|417902|connmgr|INFO|br-int<->unix#1: 207 flow_mods in the 1 s starting 13 s ago (197 adds, 10 deletes)
2025-03-19T10:00:12.730Z|419178|connmgr|INFO|br-int<->unix#1: 94 flow_mods 10 s ago (85 adds, 9 deletes)
2025-03-19T10:01:12.730Z|419549|connmgr|INFO|br-int<->unix#1: 6 flow_mods 37 s ago (4 adds, 2 deletes)
2025-03-19T10:05:54.525Z|421308|connmgr|INFO|br-int<->unix#1: 1 flow_mods 10 s ago (1 adds)
2025-03-19T10:06:54.526Z|421710|connmgr|INFO|br-int<->unix#1: 1 flow_mods 52 s ago (1 deletes)
2025-03-19T10:08:52.756Z|422418|connmgr|INFO|br-int<->unix#1: 1 flow_mods 10 s ago (1 adds)
2025-03-19T11:18:15.953Z|448775|connmgr|INFO|br-int<->unix#1: 176 flow_mods in the 8 s starting 10 s ago (31 adds, 145 deletes)
2025-03-19T11:31:30.570Z|453640|connmgr|INFO|br-int<->unix#1: 1 flow_mods 10 s ago (1 adds)
2025-03-19T11:32:30.570Z|454015|connmgr|INFO|br-int<->unix#1: 1 flow_mods 58 s ago (1 adds)
2025-03-19T11:35:09.140Z|539360|ovs_rcu(urcu9)|WARN|blocked 1000 ms waiting for handler1 to quiesce
2025-03-19T11:35:09.140Z|455059|ovs_rcu|WARN|blocked 1000 ms waiting for handler1 to quiesce
2025-03-19T11:35:10.140Z|539409|ovs_rcu(urcu9)|WARN|blocked 2000 ms waiting for handler1 to quiesce
2025-03-19T11:35:10.141Z|455106|ovs_rcu|WARN|blocked 2000 ms waiting for handler1 to quiesce
2025-03-19T11:35:12.140Z|539497|ovs_rcu(urcu9)|WARN|blocked 4001 ms waiting for handler1 to quiesce
2025-03-19T11:35:12.141Z|455192|ovs_rcu|WARN|blocked 4000 ms waiting for handler1 to quiesce
2025-03-19T11:35:16.140Z|539687|ovs_rcu(urcu9)|WARN|blocked 8000 ms waiting for handler1 to quiesce
2025-03-19T11:35:16.141Z|455387|ovs_rcu|WARN|blocked 8000 ms waiting for handler1 to quiesce
2025-03-19T11:35:24.139Z|540106|ovs_rcu(urcu9)|WARN|blocked 16000 ms waiting for handler1 to quiesce
2025-03-19T11:35:24.140Z|455837|ovs_rcu|WARN|blocked 16000 ms waiting for handler1 to quiesce
2025-03-19T11:35:40.139Z|541019|ovs_rcu(urcu9)|WARN|blocked 32000 ms waiting for handler1 to quiesce
2025-03-19T11:35:40.140Z|456773|ovs_rcu|WARN|blocked 32000 ms waiting for handler1 to quiesce
2025-03-19T11:36:12.139Z|542611|ovs_rcu(urcu9)|WARN|blocked 64000 ms waiting for handler1 to quiesce
2025-03-19T11:36:12.140Z|458417|ovs_rcu|WARN|blocked 64000 ms waiting for handler1 to quiesce
2025-03-19T11:37:16.140Z|545667|ovs_rcu(urcu9)|WARN|blocked 128000 ms waiting for handler1 to quiesce
2025-03-19T11:37:16.141Z|461499|ovs_rcu|WARN|blocked 128000 ms waiting for handler1 to quiesce
2025-03-19T11:39:24.139Z|551954|ovs_rcu(urcu9)|WARN|blocked 256000 ms waiting for handler1 to quiesce
2025-03-19T11:39:24.140Z|467913|ovs_rcu|WARN|blocked 256000 ms waiting for handler1 to quiesce
2025-03-19T11:43:40.140Z|564156|ovs_rcu(urcu9)|WARN|blocked 512000 ms waiting for handler1 to quiesce
2025-03-19T11:43:40.141Z|480412|ovs_rcu|WARN|blocked 512000 ms waiting for handler1 to quiesce
2025-03-19T11:50:04.648Z|00001|vlog|INFO|opened log file /var/log/kolla/openvswitch/ovs-vswitchd.log

Any ideas?

Thanks,
Daniel

** Affects: neutron
     Importance: Undecided
         Status: New

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/2103641

Title:
  Open vSwitch (Version 3.3.0) goes into deadlocked state

Status in neutron:
  New
To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/2103641/+subscriptions