Public bug reported:

On a fresh environment, the first action over the ovn-provider gets stuck for 180s on its first txn over the OVN NB DB.
After some in-depth analysis of the threads running there, we saw that the GC is invoked on the driver class, which calls shutdown() on the helper, and that in turn does a join() over the daemon thread responsible for managing the requests to the helper. This gives us a deadlock: the join() happens to run on the ovsdbapp connection thread, in the middle of a txn commit, while the request-handler thread it is joining is itself blocked waiting for the result of that same txn. Since every txn over the OVN DB done by ovsdbapp is serialized this way, neither thread can make progress, and the request hangs for 180s (the ovsdbapp timeout). A minimal reproducer of the pattern is sketched after the thread dump.

Inspecting the threads during the stuck time shows this behaviour:

Process 2249966: /usr/bin/uwsgi --ini /etc/octavia/octavia-uwsgi.ini --venv /opt/stack/data/venv
Python v3.12.3 (/usr/bin/uwsgi-core)

Thread 2062601 (active): "uWSGIWorker1Core0"

Thread 2250013 (idle): "Thread-2 (run)"
    _wait_for_tstate_lock (threading.py:1167)
    join (threading.py:1147)
    shutdown (ovn_octavia_provider/helper.py:112)
    __del__ (ovn_octavia_provider/driver.py:51)
    __subclasscheck__ (<frozen abc>:123)
    __subclasscheck__ (<frozen abc>:123)
    __subclasscheck__ (<frozen abc>:123)
    __subclasscheck__ (<frozen abc>:123)
    __instancecheck__ (<frozen abc>:119)
    db_replace_record (ovsdbapp/backend/ovs_idl/idlutils.py:452)
    set_column (ovsdbapp/backend/ovs_idl/command.py:62)
    set_columns (ovsdbapp/backend/ovs_idl/command.py:67)
    run_idl (ovsdbapp/backend/ovs_idl/command.py:115)
    do_commit (ovsdbapp/backend/ovs_idl/transaction.py:92)
    run (ovsdbapp/backend/ovs_idl/connection.py:118)
    run (threading.py:1010)
    _bootstrap_inner (threading.py:1073)
    _bootstrap (threading.py:1030)

Thread 2250332 (idle): "Thread-3 (request_handler)"
    wait (threading.py:359)
    get (queue.py:180)
    commit (ovsdbapp/backend/ovs_idl/transaction.py:54)
    __exit__ (ovsdbapp/api.py:71)
    transaction (ovsdbapp/api.py:114)
    __exit__ (contextlib.py:144)
    transaction (impl_idl_ovn.py:180)
    __exit__ (contextlib.py:144)
    execute (ovsdbapp/backend/ovs_idl/command.py:49)
    lb_create (ovn_octavia_provider/helper.py:1146)
    request_handler (ovn_octavia_provider/helper.py:401)
    run (threading.py:1010)
    _bootstrap_inner (threading.py:1073)
    _bootstrap (threading.py:1030)
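To make the failure mode easier to reproduce outside uwsgi, here is a minimal, self-contained sketch of the same pattern. All names in it (Helper, Driver, request_handler, connection_run) are illustrative stand-ins for the ovn-octavia-provider/ovsdbapp code, not the real classes; the real trigger is the GC firing mid-commit, which the sketch simulates by dropping the last driver reference explicitly, and the 180s ovsdbapp timeout is shortened to 5s:

    import queue
    import threading
    import time


    class Helper:
        """Stand-in for ovn_octavia_provider helper: a daemon thread
        consumes requests and blocks waiting on the txn result."""

        def __init__(self):
            self.requests = queue.Queue()
            self.results = queue.Queue()
            # Daemon thread managing the requests, like "Thread-3
            # (request_handler)" in the dump above.
            self.thread = threading.Thread(target=self._request_handler,
                                           daemon=True)
            self.thread.start()

        def _request_handler(self):
            self.requests.get()
            try:
                # commit() blocks on queue.get() until the connection
                # thread posts the txn result (5s here instead of 180s).
                self.results.get(timeout=5)
            except queue.Empty:
                print("request handler: txn timed out")

        def shutdown(self):
            # The join() over the request-handler daemon thread,
            # as in helper.py:112.
            self.thread.join()


    class Driver:
        """Stand-in for the driver: the finalizer path of driver.py:51."""

        def __init__(self, helper):
            self.helper = helper

        def __del__(self):
            self.helper.shutdown()


    def connection_run(helper, holder):
        # Plays the role of "Thread-2 (run)": it is mid-commit when the
        # last reference to the driver drops, so __del__ -> shutdown ->
        # join() runs on *this* thread before the result is posted.
        holder.pop()               # refcount hits zero: __del__ runs here
        helper.results.put("ok")   # only reached after the timeout


    helper = Helper()
    holder = [Driver(helper)]
    helper.requests.put("lb_create")

    t = threading.Thread(target=connection_run, args=(helper, holder))
    start = time.time()
    t.start()
    t.join()
    print(f"connection thread unblocked after {time.time() - start:.1f}s")

Running it prints the timeout message from the handler and unblocks the connection thread after roughly 5 seconds; with the real 180s timeout this is exactly the stall seen on the first request.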
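The bug is still In Progress, so no fix is shown here; the following is only a generic sketch of how this class of deadlock is usually defused (keep heavy cleanup out of __del__ and bound any join() a finalizer can reach), not the patch for this report:

    import atexit
    import threading


    class SaferHelper:
        """Illustrative sketch only, not the ovn-octavia-provider patch:
        generic guards against join()-in-finalizer deadlocks."""

        def __init__(self):
            self.thread = threading.Thread(target=lambda: None, daemon=True)
            self.thread.start()
            # Prefer an explicit hook over __del__: atexit runs at a
            # well-defined point, not mid-commit on an arbitrary thread.
            atexit.register(self.shutdown)

        def shutdown(self):
            # Bound the wait: the handler thread is a daemon and dies
            # with the process anyway, so a short join is enough; an
            # unbounded join is what turns the wait into a 180s stall.
            self.thread.join(timeout=1.0)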
** Affects: neutron
     Importance: Undecided
     Assignee: Fernando Royo (froyoredhat)
         Status: In Progress

** Tags: ovn-octavia-provider

** Changed in: neutron
     Assignee: (unassigned) => Fernando Royo (froyoredhat)

https://bugs.launchpad.net/bugs/2093347

Title:
  [ovn-octavia-provider] first request is stuck on OVN txn

Status in neutron:
  In Progress