Reviewed:  https://review.opendev.org/c/openstack/ovn-octavia-provider/+/938797
Committed: https://opendev.org/openstack/ovn-octavia-provider/commit/26431c9ab159032f9122d3f6fe6be95798ad0497
Submitter: "Zuul (22348)"
Branch:    master
commit 26431c9ab159032f9122d3f6fe6be95798ad0497
Author: Fernando Royo <fr...@redhat.com>
Date:   Thu Jan 9 10:50:09 2025 +0100

    Remove join on helper request daemon thread

    This patch removes the join on the request daemon thread that attends
    requests on the OVN provider. The join generates a deadlock when the GC
    is called over the driver class and triggers the helper shutdown while
    the request thread is still attending a previous request (through
    ovsdbapp, which uses a lock over the txn on the OVN DBs). As the
    request thread is a daemon thread, dropping the join looks safe.

    Closes-Bug: #2093347
    Change-Id: I464f6cebf3c65f300a3d0f10b661f77215475a7e

** Changed in: neutron
   Status: In Progress => Fix Released

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/2093347

Title:
  [ovn-octavia-provider] first request is stuck on OVN txn

Status in neutron:
  Fix Released

Bug description:
  On a fresh env, the first action over the ovn-provider gets stuck for
  180s on the first txn over the OVN NB DB.

  After a deeper analysis of the threads running there, we saw that the
  GC is called on the driver class, which then calls the shutdown on the
  helper, doing a join() over the daemon thread responsible for managing
  the requests of the helper. This produces a deadlock: any further txn
  over the OVN DB done by ovsdbapp is done under a lock, and the join is
  also waiting on that lock, leaving the thread hung for 180s (the
  ovsdbapp timeout).

  Inspecting the threads during the stuck period shows this behaviour:

  Process 2249966: /usr/bin/uwsgi --ini /etc/octavia/octavia-uwsgi.ini --venv /opt/stack/data/venv
  Python v3.12.3 (/usr/bin/uwsgi-core)

  Thread 2062601 (active): "uWSGIWorker1Core0"
  Thread 2250013 (idle): "Thread-2 (run)"
      _wait_for_tstate_lock (threading.py:1167)
      join (threading.py:1147)
      shutdown (ovn_octavia_provider/helper.py:112)
      __del__ (ovn_octavia_provider/driver.py:51)
      __subclasscheck__ (<frozen abc>:123)
      __subclasscheck__ (<frozen abc>:123)
      __subclasscheck__ (<frozen abc>:123)
      __subclasscheck__ (<frozen abc>:123)
      __instancecheck__ (<frozen abc>:119)
      db_replace_record (ovsdbapp/backend/ovs_idl/idlutils.py:452)
      set_column (ovsdbapp/backend/ovs_idl/command.py:62)
      set_columns (ovsdbapp/backend/ovs_idl/command.py:67)
      run_idl (ovsdbapp/backend/ovs_idl/command.py:115)
      do_commit (ovsdbapp/backend/ovs_idl/transaction.py:92)
      run (ovsdbapp/backend/ovs_idl/connection.py:118)
      run (threading.py:1010)
      _bootstrap_inner (threading.py:1073)
      _bootstrap (threading.py:1030)
  Thread 2250332 (idle): "Thread-3 (request_handler)"
      wait (threading.py:359)
      get (queue.py:180)
      commit (ovsdbapp/backend/ovs_idl/transaction.py:54)
      __exit__ (ovsdbapp/api.py:71)
      transaction (ovsdbapp/api.py:114)
      __exit__ (contextlib.py:144)
      transaction (impl_idl_ovn.py:180)
      __exit__ (contextlib.py:144)
      execute (ovsdbapp/backend/ovs_idl/command.py:49)
      lb_create (ovn_octavia_provider/helper.py:1146)
      request_handler (ovn_octavia_provider/helper.py:401)
      run (threading.py:1010)
      _bootstrap_inner (threading.py:1073)
      _bootstrap (threading.py:1030)

  To manage notifications about this bug go to:
  https://bugs.launchpad.net/neutron/+bug/2093347/+subscriptions

--
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp
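
For illustration only, here is a minimal sketch of the shutdown pattern the
patch changes. This is not the actual ovn_octavia_provider/helper.py code: the
class layout, the requests queue and the None exit sentinel are assumptions;
only the daemon request-handler thread and the removed join come from the bug
and commit above.

    import queue
    import threading


    class Helper(object):
        """Illustrative stand-in for the OVN provider helper (hypothetical)."""

        def __init__(self):
            self.requests = queue.Queue()
            # The request handler runs as a daemon thread, so it will not
            # keep the process alive on exit by itself.
            self.helper_thread = threading.Thread(
                target=self.request_handler, daemon=True)
            self.helper_thread.start()

        def request_handler(self):
            while True:
                request = self.requests.get()
                if request is None:  # exit sentinel (illustrative)
                    break
                # ... dispatch the request, e.g. run an OVN NB DB transaction
                # through ovsdbapp, which serializes commits behind a lock ...

        def shutdown(self):
            self.requests.put(None)
            # Before the fix, shutdown also did:
            #     self.helper_thread.join()
            # If shutdown() is reached from the driver's __del__ while the
            # handler above is blocked waiting for a transaction result that
            # only the joining thread can produce, the join hangs until the
            # 180s ovsdbapp timeout. Because the handler is a daemon thread,
            # skipping the join is safe.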