Hi folks,

While developing a Python library that wraps a C++ library through Cython, we ran into a deadlock between a lock held by the C++ library and the GIL.
The problem appeared when we tried to offload all IO operations into an isolated thread, basically a thread running an asyncio loop. Once this thread was added, we had to start using call_soon_threadsafe to schedule new functions into the IO loop, and this is where the deadlocks started.

Why? Before adding the IO thread we were not releasing the GIL on calls into the C++ library. At first we tried to stick to that plan, which turned out to be harmful once the second thread existed. Two threads can behave as follows, which results in a deadlock:

1 - (thread 1) (Cython) calls a C++ function
2 - (thread 1) (C++ code) creates and locks a mutex X
3 - (thread 1) (C++ code) calls a Cython callback
4 - (thread 1) (Cython) calls call_soon_threadsafe
5 - (thread 1) (Python) releases the GIL
6 - (thread 1) (Python) sock.send(...)
7 - (thread 2) (Python) acquires the GIL
8 - (thread 2) (Cython) calls a C++ function
9 - (thread 2) (C++ code) tries to acquire the mutex X (blocks)
10 - (thread 1) (Python) tries to reacquire the GIL (blocks)

The IO thread synchronization, which is done by writing to a specific FD, releases the GIL. That gives other threads the chance to run and to block on the already locked mutex while still holding the GIL, so thread 1 can never reacquire the GIL to release the mutex, and we end up in a fatal deadlock.

To address the situation we explicitly released the GIL on every C++ call, which solved the issue at a non-negligible performance cost.

Do you have any idea how this deadlock could be prevented without having to release the GIL on every C++ call, considering that we have no freedom to modify the C++ code?

We have a solution that might work, but it is not easy to implement, or at least not for all environments. The idea is basically to not run the piece of code that implicitly releases the GIL inside the callback, and to defer its execution until after the C++ function has returned.
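To make the deferral idea concrete, here is a rough pure-Python sketch of what we have in mind. The names `defer` and `call_cpp` are made up for illustration and are not part of our library or of any existing API, and in the real code the callback side would be Cython rather than Python. It also assumes that calls into the C++ library are not nested on the same thread.

```python
import asyncio
import threading

# Per-thread list of calls deferred while a C++ call is in progress.
_pending = threading.local()


def defer(fn, *args):
    """Hypothetical replacement for call_soon_threadsafe inside the callback.

    It only appends to a list, so it does not perform the FD write that
    was releasing the GIL in our case while the C++ mutex was held.
    """
    _pending.calls.append((fn, args))


def call_cpp(loop, cpp_function, *args):
    """Hypothetical wrapper used for every call into the C++ library."""
    _pending.calls = []
    try:
        # The C++ mutex is taken and released inside this call.
        result = cpp_function(*args)
    finally:
        calls, _pending.calls = _pending.calls, []
    # The C++ function has returned, so losing the GIL here is harmless.
    for fn, fn_args in calls:
        loop.call_soon_threadsafe(fn, *fn_args)
    return result
```

With something like this, the callback invoked from C++ would call `defer(cb, ...)` and the scheduling into the IO loop would only happen once `call_cpp` is back in Python code.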
This deferral seems to work quite well, since the C++ code is basically an asynchronous framework that only needs to schedule a callback. It is doable in environments where we already have an asyncio thread: it is a matter of calling `call_soon(offload_io_work_into_another_thread, cb)`. For environments where we have no built-in mechanism for deferring the execution of code, we would need to implement one, which may not be straightforward.

Thoughts?

Thanks!

--pau