[issue14390] Tkinter single-threaded deadlock
New submission from John Bollinger:

This is the same as issue 452973, created as a new issue pursuant to the instruction given when 452973 was closed as "out of date". In a nutshell, in a program combining Tkinter with Tcl callbacks written in C, it is possible for even a single-threaded program to deadlock.

The case I ran into had these particulars: The main program is in Python, but it relies on a custom extension written in C. Through that extension, C callbacks are registered for various Tcl GUI events, and most of these invoke Python functions via Python's C API. Many of those Python functions invoke Tkinter methods. For example, many of the callbacks are bound to menu item activations, and these typically [try to] construct a Tkinter dialog the first time they are called.

What happens in practice is that the program starts fine, but the GUI freezes as soon as any menu item is activated that has one of the affected callbacks bound to it. Gdb and I are confident that the problem is as described in issue 452973: the program's single thread acquires Tkinter's internal Tcl lock when the mouse event processing begins, and does not release it before control re-enters Python (there is no public API by which it can be made to do so). When the Python function invokes Tkinter methods, tkinter attempts to acquire the lock again, at which point it deadlocks because it already holds the lock.

I encountered this issue on CentOS 6 (thus Python 2.6.6), but it appears that the problem is still present in the Python 3 trunk. I have flagged this issue only for version 2.6, however, because I cannot currently confirm that it affects later versions (see below regarding testing).

I developed a patch against 2.6.6. It fixes the problem by allowing the Tcl lock to be acquired multiple times by any one thread (and requiring it to be released the same number of times before another thread can acquire it).
That is perhaps technically inferior to creating public functions around _tkinter.c's ENTER_PYTHON and LEAVE_PYTHON macros, but it doesn't touch the public API. Even if new public functions were provided, the reentrant locking might still be a good fallback. The patch applies cleanly to the trunk, so probably also to every version between that and 2.6.6. I would be happy to contribute the patch, but I am a bit at a loss as to how to write an automated test for it because (1) such a test must depend on an extension module, and (2) test failure means causing a deadlock. Any advice as to whether such a patch would be considered, or as to how best to test it would be welcome. -- components: Tkinter messages: 156619 nosy: jcbollinger priority: normal severity: normal status: open title: Tkinter single-threaded deadlock type: behavior versions: Python 2.6 ___ Python tracker <http://bugs.python.org/issue14390> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
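The failure mode described in this report can be modeled in pure Python. This is a simplified sketch under stated assumptions, not _tkinter's actual C code: `tcl_lock`, `dispatch_event`, and `tkinter_method` are hypothetical stand-ins for the internal Tcl lock, Tcl's event dispatch (which holds the lock while a callback runs), and a Tkinter method call. A timeout replaces the real, unbounded wait so the demonstration reports the problem instead of hanging.

```python
import threading

# Hypothetical stand-in for _tkinter's Tcl lock: an ordinary, non-reentrant lock.
tcl_lock = threading.Lock()


def tkinter_method():
    """Models a Tkinter call, which must take the Tcl lock before touching Tcl.

    The real code blocks forever; the timeout here only makes the
    demonstration terminate.
    """
    if not tcl_lock.acquire(timeout=0.5):
        return "deadlock: lock already held by this same thread"
    try:
        return "ok"
    finally:
        tcl_lock.release()


def dispatch_event(callback):
    """Models Tcl event dispatch: the lock stays held while the callback runs."""
    with tcl_lock:
        return callback()


print(tkinter_method())                # prints: ok
print(dispatch_event(tkinter_method))  # prints: deadlock: lock already held by this same thread
```

The second call fails even though only one thread exists, because a plain lock does not record which thread owns it.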
[issue14390] Tkinter single-threaded deadlock
John Bollinger added the comment:

I was already working on a standalone test, and now I have it ready. Using it I can demonstrate the issue against both the CPython trunk and my local v2.6.6 binary distribution, so I have added v3.3 as an affected version. It is reasonable to suppose that all versions in between are affected as well, but I have not tested versions 2.7, 3.1, or 3.2.

I attach a complete package with source and Autotools build scripts. A bit of overkill, I guess, but pretty easy to use. As is typical with the Autotools, the build system is far larger than the actual project sources (those are only 162 lines of C and 57 lines of Python, both reasonably well commented).

The test should be run against a Python configured with --enable-shared --with-threads (I also used --with-pydebug), and that can be an uninstalled working copy. To build and perform the test:

1) Unpack the tarball:
       tar xzf deadlocktest-0.2.tar.gz
2) Change to the test source directory:
       cd deadlocktest-0.2
3) Configure the test for building:
       ./configure [--with-python-build=/path/to/working/copy]
4) Build the test:
       make
5) Run the test:
       make check

The test builds and runs (and fails) against both Python 2.6 and the current trunk (3.3). It passes when run against my patched versions of 2.6 and 3.3.

-- versions: +Python 3.3 Added file: http://bugs.python.org/file25032/deadlocktest-0.2.tar.gz
[issue14390] Tkinter single-threaded deadlock
John Bollinger added the comment:

For what it's worth, I can convert my standalone test into a PyUnit test case easily enough (or so it appears). I'm having trouble, however, figuring out how to get the extension it depends on built and accessible to the test, yet not installed with the normal modules.
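On the concern raised when the issue was opened, that a failing test means a deadlock, a common approach is to run the risky scenario in a fresh interpreter with a timeout, so that a hang becomes an ordinary test failure. This is a sketch assuming Python 3.5+ (`subprocess.run` with `timeout`); `finishes_within`, `ReentrancyTest`, and the placeholder `SCENARIO` are hypothetical names, and a real test would drive the C-callback path rather than a plain `RLock`.

```python
import subprocess
import sys
import unittest


def finishes_within(code, timeout):
    """Run `code` in a fresh interpreter; True only if it exits cleanly in time.

    subprocess.run() kills the child when the timeout expires, so a deadlocked
    child cannot hang the whole test run.
    """
    try:
        completed = subprocess.run([sys.executable, "-c", code], timeout=timeout)
    except subprocess.TimeoutExpired:
        return False
    return completed.returncode == 0


class ReentrancyTest(unittest.TestCase):
    # Hypothetical placeholder scenario: nested acquisition by one thread.
    SCENARIO = (
        "import threading\n"
        "lock = threading.RLock()\n"
        "with lock:\n"
        "    with lock:\n"
        "        pass\n"
    )

    def test_reentrant_call_does_not_hang(self):
        self.assertTrue(finishes_within(self.SCENARIO, timeout=10))
```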
[issue14390] Tkinter single-threaded deadlock
John Bollinger added the comment: I looked at the packaging tests (thanks), but I didn't find anything useful to me. There were a couple whose names looked promising, but they turned out to be stubs. As far as I can tell, none of those tests actually invoke the system's C compiler, even indirectly. They are numerous, however, so I could have overlooked something. It occurs to me that because the extension only needs to provide one function, I could just add that to _tkinter. That would ease testing without adding anything to the *public* API, but it seems a bit smelly to me because the point is that a user extension can trigger the bug. Also, the added function would be accessible to programs that choose to ignore privacy convention. Also, I am assuming that tests only need to be runnable by developers and build automatons -- i.e. someone who can and did build Python from source. If they need also to be runnable by end users then a compiled version of any extension the tests depend upon needs to be included in binary distributions.
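If tests only need to run from a source build, a common unittest pattern is to import the helper extension optionally and skip when it is absent, so binary distributions need not ship it. A sketch under that assumption: the module name `_tkinter_reentrancy_helper` and its `fire_reentrant_callback()` function are hypothetical and not part of CPython.

```python
import importlib
import unittest


def optional_module(name):
    """Return the named module if it can be imported, else None."""
    try:
        return importlib.import_module(name)
    except ImportError:
        return None


# Hypothetical helper extension, built only in a source tree for testing.
_helper = optional_module("_tkinter_reentrancy_helper")


@unittest.skipIf(_helper is None,
                 "helper extension not built; test runs only from a source build")
class CCallbackReentrancyTest(unittest.TestCase):
    def test_c_callback_can_call_tkinter(self):
        # Hypothetical function: registers a C-level Tcl callback that
        # re-enters Python and calls a Tkinter method, then fires it.
        self.assertTrue(_helper.fire_reentrant_callback())
```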
[issue15037] test_curses fails with OverflowError
New submission from John Bollinger:

I encountered this test failure while attempting to verify a patch for a separate issue, and I found that it occurs with the unmodified source on the default branch:

    LD_LIBRARY_PATH="$PWD" ./python -bb -Wd -m test -r -w -uall -v test_curses
    == CPython 3.3.0a4+ (default:4aeb5b9b62d7, Jun 8 2012, 10:23:35) [GCC 4.4.6 20110731 (Red Hat 4.4.6-3)]
    == Linux-2.6.32-220.4.1.el6.x86_64-x86_64-with-centos-6.2-Final little-endian
    == /home/jbolling/cpython/build/test_python_26873
    Testing with flags: sys.flags(debug=0, inspect=0, interactive=0, optimize=0, dont_write_bytecode=0, no_user_site=0, no_site=0, ignore_environment=0, verbose=0, bytes_warning=2, quiet=0, hash_randomization=1)
    Using random seed 3072318
    [1/1] test_curses
    test test_curses crashed -- Traceback (most recent call last):
      File "/home/jbolling/cpython/Lib/test/regrtest.py", line 1237, in runtest_inner
        test_runner()
      File "/home/jbolling/cpython/Lib/test/test_curses.py", line 338, in test_main
        main(stdscr)
      File "/home/jbolling/cpython/Lib/test/test_curses.py", line 324, in main
        test_unget_wch(stdscr)
      File "/home/jbolling/cpython/Lib/test/test_curses.py", line 283, in test_unget_wch
        read = chr(read)
    OverflowError: signed integer is greater than maximum
    1 test failed:
        test_curses
    Re-running failed tests in verbose mode
    Re-running test 'test_curses' in verbose mode
    test test_curses crashed -- Traceback (most recent call last):
      File "/home/jbolling/cpython/Lib/test/regrtest.py", line 1237, in runtest_inner
        test_runner()
      File "/home/jbolling/cpython/Lib/test/test_curses.py", line 338, in test_main
        main(stdscr)
      File "/home/jbolling/cpython/Lib/test/test_curses.py", line 324, in main
        test_unget_wch(stdscr)
      File "/home/jbolling/cpython/Lib/test/test_curses.py", line 283, in test_unget_wch
        read = chr(read)
    OverflowError: signed integer is greater than maximum
    [123272 refs]

Python was built and the tests run on CentOS 6.2 / x86_64, using the platform's standard tool chain, configured with "--enable-shared --with-threads --with-pydebug". All other tests pass for me, although test_builtin failed when run as part of the whole suite but passed when run by itself.

For what it's worth, it looks like that particular message is emitted in exactly one place: Python/getargs.c:661 (function convertsimple()), which in this case I guess is being called indirectly from Python/bltinmodule.c:526 (function builtin_chr()). It's not obvious to me why that would be failing.

-- components: Tests messages: 162532 nosy: jcbollinger priority: normal severity: normal status: open title: test_curses fails with OverflowError type: behavior versions: Python 3.3 ___ Python tracker <http://bugs.python.org/issue15037> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
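In the traceback, `read` is an integer too large for a C `int`, so `chr()` fails while parsing its argument, before it ever checks the code-point range. For context, `curses.window.get_wch()` returns a one-character string for most keys and an integer for special keys. The sketch below shows a defensive conversion; `code_point_or_none` is a hypothetical helper, not part of test_curses, and it only avoids the exception. It does not address whatever makes ncurses hand back the bogus value.

```python
def code_point_or_none(value):
    """Convert a get_wch()-style result to a one-character str, or None.

    chr() accepts only 0 <= value <= 0x10FFFF. Out-of-range values raise
    ValueError or, when they do not fit in a C int, OverflowError (which one
    depends on the value and the Python version), so check the range first.
    """
    if isinstance(value, str):
        return value  # already a character
    if 0 <= value <= 0x10FFFF:
        return chr(value)
    return None
```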
[issue14390] Tkinter single-threaded deadlock
John Bollinger added the comment:

I attach a patch fixing the issue and providing a test and docs. The fix is substantially as I described earlier: a thread that holds the Tcl lock is permitted to acquire it logically any number of times, but physically attempts to acquire it only if it doesn't already hold it. A thread-local counter ensures that the lock is logically released the same number of times it has been acquired before it is physically released. The external API is unchanged, and source changes are kept to a minimum. If this fix is ultimately accepted, I hope it can also be backported to 2.7.

-- keywords: +patch Added file: http://bugs.python.org/file25866/reentrant-tkinter.patch
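The locking scheme described in this comment can be sketched in Python as follows. This is an illustration of the technique under stated assumptions, not the patch itself (which is C code in _tkinter.c); `CountingLock` is a hypothetical name. The underlying lock is acquired physically only on a thread's first acquisition, and a thread-local counter decides when it is physically released.

```python
import threading


class CountingLock:
    """A lock that one thread may acquire repeatedly (hypothetical sketch)."""

    def __init__(self):
        self._lock = threading.Lock()    # the "physical" lock
        self._local = threading.local()  # per-thread nesting depth

    def _depth(self):
        return getattr(self._local, "depth", 0)

    def acquire(self):
        if self._depth() == 0:
            self._lock.acquire()  # first entry by this thread: really lock
        self._local.depth = self._depth() + 1

    def release(self):
        depth = self._depth()
        if depth == 0:
            raise RuntimeError("release of un-acquired CountingLock")
        self._local.depth = depth - 1
        if depth == 1:
            self._lock.release()  # last matching release: really unlock

    def __enter__(self):
        self.acquire()
        return self

    def __exit__(self, *exc_info):
        self.release()


lock = CountingLock()
with lock:
    with lock:  # nested entry by the same thread no longer deadlocks
        pass
```

Pure-Python code would simply use `threading.RLock`, which has these semantics; the sketch spells out the mechanism because the thread module's RLock offers no C API that _tkinter could reuse, as discussed on this issue.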
[issue14390] Tkinter single-threaded deadlock
John Bollinger added the comment:

Yes, I have basically made tkinter's Tcl lock into an RLock. With respect to Python 3's RLock implementation, though, are you talking about what I see in Modules/_threadmodule.c? Even if it were acceptable to make the tkinter module depend on the thread module (which is not clear), I don't think I can easily use that, because it looks like all the relevant functions are static, in typical extension module fashion. In other words, it provides only a Python API, not a C API. Moreover, the current implementation can easily be backported to Python 2, but that would not be true of an implementation based on the thread module's RLock. If you would nevertheless prefer that the thread module's RLock be used, then I would appreciate technical suggestions for how to overcome the lack of a C API.

I am content to comply with the PSF copyright marking policy. Is it documented somewhere? My understanding is that my copyright does not depend in any way on marking the work (at least in the US), but there are other reasons to prefer to mark. Anyway, show me the policy or else just confirm that it is to not mark in cases such as this, and I will remove it.

Tkinter threading and re-entrancy issues have been somewhat of a sore spot for a very long time, so I think this change is worth calling out. Nevertheless, if Raymund disagrees then so be it.

Thanks
[issue15037] test_curses fails with OverflowError
John Bollinger added the comment:

The system on which I encountered the test failure uses ncurses 5.7, so that's consistent with the theory that the test is tickling an ncurses bug. I'll have a look at testing with ncurses 5.8, but it is not available from Red Hat or CentOS (and it never will be for the current and past versions of those systems), so that's not a good solution for most users. On the other hand, it's not clear to me how serious the bug revealed by the test failures is, nor whether there is any viable workaround on the Python side.
[issue15037] test_curses fails with OverflowError
John Bollinger added the comment:

Clarification: "so that's not a good solution for most users" ... of Red Hat-family distros, version 6.2 and earlier. In fact, it looks like Red Hat is sticking with its current version of ncurses for RHEL 6.3, too, so no help is coming from that direction any time soon.
[issue15037] test_curses fails with OverflowError
John Bollinger added the comment:

OK, I confirm that the test passes after the system's ncurses library is upgraded to ncurses 5.8, and fails again when ncurses is downgraded back to version 5.7.
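Since the failure tracks the ncurses version, one option is to skip the affected test on older libraries. This sketch assumes Python 3.8 or later, where `curses.ncurses_version` (a named tuple with `major`, `minor`, `patch`) is available; it did not exist in 3.3, the version discussed here. `ncurses_older_than` and `skip_on_old_ncurses` are hypothetical names, and when the version cannot be determined the helper returns False rather than guessing.

```python
import unittest

try:
    import curses
except ImportError:  # e.g. a Python built without the _curses extension
    curses = None


def ncurses_older_than(major, minor):
    """Return True only if the linked ncurses is known to predate major.minor."""
    version = getattr(curses, "ncurses_version", None)
    if version is None:
        return False  # version unknown: do not skip
    return (version.major, version.minor) < (major, minor)


# Hypothetical guard for the unget_wch round-trip test.
skip_on_old_ncurses = unittest.skipIf(
    ncurses_older_than(5, 8),
    "unget_wch round trip is known to fail with ncurses < 5.8",
)
```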