Bug#903514: Deadlock in _dl_close join-ing threads accessing TLS (was Re: gimp won't launch)
This bug is very likely a bug present in old glibc versions. It has been brought to light by enabling TLS support in openblas, not by a new glibc version.

Right now the bug has been worked around by disabling TLS support in openblas. The way to handle this bug is to write a small testcase that can be forwarded upstream. It's not an easy task though.

--
Aurelien Jarno                          GPG: 4096R/1DDD8C9B
aurel...@aurel32.net                 http://www.aurel32.net
Bug#914999: [libc6] Locking problems into libc6
Control: severity -1 important

On Thu, Nov 29, 2018 at 01:58:47PM +0200, Roman Savochenko wrote:
> Package: libc6
> Version: 2.24
> Severity: critical

This is not a critical bug (it's not even clear from the report at this point what the problem is, and it doesn't seem too widespread); downgrading.

Cheers,
Julien
Processed: Re: Bug#914999: [libc6] Locking problems into libc6
Processing control commands:

> severity -1 important
Bug #914999 [libc6] [libc6] Locking problems into libc6
Severity set to 'important' from 'critical'

--
914999: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=914999
Debian Bug Tracking System
Contact ow...@bugs.debian.org with problems
Re: Buggy gettext() after switching locale (#924657)
On Sun, Mar 31, 2019 at 01:56:15PM +0200, intrigeri wrote:
> From Iain's report I understand that since Perl 5.28, gettext() calls
> are cached, so after switching the locale, in order to get correct
> results, one needs to invalidate the cache somehow, e.g. by calling
> bindtextdomain() and then textdomain().

FWIW I looked into this a bit, and indeed glibc has a cache of already loaded translations that gets invalidated (by incrementing _nl_msg_cat_cntr) in setlocale(3), bindtextdomain(3) and textdomain(3), but not in uselocale(3).

Starting with Perl 5.28, Perl uses POSIX 2008 thread-safe locales, so it calls uselocale(3) underneath when the Perl-side POSIX::setlocale() function is invoked.

The proposed fix/workaround seems fine to me, though I wonder whether glibc should invalidate the cache in uselocale(3) as well.

Copying the glibc maintainers. Any opinion on this?
--
Niko Tyni   nt...@debian.org
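A minimal C sketch of the workaround intrigeri describes, assuming (as in Niko's analysis above) that calling bindtextdomain(3) and textdomain(3) again is what invalidates glibc's translation cache after a uselocale(3) switch. The helper name `use_locale_and_refresh_gettext`, the domain name and the directory used with it are hypothetical; this is not a glibc or Perl API.

```c
#define _POSIX_C_SOURCE 200809L
#include <libintl.h>
#include <locale.h>

/* Hypothetical helper (not a glibc or Perl API): switch the calling
   thread's locale with uselocale(3), then call bindtextdomain(3) and
   textdomain(3) again, since those calls (unlike uselocale) invalidate
   glibc's cache of already looked-up translations.
   Returns the previous thread locale, or (locale_t) 0 on failure. */
static locale_t use_locale_and_refresh_gettext(locale_t newloc,
                                               const char *domain,
                                               const char *dirname)
{
    locale_t prev = uselocale(newloc);

    if (prev == (locale_t) 0)
        return (locale_t) 0;        /* uselocale failed; errno is set */

    /* Re-announce the binding and the domain to invalidate the cache. */
    if (bindtextdomain(domain, dirname) == NULL
        || textdomain(domain) == NULL) {
        uselocale(prev);            /* roll back the locale switch */
        return (locale_t) 0;
    }
    return prev;
}
```

A caller would pass a `locale_t` obtained from newlocale(3) and later restore the returned locale with uselocale(3). The sketch only mirrors what the Perl-side workaround does; whether glibc itself should bump the cache counter in uselocale(3) is the open question in the message above.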
Bug#903514: Deadlock in _dl_close join-ing threads accessing TLS (was Re: gimp won't launch)
On 31/03/2019 at 15:19, Aurelien Jarno wrote:
> This bug is very likely a bug present in old glibc versions. It has been
> brought to light when enabling TLS support in openblas and not by a new
> glibc version.
>
> Right now the bug has been workarounded by disabling TLS support in
> openblas. The way to handle this bug is to write a small testcase that
> can be forwarded upstream. It's not an easy task though.

Hi,

I've made a test case here [0].
I haven't tested it against the latest glibc commit, but it does reproduce the deadlock with glibc 2.28 on Linux.

To run the test case, do this:
```
gcc test_compiler_tls.c -o test_compiler_tls -ldl -g -pthread
gcc test_compiler_tls_lib.c -shared -o test_compiler_tls_lib.so \
    -g -pthread -fPIC
./test_compiler_tls ./test_compiler_tls_lib &
gdb --pid $! -ex 'thr a a bt'
```

This reproduces the deadlock that I found in openblas:
1- test_thread opens the library, which calls its constructor.
2- The library's constructor creates a thread `thread_that_use_tls_after_sleep`.
3- `thread_that_use_tls_after_sleep` sleeps for 100 ms (this needs to be long enough for dl_close to be called before the sleep ends).
4- test_thread closes the library with dl_close.
5- dl_close locks `dl_load_lock` and calls the library's destructor.
6- The library's destructor waits for `thread_that_use_tls_after_sleep` to finish.
7- `thread_that_use_tls_after_sleep` tries to read the TLS variable, which causes a call to `__tls_get_addr`.
8- `__tls_get_addr` deadlocks in `tls_get_addr_tail` trying to lock the same `dl_load_lock` that dl_close holds.
9- Nothing happens: the dl_close thread holds the lock and waits for `thread_that_use_tls_after_sleep` to finish, while that thread waits for the same lock and so never exits.

See [1] for the stack trace.

Thread 3 is the library's thread, created in its constructor and joined in its destructor.
Thread 2 is the thread that does dl_open and dl_close.
Thread 1 is a "monitoring" thread that implements a 10 s timeout (useful if this test needs to run on a CI system).

Where dl_close locks `dl_load_lock`: [2]
Where tls_get_addr_tail locks `dl_load_lock`: [3]

[0]: https://gist.github.com/amurzeau/26f045bdfea407528dd7de3102fb4be7
[1]: https://gist.github.com/amurzeau/26f045bdfea407528dd7de3102fb4be7#file-gdb_stacktrace-txt
[2]: https://github.com/bminor/glibc/blob/glibc-2.28/elf/dl-close.c#L812
[3]: https://github.com/bminor/glibc/blob/glibc-2.28/elf/dl-tls.c#L761

--
Alexis Murzeau
PGP: B7E6 0EBB 9293 7B06 BDBC 2787 E7BD 1904 F480 937F
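The lock cycle described in this message (dl_close holding `dl_load_lock` while the library's destructor joins a thread that needs the same lock) can be modelled in a single process without dlopen. This is a minimal sketch, not glibc code: `load_lock` is a hypothetical stand-in for `dl_load_lock`, `model_dlclose()` plays dl_close plus the destructor, and `worker()` plays `thread_that_use_tls_after_sleep`. To keep the model from hanging, the worker uses pthread_mutex_timedlock(3) with a 1 s deadline, so `ETIMEDOUT` marks the point where the real code blocks forever; starting the worker after the lock is taken replaces the 100 ms sleep and makes the ordering deterministic. `model_safe_shutdown()` shows one ordering that avoids the cycle, assuming the library could expose an explicit (hypothetical) shutdown step that the application calls before dlclose.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Hypothetical stand-in for glibc's dl_load_lock (a model, not glibc code). */
static pthread_mutex_t load_lock = PTHREAD_MUTEX_INITIALIZER;

/* Stand-in for thread_that_use_tls_after_sleep: its TLS access needs
   load_lock, like __tls_get_addr -> tls_get_addr_tail in glibc 2.28.
   A 1 s timed lock replaces the unbounded wait so the model cannot hang;
   the lock's return code is written through arg. */
static void *worker(void *arg)
{
    int *result = arg;
    struct timespec deadline;

    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += 1;
    *result = pthread_mutex_timedlock(&load_lock, &deadline);
    if (*result == 0)
        pthread_mutex_unlock(&load_lock);
    return NULL;
}

/* Models dlclose(): take load_lock, then run the "destructor", which joins
   the worker while the lock is still held. A result of ETIMEDOUT means the
   untimed code would deadlock. Returns -1 if the thread cannot be created. */
static int model_dlclose(void)
{
    pthread_t tid;
    int worker_result = -1;

    pthread_mutex_lock(&load_lock);             /* dl_close takes the lock */
    if (pthread_create(&tid, NULL, worker, &worker_result) != 0) {
        pthread_mutex_unlock(&load_lock);
        return -1;
    }
    pthread_join(tid, NULL);                    /* destructor joins under the lock */
    pthread_mutex_unlock(&load_lock);
    return worker_result;
}

/* Safer ordering: join the worker first (e.g. from a hypothetical explicit
   shutdown call made before dlclose), and only then take the lock. */
static int model_safe_shutdown(void)
{
    pthread_t tid;
    int worker_result = -1;

    if (pthread_create(&tid, NULL, worker, &worker_result) != 0)
        return -1;
    pthread_join(tid, NULL);                    /* no lock held while joining */
    pthread_mutex_lock(&load_lock);             /* "dlclose" runs afterwards */
    pthread_mutex_unlock(&load_lock);
    return worker_result;
}
```

Compile with `-pthread`. In this model `model_dlclose()` reports the timeout, while `model_safe_shutdown()` lets the worker take and release the lock normally.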
Processed: Found in glibc 2.28-5
Processing commands for cont...@bugs.debian.org:

> found 903514 2.28-5
Bug #903514 [src:glibc] Deadlock in _dl_close join-ing threads accessing TLS
Bug #904544 [src:glibc] gimp hangs on startup if libopenblas-base is installed
Bug #906152 [src:glibc] libopenblas-base: version 0.3.2+ds-1 makes gimp hang indefinitely
Bug #906516 [src:glibc] gimp: Segfault with libopenblas-base
Marked as found in versions glibc/2.28-5.
Marked as found in versions glibc/2.28-5.
Marked as found in versions glibc/2.28-5.
Marked as found in versions glibc/2.28-5.
> thanks
Stopping processing here.

Please contact me if you need assistance.
--
903514: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=903514
904544: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=904544
906152: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=906152
906516: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=906516
Bug#903514: Deadlock in _dl_close join-ing threads accessing TLS (was Re: gimp won't launch)
On 31/03/2019 at 22:53, Alexis Murzeau wrote:
> [...]

Related links:
https://bugzilla.redhat.com/show_bug.cgi?id=1409899
https://sourceware.org/bugzilla/show_bug.cgi?id=2377

Actually, the hang here is caused by a C++ exception, but it is the same deadlock (the C++ exception requires the `dl_load_lock` lock).

It seems from the first link that using threads in constructors and destructors is risky and not well supported, and that applications should simply avoid doing this.

I didn't find a closely related bug in the sourceware bugzilla; maybe we should forward our bug to them?

--
Alexis Murzeau
PGP: B7E6 0EBB 9293 7B06 BDBC 2787 E7BD 1904 F480 937F