Bug#903514: Deadlock in _dl_close join-ing threads accessing TLS (was Re: gimp won't launch)

2019-03-31 Thread Aurelien Jarno
This is very likely a bug already present in old glibc versions. It was
brought to light by enabling TLS support in openblas, not by a new
glibc version.

Right now the bug has been worked around by disabling TLS support in
openblas. The way to handle this bug is to write a small testcase that
can be forwarded upstream. It's not an easy task though.

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://www.aurel32.net




Bug#914999: [libc6] Locking problems into libc6

2019-03-31 Thread Julien Cristau
Control: severity -1 important

On Thu, Nov 29, 2018 at 01:58:47PM +0200, Roman Savochenko wrote:
> Package: libc6
> Version: 2.24
> Severity: critical
> 
This is not a critical bug (it's not even clear from the report at this
point what the problem is, and it doesn't seem too widespread);
downgrading.

Cheers,
Julien



Processed: Re: Bug#914999: [libc6] Locking problems into libc6

2019-03-31 Thread Debian Bug Tracking System
Processing control commands:

> severity -1 important
Bug #914999 [libc6] [libc6] Locking problems into libc6
Severity set to 'important' from 'critical'

-- 
914999: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=914999
Debian Bug Tracking System
Contact ow...@bugs.debian.org with problems



Re: Buggy gettext() after switching locale (#924657)

2019-03-31 Thread Niko Tyni
On Sun, Mar 31, 2019 at 01:56:15PM +0200, intrigeri wrote:
 
> From Iain's report I understand that since Perl 5.28, gettext() calls
> are cached, so after switching the locale, in order to get correct
> results, one needs to invalidate the cache somehow, e.g. by calling
> bindtextdomain() and then textdomain().

FWIW I looked into this a bit, and indeed glibc has a cache of
already loaded translations that gets invalidated (by incrementing
_nl_msg_cat_cntr) in setlocale(3), bindtextdomain(3) and textdomain(3)
but not uselocale(3).

Starting with Perl 5.28, Perl uses POSIX 2008 thread-safe locales, so
it calls uselocale(3) underneath when the Perl side POSIX::setlocale()
function is invoked.

The proposed fix/workaround seems fine to me, though I wonder if glibc
should invalidate the cache in uselocale(3) as well. Copying the
glibc maintainers. Any opinion on this?
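
For illustration, the workaround discussed above could look roughly like
this on the C side. This is only a sketch: the helper name
`use_locale_and_refresh` and its parameters are made up for this example,
and it relies on the behaviour described above (bindtextdomain(3) and
textdomain(3) bump the cache counter, uselocale(3) does not).

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* for locale_t, newlocale(), uselocale() */
#endif
#include <libintl.h>
#include <locale.h>
#include <string.h>

/* Hypothetical helper (not part of glibc or Perl): switch the calling
 * thread's locale, then re-issue bindtextdomain() and textdomain() so
 * that the translation cache is invalidated as described in this thread.
 * Returns 0 on success, -1 on failure. */
int use_locale_and_refresh(locale_t loc, const char *domain, const char *dir)
{
    /* Per-thread locale switch; on its own this does not touch the cache. */
    uselocale(loc);

    /* These two calls are the ones reported to invalidate the cache. */
    if (bindtextdomain(domain, dir) == NULL)
        return -1;
    const char *active = textdomain(domain);
    return (active != NULL && strcmp(active, domain) == 0) ? 0 : -1;
}
```

A caller would create the locale with newlocale(3) and pass its own
text domain and catalog directory; the values are application-specific.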
-- 
Niko Tyni   nt...@debian.org



Bug#903514: Deadlock in _dl_close join-ing threads accessing TLS (was Re: gimp won't launch)

2019-03-31 Thread Alexis Murzeau
On 31/03/2019 at 15:19, Aurelien Jarno wrote:
> This is very likely a bug already present in old glibc versions. It was
> brought to light by enabling TLS support in openblas, not by a new
> glibc version.
> 
> Right now the bug has been worked around by disabling TLS support in
> openblas. The way to handle this bug is to write a small testcase that
> can be forwarded upstream. It's not an easy task though.
> 

Hi,

I've made a test case [0].
I have not tested it against the latest glibc commit, but it does
reproduce the deadlock with glibc 2.28 on Linux.

To run the test case, do this:
```
gcc test_compiler_tls.c -o test_compiler_tls -ldl -g -pthread
gcc test_compiler_tls_lib.c -shared -o test_compiler_tls_lib.so \
 -g -pthread -fPIC
./test_compiler_tls ./test_compiler_tls_lib &
gdb --pid $! -ex 'thr a a bt'
```

This reproduces the deadlock that I found in openblas:
1- The test_thread opens the library, which calls its constructor
2- The library's constructor creates a thread
   `thread_that_use_tls_after_sleep`
3- The thread `thread_that_use_tls_after_sleep` sleeps for 100ms (this
   needs to be long enough that dlclose is called before the sleep ends)
4- The test_thread closes the library with dlclose
5- dlclose locks `dl_load_lock` and calls the library's destructor
6- The library's destructor waits for `thread_that_use_tls_after_sleep`
   to finish
7- The `thread_that_use_tls_after_sleep` thread tries to read the TLS
   variable, which causes a call to `__tls_get_addr`
8- `__tls_get_addr` blocks in `tls_get_addr_tail`, trying to lock the
   same `dl_load_lock` that dlclose already holds
9- Nothing happens: the dlclose thread holds the lock while waiting for
   the `thread_that_use_tls_after_sleep` thread to finish, and that
   thread is waiting for the same lock, so it never exits.
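
The lock ordering in the steps above can be sketched without dlopen at
all. The following is not the test case from [0]; it is a simplified
stand-in in which an ordinary mutex (`load_lock`, a made-up name) plays
the role of `dl_load_lock`, and the worker uses a timed lock so that the
program reports the problem instead of hanging. All names in it are
hypothetical.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Hypothetical stand-in for glibc's dl_load_lock (not the real lock). */
static pthread_mutex_t load_lock = PTHREAD_MUTEX_INITIALIZER;
static int worker_result;

/* Plays the role of the library thread: sleep, then need the lock
 * (in the real case, via __tls_get_addr). */
static void *worker(void *arg)
{
    (void)arg;
    struct timespec delay = { 0, 100 * 1000 * 1000 };   /* 100 ms */
    nanosleep(&delay, NULL);

    /* Give up after 500 ms instead of blocking forever. */
    struct timespec deadline;
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_nsec += 500 * 1000 * 1000;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec += 1;
        deadline.tv_nsec -= 1000000000L;
    }

    worker_result = pthread_mutex_timedlock(&load_lock, &deadline);
    if (worker_result == 0)
        pthread_mutex_unlock(&load_lock);
    return NULL;
}

/* Plays the role of the closing thread: take the lock, then join the
 * worker while still holding it (what the destructor does).
 * Returns the worker's lock result; ETIMEDOUT means the worker could
 * not get the lock, i.e. the real code would deadlock. */
int simulate_close_deadlock(void)
{
    pthread_t tid;
    int rc = pthread_create(&tid, NULL, worker, NULL);
    assert(rc == 0);
    (void)rc;

    pthread_mutex_lock(&load_lock);     /* lock taken before the worker wakes */
    pthread_join(tid, NULL);            /* join while the lock is held */
    pthread_mutex_unlock(&load_lock);
    return worker_result;
}
```

Called from a `main` and built with `gcc -pthread`, the function is
expected to return ETIMEDOUT, because the worker can only get the lock
after the join it is blocking has completed. With an untimed lock, as in
glibc, both threads would wait forever.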

See [1] for the stacktrace.

Thread 3 is the library's thread, created in its constructor and joined
in its destructor.
Thread 2 is the thread that calls dlopen and dlclose.
Thread 1 is a "monitoring" thread that implements a 10s timeout (useful
if this test needs to run on a CI system).

Where dlclose locks `dl_load_lock`: [2]
Where tls_get_addr_tail locks `dl_load_lock`: [3]

[0]: https://gist.github.com/amurzeau/26f045bdfea407528dd7de3102fb4be7
[1]:
https://gist.github.com/amurzeau/26f045bdfea407528dd7de3102fb4be7#file-gdb_stacktrace-txt
[2]: https://github.com/bminor/glibc/blob/glibc-2.28/elf/dl-close.c#L812
[3]: https://github.com/bminor/glibc/blob/glibc-2.28/elf/dl-tls.c#L761

-- 
Alexis Murzeau
PGP: B7E6 0EBB 9293 7B06 BDBC  2787 E7BD 1904 F480 937F





Processed: Found in glibc 2.28-5

2019-03-31 Thread Debian Bug Tracking System
Processing commands for cont...@bugs.debian.org:

> found 903514 2.28-5
Bug #903514 [src:glibc] Deadlock in _dl_close join-ing threads accessing TLS
Bug #904544 [src:glibc] gimp hangs on startup if libopenblas-base is installed
Bug #906152 [src:glibc] libopenblas-base: version 0.3.2+ds-1 makes gimp hang indefinitely
Bug #906516 [src:glibc] gimp: Segfault with libopenblas-base
Marked as found in versions glibc/2.28-5.
Marked as found in versions glibc/2.28-5.
Marked as found in versions glibc/2.28-5.
Marked as found in versions glibc/2.28-5.
> thanks
Stopping processing here.

Please contact me if you need assistance.
-- 
903514: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=903514
904544: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=904544
906152: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=906152
906516: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=906516
Debian Bug Tracking System
Contact ow...@bugs.debian.org with problems



Bug#903514: Deadlock in _dl_close join-ing threads accessing TLS (was Re: gimp won't launch)

2019-03-31 Thread Alexis Murzeau
On 31/03/2019 at 22:53, Alexis Murzeau wrote:

Related links:
https://bugzilla.redhat.com/show_bug.cgi?id=1409899
https://sourceware.org/bugzilla/show_bug.cgi?id=2377


Actually, the hang is caused by a C++ exception here, but it is the same
deadlock (C++ exception handling requires the `dl_load_lock` lock).

From the first link, it seems that using threads in constructors and
destructors is risky and not well supported, and that applications
should simply avoid doing this.

I didn't find a closely related bug in the sourceware bugzilla; maybe we
should forward our bug to them?

-- 
Alexis Murzeau
PGP: B7E6 0EBB 9293 7B06 BDBC  2787 E7BD 1904 F480 937F


