On 1/15/24 08:55, Adhemerval Zanella Netto wrote: > > > On 15/01/24 09:46, Szabolcs Nagy wrote: >> The 01/13/2024 13:49, Florian Weimer wrote: >>> This commit >>> >>> commit 8abddb187b33480d8827f44ec655f45734a1749d >>> Author: Andrew Burgess <andrew.burg...@embecosm.com> >>> Date: Sat Aug 5 14:31:06 2023 +0200 >>> >>> libgcc: support heap-based trampolines >>> >>> Add support for heap-based trampolines on x86_64-linux, aarch64-linux, >>> and x86_64-darwin. Implement the __builtin_nested_func_ptr_created and >>> __builtin_nested_func_ptr_deleted functions for these targets. >>> >>> Co-Authored-By: Maxim Blinov <maxim.bli...@embecosm.com> >>> Co-Authored-By: Iain Sandoe <i...@sandoe.co.uk> >>> Co-Authored-By: Francois-Xavier Coudert <fxcoud...@gcc.gnu.org> >>> >>> added TLS usage to libgcc_s.so.1. The way that libgcc_s is currently >>> built, it ends up using a dynamic TLS variant on the Linux targets. >>> This means that there is no up-front TLS allocation with glibc (but >>> there would be one with musl). >>> >>> There is still a compatibility impact because glibc assigns a TLS module >>> ID upfront. This seems to be what causes the >>> ust/libc-wrapper/test_libc-wrapper test in lttng-tools to fail. We end >>> up with an infinite regress during process termination because >>> libgcc_s.so.1 has been loaded, resulting in a DTV update. When this >>> happens, the bottom of the stack looks like this: >>> >>> #4447 0x00007ffff7f288f0 in free () from >>> /lib64/liblttng-ust-libc-wrapper.so.1 >>> #4448 0x00007ffff7fdb142 in free (ptr=<optimized out>) >>> at ../include/rtld-malloc.h:50 >>> #4449 _dl_update_slotinfo (req_modid=3, new_gen=2) at ../elf/dl-tls.c:822 >>> #4450 0x00007ffff7fdb214 in update_get_addr (ti=0x7ffff7f2bfc0, >>> gen=<optimized out>) at ../elf/dl-tls.c:916 >>> #4451 0x00007ffff7fddccc in __tls_get_addr () >>> at ../sysdeps/x86_64/tls_get_addr.S:55 >>> #4452 0x00007ffff7f288f0 in free () from >>> /lib64/liblttng-ust-libc-wrapper.so.1 >>> #4453 0x00007ffff7fdb142 in free (ptr=<optimized out>) >>> at ../include/rtld-malloc.h:50 >>> #4454 _dl_update_slotinfo (req_modid=2, new_gen=2) at ../elf/dl-tls.c:822 >>> #4455 0x00007ffff7fdb214 in update_get_addr (ti=0x7ffff7f39fa0, >>> gen=<optimized out>) at ../elf/dl-tls.c:916 >>> #4456 0x00007ffff7fddccc in __tls_get_addr () >>> at ../sysdeps/x86_64/tls_get_addr.S:55 >>> #4457 0x00007ffff7f36113 in lttng_ust_cancelstate_disable_push () >>> from /lib64/liblttng-ust-common.so.1 >>> #4458 0x00007ffff7f4c2e8 in ust_lock_nocheck () from >>> /lib64/liblttng-ust.so.1 >>> #4459 0x00007ffff7f5175a in lttng_ust_cleanup () from >>> /lib64/liblttng-ust.so.1 >>> #4460 0x00007ffff7fca0f2 in _dl_call_fini ( >>> closure_map=closure_map@entry=0x7ffff7fbe000) at dl-call_fini.c:43 >>> #4461 0x00007ffff7fce06e in _dl_fini () at dl-fini.c:114 >>> #4462 0x00007ffff7d82fe6 in __run_exit_handlers () from /lib64/libc.so.6 >>> >>> Cc:ing <lttng-...@lists.lttng.org> for awareness. >>> >>> The issue also requires a recent glibc with changes to DTV management: >>> commit d2123d68275acc0f061e73d5f86ca504e0d5a344 ("elf: Fix slow tls >>> access after dlopen [BZ #19924]"). If I understand things correctly, >>> before this glibc change, we didn't deallocate the old DTV, so there was >>> no call to the free function. >> >> with 19924 fixed, after a dlopen or dlclose every thread updates >> its dtv on the next dynamic tls access. >> >> before that, dtv was only updated up to the generation of the >> module being accessed for a particular tls access. >> >> so hitting the free in the dtv update path is now more likely >> but the free is not new, it was there before. >> >> also note that this is unlikely to happen on aarch64 since >> tlsdesc only does dynamic tls access after a 512byte static >> tls reservation runs out. >> >>> >>> On the glibc side, we should recommend that intercepting mallocs and its >>> dependencies use initial-exec TLS because that kind of TLS does not use >>> malloc. If intercepting mallocs using dynamic TLS work at all, that's >>> totally by accident, and was in the past helped by glibc bug 19924. (I >> >> right. >> >>> don't think there is anything special about libgcc_s.so.1 that triggers >>> the test failure above, it is just an object with dynamic TLS that is >>> implicitly loaded via dlopen at the right stage of the test.) In this >>> particular case, we can also paper over the test failure in glibc by not >>> call free at all because the argument is a null pointer: >>> >>> diff --git a/elf/dl-tls.c b/elf/dl-tls.c >>> index 7b3dd9ab60..14c71cbd06 100644 >>> --- a/elf/dl-tls.c >>> +++ b/elf/dl-tls.c >>> @@ -819,7 +819,8 @@ _dl_update_slotinfo (unsigned long int req_modid, >>> size_t new_gen) >>> dtv entry free it. Note: this is not AS-safe. */ >>> /* XXX Ideally we will at some point create a memory >>> pool. */ >>> - free (dtv[modid].pointer.to_free); >>> + if (dtv[modid].pointer.to_free != NULL) >>> + free (dtv[modid].pointer.to_free); >>> dtv[modid].pointer.val = TLS_DTV_UNALLOCATED; >>> dtv[modid].pointer.to_free = NULL; >> >> can be done, but !=NULL is more likely since we do modid reuse >> after dlclose. >> >> there is also a realloc in dtv resizing which happens when more >> than 16 modules with tls are loaded after thread creation >> (DTV_SURPLUS). >> >> i'm not sure if it's worth supporting malloc interposers that >> only work sometimes. >> > > Maybe one option would to try reinstate the async-signal-safe TLS > code to avoid malloc/free in dynamic TLS altogether. We revert it on > 2.14 release cause it broke ASAN/LSAN [1], but I think we might try > to reinstate on 2.40 and work with sanitizer project to get this sort > out.
I agree. TLS should be seen more like .bss/.data rather than something that is allocated with malloc(). If we leak memory via TLS that is a glibc bug that we can deal with, but making it easier to find glibc bugs is also a benefit to the community, but not as valuable a benefit as making TLS correctly async-signal safe. Likewise we need to discuss when the memory is allocated, regardless of which allocator is used, including allocation up-front at dlopen() time. > [1] https://sourceware.org/pipermail/libc-alpha/2014-January/047931.html > -- Cheers, Carlos.