Hello,

we're running into a problem related to the use of initial-exec access to TLS variables in dynamically-loaded libraries. In general, this is not actually supported. However, there seems to be an unofficial extension that allows selected system libraries to use small amounts of static TLS space, so that critical variables can be defined to use the initial-exec model even in dynamically-loaded libraries.

One example of a system library that does this is libgomp, the OpenMP support library provided with GCC. Here's an email thread from the gcc mailing lists debating the use of the initial-exec model:

  [gomp] Avoid -Wl,-z,nodlopen (PR libgomp/28482)
  https://gcc.gnu.org/ml/gcc-patches/2007-05/msg00097.html

The reason this is supposed to work is that glibc/ld.so will always allocate a small amount of surplus static TLS data space at startup. As long as the total amount of initial-exec TLS variables defined in dynamically-loaded libraries fits into that extra space, everything is supposed to work out fine. This could be ensured by allowing only certain defined system libraries to use this extension.

However, in fact there is a *second* restriction, which may cause loading a library requiring static TLS to fail, *even if* there still is enough surplus space. This is due to the following check in dl-open.c:dl_open_worker:

      /* For static TLS we have to allocate the memory here and now.
         This includes allocating memory in the DTV.  But we cannot
         change any DTV other than our own.  So, if we cannot guarantee
         that there is room in the DTV we don't even try it and fail
         the load.

         XXX We could track the minimum DTV slots allocated in
         all threads.  */
      if (! RTLD_SINGLE_THREAD_P && imap->l_tls_modid > DTV_SURPLUS)
        _dl_signal_error (0, "dlopen", NULL, N_("\
cannot load any more object with static TLS"));

This is a seriously problematic condition for the use case described above. There is no reasonable way a system library can ensure that, when it is loaded via dlopen, it gets assigned a module ID not larger than DTV_SURPLUS (which currently equals 14).

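As a minimal sketch of the condition quoted above (not glibc code): the helper name static_tls_load_allowed is made up for illustration, and DTV_SURPLUS is hard-coded to the value 14 mentioned in this message rather than taken from glibc headers.

```c
#include <stdbool.h>
#include <stddef.h>

/* Value quoted in this message; the real constant is internal to glibc. */
#define DTV_SURPLUS 14

/* Hypothetical stand-in for the condition in dl_open_worker.
   Returns true if an object with static TLS and the given TLS module ID
   would pass the check, false if the load would be rejected with
   "cannot load any more object with static TLS".  */
static bool
static_tls_load_allowed (bool single_threaded, size_t tls_modid)
{
  if (!single_threaded && tls_modid > DTV_SURPLUS)
    return false;
  return true;
}
```

Note that the result depends only on the module ID and on whether the process is multi-threaded; the amount of surplus static TLS space still available does not enter into it.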
Specifically, we've had a bug report from a major ISV that one of their large applications fails to load a plugin via dlopen with the above error message, which turned out to be because:

- the plugin uses OpenMP and is thus implicitly linked against libgomp
- the main application does not use libgomp, so libgomp only gets loaded at dlopen time
- at this point, some 150 libraries are already in use
- many of those libraries define (regular!) TLS variables

Therefore, the TLS module ID of the (indirectly loaded) libgomp ends up being larger than 14, and the dlopen fails.

It doesn't seem to be the case that the ISV is doing anything "wrong" here; the problem is caused solely by the interaction of glibc and libgomp.

It seems to me that something ought to be fixed here. Either the use of initial-exec variables simply isn't reliably supportable, in which case not even system libraries like libgomp should use it. Or else glibc *wants* to support that use case, in which case it should do so in a way that works reliably as long as system libraries adhere to conditions that are in their power to implement.

Thinking along the latter lines, it seems the dl_open_worker check may be overly conservative:

      For static TLS we have to allocate the memory here and now.
      This includes allocating memory in the DTV.

It is not obvious to me that this second sentence is actually true. It *is* true that, *given the current implementation*, we would fail if the DTV were not allocated. This is because init_one_static_tls (in nptl/allocatestack.c) does:

  /* Fill in the DTV slot so that a later LD/GD access will find it.  */
  dtv[map->l_tls_modid].pointer.val = dest;
  dtv[map->l_tls_modid].pointer.is_static = true;

which would simply crash if the DTV were not allocated. However, I'm not sure why we have to do that at this point.

Variables accessed via the initial-exec model do not actually use the DTV: the linker resolves them directly to offsets within the static TLS block, relative to the thread pointer.

Of course, if such a variable were to be *also* accessed via a normal general-dynamic (or local-dynamic) access, *then* we'd need the DTV. But at that point, the __tls_get_addr routine would get involved, which would have the chance to set up the DTV entry on the fly, and (re-)allocate DTV space as needed.

It's just that the current implementation of __tls_get_addr implicitly assumes it is never called for static TLS modules, and would (wrongly) also allocate the TLS data area. If __tls_get_addr were changed to also work on static TLS modules (i.e. only allocate the DTV and have it point to the pre-allocated static TLS data area in such cases), then we wouldn't have to initialize the DTV in init_one_static_tls, and we could do without the dl_open_worker check.

Does this sound reasonable?

Bye,
Ulrich


P.S.: Appended is a small test case that shows the issue. Note that just two libraries using TLS suffice to trigger the problem, because module IDs are not even reliably re-used after a dlclose ...

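To make the __tls_get_addr idea above a bit more concrete, here is a toy model in plain C. It is only a sketch under stated assumptions: every type and function name (toy_thread, toy_module, toy_tls_get_addr, ...) is invented for illustration, and nothing here reflects glibc's actual data structures or locking.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Toy model only: none of these types or functions exist in glibc.  */
struct toy_dtv_entry
{
  void *val;        /* start of the module's TLS block, NULL if unset */
  bool is_static;   /* true if val points into the static TLS area */
};

struct toy_module
{
  size_t modid;          /* TLS module ID */
  bool uses_static_tls;  /* block lives in the static TLS area */
  size_t static_offset;  /* offset of the block in the static TLS area */
  size_t blocksize;      /* size of the module's TLS block */
};

struct toy_thread
{
  char *static_tls;            /* pre-allocated static TLS area */
  struct toy_dtv_entry *dtv;   /* grown on demand by the owning thread */
  size_t dtv_len;
};

/* Grow the calling thread's own DTV so that index modid is valid.  */
static int
toy_dtv_reserve (struct toy_thread *t, size_t modid)
{
  if (modid < t->dtv_len)
    return 0;
  size_t new_len = modid + 1;
  struct toy_dtv_entry *p = realloc (t->dtv, new_len * sizeof *p);
  if (p == NULL)
    return -1;
  memset (p + t->dtv_len, 0, (new_len - t->dtv_len) * sizeof *p);
  t->dtv = p;
  t->dtv_len = new_len;
  return 0;
}

/* Stand-in for a general-dynamic access: fills the DTV slot lazily.
   For a static TLS module only the slot is set up; the data area
   already exists and is not allocated again.  */
static void *
toy_tls_get_addr (struct toy_thread *t, const struct toy_module *m,
                  size_t offset)
{
  if (toy_dtv_reserve (t, m->modid) != 0)
    return NULL;
  struct toy_dtv_entry *e = &t->dtv[m->modid];
  if (e->val == NULL)
    {
      if (m->uses_static_tls)
        {
          e->val = t->static_tls + m->static_offset;
          e->is_static = true;
        }
      else
        {
          e->val = calloc (1, m->blocksize);
          if (e->val == NULL)
            return NULL;
        }
    }
  return (char *) e->val + offset;
}

/* Stand-in for an initial-exec access: fixed offset, DTV untouched.  */
static void *
toy_initial_exec_addr (struct toy_thread *t, const struct toy_module *m,
                       size_t offset)
{
  return t->static_tls + m->static_offset + offset;
}
```

In this model, a thread whose DTV has never been sized for module ID 150 can still use toy_initial_exec_addr, and the first toy_tls_get_addr call for that module grows the DTV and returns the same address. Whether the real __tls_get_addr can be changed this way safely is exactly the open question of this message.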
Makefile
========

all: module1.so module2.so main

clean:
	rm -f module.so module1.so module2.so main

module1.so: module.c
	gcc -g -Wall -DMODULE=1 -fpic -shared -o module1.so module.c

module2.so: module.c
	gcc -g -Wall -DMODULE=2 -fpic -shared -o module2.so module.c

main: main.c
	gcc -g -Wall -D_GNU_SOURCE -o main main.c -ldl -lpthread


main.c
======

#include <stdio.h>
#include <dlfcn.h>
#include <stdlib.h>
#include <pthread.h>

pthread_t thread_id;

void *thread_start (void *arg)
{
  printf ("Thread started\n");
  for (;;)
    ;
}

void run_thread (void)
{
  pthread_create(&thread_id, NULL, &thread_start, NULL);
}

void *test (const char *name)
{
  void *handle, *func;
  size_t modid;

  handle = dlopen (name, RTLD_NOW);
  if (!handle)
    {
      printf ("Cannot open %s\n", name);
      exit (1);
    }

  func = dlsym (handle, "func");
  if (!func)
    {
      printf ("Cannot find func\n");
      exit (1);
    }

  ((void (*)(void))func)();

  if (dlinfo(handle, RTLD_DI_TLS_MODID, &modid))
    {
      printf ("Cannot find TLS module ID\n");
      exit (1);
    }

  printf ("Module ID: %ld\n", (long) modid);

  return handle;
}

int main (void)
{
  void *m1, *m2;
  int i;

  run_thread ();

  m1 = test ("./module1.so");
  m2 = test ("./module2.so");

  for (i = 0; i < 100; i++)
    {
      dlclose (m1);
      m1 = test ("./module1.so");

      dlclose (m2);
      m2 = test ("./module2.so");
    }

  dlclose (m1);
  dlclose (m2);

  return 0;
}


module.c
========

#include <stdio.h>

__thread int x __attribute__ ((tls_model ("initial-exec")));

void func (void)
{
  printf ("Module %d TLS variable is: %d\n", MODULE, x);
}


-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com