Hello,

we're running into a problem related to the use of initial-exec access
to TLS variables in dynamically-loaded libraries.  In general, this is
not supported.  However, there seems to be an "unofficial" extension
that allows selected system libraries to use small amounts of static
TLS space, so that critical variables can use the initial-exec model
even in dynamically-loaded libraries.

One example of a system library that does this is libgomp, the OpenMP
support library provided with GCC.  Here's an email thread from the
gcc mailing lists debating the use of the initial-exec model:

[gomp] Avoid -Wl,-z,nodlopen (PR libgomp/28482)
https://gcc.gnu.org/ml/gcc-patches/2007-05/msg00097.html

The reason this is supposed to work is that glibc/ld.so always
allocates a small amount of surplus static TLS space at startup.
As long as the total size of the initial-exec TLS variables defined in
dynamically-loaded libraries fits into that extra space, everything
is supposed to work out fine.  This could be ensured by allowing
only certain defined system libraries to use this extension.
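
To illustrate the "fits into the extra space" condition, here is a
rough sketch of that kind of bookkeeping.  This is not glibc code: the
struct, the function name, and any surplus size you pass in are made up
for illustration.  It only shows that each dlopen'ed module with static
TLS eats into a fixed budget that is never replenished.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bookkeeping for the surplus static TLS area.  */
struct toy_surplus
{
  size_t size;   /* total surplus reserved at startup (made-up value) */
  size_t used;   /* bytes already handed out to dlopen'ed modules */
};

/* Try to reserve SIZE bytes with alignment ALIGN (a power of two).
   On success store the offset in *OFFSET and return true; return
   false if the request no longer fits into the surplus.  */
bool
toy_surplus_reserve (struct toy_surplus *s, size_t size, size_t align,
                     size_t *offset)
{
  assert (align != 0 && (align & (align - 1)) == 0);

  /* Round the current fill level up to the requested alignment.  */
  size_t start = (s->used + align - 1) & ~(align - 1);
  if (start > s->size || size > s->size - start)
    return false;

  *offset = start;
  s->used = start + size;
  return true;
}
```

With a 64-byte budget, a 4-byte and an 8-byte variable fit easily, while
a later 64-byte request is refused.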

However, in fact there is a *second* restriction, which may cause
loading a library requiring static TLS to fail, *even if* there
still is enough surplus space.  This is due to the following check
in dl-open.c:dl_open_worker:

          /* For static TLS we have to allocate the memory here and
             now.  This includes allocating memory in the DTV.  But we
             cannot change any DTV other than our own.  So, if we
             cannot guarantee that there is room in the DTV we don't
             even try it and fail the load.

             XXX We could track the minimum DTV slots allocated in
             all threads.  */
          if (! RTLD_SINGLE_THREAD_P && imap->l_tls_modid > DTV_SURPLUS)
            _dl_signal_error (0, "dlopen", NULL, N_("\
cannot load any more object with static TLS"));

This is a seriously problematic condition for the use case described
above.  There is no reasonable way a system library can ensure that,
when it is loaded via dlopen, it gets assigned a module ID not larger
than DTV_SURPLUS (which currently equals 14).
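
To make the condition concrete, here is a tiny standalone model of that
check.  Again, this is not glibc code: the function name and parameters
are invented, and DTV_SURPLUS is simply hard-coded to the value quoted
above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Value quoted above; hard-coded here for illustration only.  */
#define DTV_SURPLUS 14

/* Hypothetical model of the dl_open_worker check: returns true if a
   module with static TLS and the given TLS module ID would be
   accepted by dlopen.  */
bool
static_tls_load_allowed (bool single_threaded, size_t tls_modid)
{
  /* The check only applies once other threads exist, since only then
     are there DTVs that the loading thread cannot resize.  */
  if (!single_threaded && tls_modid > DTV_SURPLUS)
    return false;
  return true;
}
```

Note that the amount of surplus static TLS space does not appear in the
condition at all; only the module ID matters, and that depends on how
many TLS-using libraries happen to have been loaded before.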

Specifically, we've had a bug report from a major ISV that one of
their large applications fails to load a plugin via dlopen with
the above error message.  This turned out to be because:
- the plugin uses OpenMP and is thus implicitly linked against libgomp
- the main application does not use libgomp, so libgomp only gets
  loaded at dlopen time
- at this point, some 150 libraries are already in use
- many of those libraries define (regular!) TLS variables

Therefore, the TLS module ID of the (indirectly loaded) libgomp ends
up being larger than 14, and the dlopen fails.  It doesn't seem to be
the case that the ISV is doing anything "wrong" here; the problem is
caused solely by the interaction of glibc and libgomp.

It seems to me that something ought to be fixed here.  Either the use
of initial-exec variables in dynamically-loaded libraries simply cannot
be supported reliably, in which case not even system libraries like
libgomp should use it.  Or else glibc *wants* to support that use case,
in which case it should do so in a way that works reliably as long as
system libraries adhere to conditions that are in their power to meet.

Thinking along the latter lines, it seems the dl_open_worker check
may be overly conservative:

            For static TLS we have to allocate the memory here and
            now.  This includes allocating memory in the DTV.

It is not obvious to me that this second sentence is actually true.

It *is* true that *given the current implementation*, we would fail
if the DTV were not allocated.  This is because init_one_static_tls
(in nptl/allocatestack.c) does:

  /* Fill in the DTV slot so that a later LD/GD access will find it.  */
  dtv[map->l_tls_modid].pointer.val = dest;
  dtv[map->l_tls_modid].pointer.is_static = true;

which would simply crash if the DTV were not allocated.

However, I'm not sure why we have to do that at this point.  Variables
accessed via the initial-exec model do not actually use the DTV: the
linker resolves their locations in the static TLS block directly as
offsets relative to the thread pointer.
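
As a rough illustration of the difference between the two access paths,
here is a toy model of one thread's TLS state.  All names are
hypothetical, and offsets are counted in int slots rather than bytes to
keep the sketch simple; real accesses are of course generated by the
compiler and linker, not written by hand like this.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_STATIC_SLOTS 16
#define TOY_DTV_SLOTS 4

/* Toy model of one thread's TLS state (not a glibc structure).  */
struct toy_thread
{
  int static_tls[TOY_STATIC_SLOTS];  /* static TLS block, addressed
                                        relative to the thread pointer */
  int *dtv[TOY_DTV_SLOTS];           /* per-module block pointers,
                                        indexed by TLS module ID */
};

/* Initial-exec style access: thread pointer plus an offset that was
   fixed at load time.  The DTV is never consulted.  */
int *
toy_ie_access (struct toy_thread *t, size_t tp_offset)
{
  assert (tp_offset < TOY_STATIC_SLOTS);
  return &t->static_tls[tp_offset];
}

/* General-dynamic style access: look up the module's block in the
   DTV, then add the offset within that block.  */
int *
toy_gd_access (struct toy_thread *t, size_t modid, size_t offset)
{
  assert (modid < TOY_DTV_SLOTS && t->dtv[modid] != NULL);
  return t->dtv[modid] + offset;
}
```

In this model, toy_ie_access works fine on a thread whose DTV entries
are all still NULL; only toy_gd_access needs the slot to be filled in.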

Of course, if such a variable were to be *also* accessed via a normal
general-dynamic (or local-dynamic) access, *then* we'd need the DTV.
But at that point, the __tls_get_addr routine would get involved,
which would have the chance to set up the DTV entry on the fly, and
(re-)allocate DTV space as needed.  It's just that the current
implementation of __tls_get_addr implicitly assumes it is never
called for static TLS modules, and would (wrongly) also allocate the
TLS data area.

If __tls_get_addr were changed to also work on static TLS modules
(i.e. only allocate the DTV and have it point to the pre-allocated
static TLS data area in such cases), then we wouldn't have to init
the DTV in init_one_static_tls, and then we could do without the
dl_open_worker check.  Does this sound reasonable?
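
To sketch what I have in mind, here is a toy model of such a lookup
routine.  It is not a patch against glibc: the structures, field names
and the function toy_tls_get_addr are all invented, and locking,
generation counters and so on are ignored completely.  It only shows
the intended control flow: grow the DTV on demand, and for a static TLS
module point the slot at the existing data instead of allocating more.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy model only; none of these names exist in glibc.  */
struct toy_module
{
  size_t modid;          /* TLS module ID */
  bool is_static;        /* block was placed in static TLS at dlopen */
  size_t static_offset;  /* its offset in the static TLS area */
  size_t size;           /* size of the module's TLS block */
};

struct toy_thread
{
  char static_tls[256];  /* pre-allocated static TLS area */
  char **dtv;            /* DTV, grown on demand */
  size_t dtv_len;        /* number of DTV slots currently allocated */
};

/* Model of a lookup that also copes with static TLS modules.
   Returns NULL if memory runs out.  */
char *
toy_tls_get_addr (struct toy_thread *t, const struct toy_module *m,
                  size_t offset)
{
  assert (offset < m->size);

  /* (Re-)allocate the DTV on the fly if the module ID is out of range.  */
  if (m->modid >= t->dtv_len)
    {
      size_t new_len = m->modid + 1;
      char **dtv = realloc (t->dtv, new_len * sizeof *dtv);
      if (dtv == NULL)
        return NULL;
      for (size_t i = t->dtv_len; i < new_len; i++)
        dtv[i] = NULL;
      t->dtv = dtv;
      t->dtv_len = new_len;
    }

  /* Fill in the slot lazily on first use.  */
  if (t->dtv[m->modid] == NULL)
    {
      if (m->is_static)
        /* Static TLS: the data already exists, so only point at it.  */
        t->dtv[m->modid] = t->static_tls + m->static_offset;
      else
        /* Dynamic TLS: allocate the data area as well.  */
        t->dtv[m->modid] = calloc (1, m->size);
      if (t->dtv[m->modid] == NULL)
        return NULL;
    }

  return t->dtv[m->modid] + offset;
}
```

In this model a static TLS module with ID 20 poses no problem: the
first lookup grows the DTV to 21 slots and returns a pointer into the
static area, without anything having been set up at dlopen time.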

Bye,
Ulrich

P.S.: Appended is a small test case that shows the issue.  Note that
just two libraries using TLS suffice to trigger the problem, because
module IDs are not even reliably re-used after a dlclose ...

Makefile
========

all: module1.so module2.so main

clean:
        rm -f module.so module1.so module2.so main

module1.so: module.c
        gcc -g -Wall -DMODULE=1 -fpic -shared -o module1.so module.c

module2.so: module.c
        gcc -g -Wall -DMODULE=2 -fpic -shared -o module2.so module.c

main: main.c
        gcc -g -Wall -D_GNU_SOURCE -o main main.c -ldl -lpthread

main.c
======

#include <stdio.h>
#include <dlfcn.h>
#include <stdlib.h>
#include <pthread.h>

pthread_t thread_id;

void *thread_start (void *arg)
{
  printf ("Thread started\n");
  for (;;)
    ;
}

void run_thread (void)
{
  pthread_create(&thread_id, NULL, &thread_start, NULL);
}

void *test (const char *name)
{
  void *handle, *func;
  size_t modid;

  handle = dlopen (name, RTLD_NOW);
  if (!handle)
    {
      /* dlerror reports why the load failed, e.g. the static TLS error.  */
      printf ("Cannot open %s: %s\n", name, dlerror ());
      exit (1);
    }

  func = dlsym (handle, "func");
  if (!func)
    {
      printf ("Cannot find func\n");
      exit (1);
    }

  ((void (*)(void))func)();

  if (dlinfo(handle, RTLD_DI_TLS_MODID, &modid))
    {
      printf ("Cannot find TLS module ID\n");
      exit (1);
    }

  printf ("Module ID: %ld\n", (long) modid);

  return handle;
}

int main (void)
{
  void *m1, *m2;
  int i;

  run_thread ();

  m1 = test ("./module1.so");
  m2 = test ("./module2.so");

  for (i = 0; i < 100; i++)
    {
      dlclose (m1);
      m1 = test ("./module1.so");
      dlclose (m2);
      m2 = test ("./module2.so");
    }

  dlclose (m1);
  dlclose (m2);
  return 0;
}


module.c
========

#include <stdio.h>

__thread int x __attribute__ ((tls_model ("initial-exec")));

void func (void)
{
  printf ("Module %d TLS variable is: %d\n", MODULE, x);
}


-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com
