Hi Johannes,

> I'd recommend not using such a workaround:
>
> This means getTLSRange will always return an empty range, but the GC uses
> this to scan TLS memory. This means a GC collection can delete objects
> which are still pointed to from TLS. This leads to hard to debug errors,
> and if I remember correctly, the testsuite will not catch these errors. I
> think we have code in phobos though which references objects only from TLS
> and this will break after a GC collection.

I fully admit to having been wary of such an approach myself, but was
astonished how far it seemed to get me.

I suspect the two testsuite regressions (compared to a build with
dlpi_tls_modid present) I mentioned are exactly of the kind you describe.

E.g. the gdc.test/runnable/testaa.d failures look like this:

core.exception.RangeError@gdc.test/runnable/testaa.d(410): Range violation
----------------
/vol/gcc/src/hg/trunk/local/libphobos/libdruntime/core/exception.d:496 onRangeError [0x80f0d2c]
/vol/gcc/src/hg/trunk/local/libphobos/libdruntime/core/exception.d:672 _d_arraybounds [0x80f132f]
??:? void testaa.test15() [0x80d7ae4]
??:? _Dmain [0x80dd3fc]
before test 1

and gdc.test/runnable/xtest55.d fails like so:

core.exception.AssertError@gdc.test/runnable/xtest55.d(19): Assertion failure
----------------
/vol/gcc/src/hg/trunk/local/libphobos/libdruntime/core/exception.d:441 onAssertError [0x7fff55dd3b56]
??:? _Dmain [0x418959]
7FFFBEB00000    7FFFBEB00000

Admittedly it's a small set (though there are the libphobos failures as
well), but a compiler that leaves its users with a feeling of
unreliability is probably worse than none at all.

Just for the record, I saw the same regressions on Linux/x86_64 when I
accidentally didn't define _GNU_SOURCE in the configure test for
dlpi_tls_modid, producing an equivalent configuration.  So this isn't
Solaris-specific in any way.
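
For illustration, here is a minimal sketch of the kind of probe involved.
It is not the configure test from the patch: count_tls_cb,
count_tls_modules and tls_marker are hypothetical names.  It assumes
glibc, where <link.h> only declares dl_iterate_phdr and struct
dl_phdr_info (including dlpi_tls_modid) once _GNU_SOURCE is defined
before the first system header, which is why omitting the define makes
the member look absent.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* must precede every system header */
#endif
#include <link.h>
#include <stddef.h>

/* Hypothetical marker: gives this object a PT_TLS segment of its own. */
static __thread volatile int tls_marker = 1;

/* dl_iterate_phdr callback: count loaded objects with a TLS module ID. */
static int count_tls_cb(struct dl_phdr_info *info, size_t size, void *data)
{
    size_t *count = (size_t *)data;
    /* Only read dlpi_tls_modid if the runtime's struct is large enough. */
    size_t needed = offsetof(struct dl_phdr_info, dlpi_tls_modid)
                    + sizeof info->dlpi_tls_modid;

    if (size >= needed && info->dlpi_tls_modid != 0)
        ++*count;
    return 0;   /* zero means: continue with the next object */
}

/* Number of loaded objects for which a TLS module ID is reported. */
size_t count_tls_modules(void)
{
    size_t count = 0;

    if (tls_marker)   /* read the TLS variable so it stays referenced */
        dl_iterate_phdr(count_tls_cb, &count);
    return count;
}
```

A configure check only needs the reference to info->dlpi_tls_modid to
compile; the counting is there merely to make the sketch observable.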

> I'm not sure what's a good solution here. EmuTLS has got the same problem,
> but I'll post a RFC patch next weekend which would allow to scan the emuTLS
> memory. If we somehow make that work, I'd recommend to use emuTLS instead
> of native TLS if there's no way to scan the native TLS*.

The problem here is that we'd probably need to build gcc twice in this
case: once with native TLS for all non-D languages, and a second time
with --disable-tls for D.  AFAICS TARGET_HAVE_TLS needs to be a
compile-time constant and cannot depend on the language being
compiled.

> FYI Martin Nowak(in CC) wrote most of the original code for rt.sections so
> he's the expert we'd have to ask.
>
> * Maybe we could implement a more runtime-independent approach to scan
> native TLS?
> 1) We somehow need to bracket the TLS section (it would have to be
>    per-shared-library though, we basically need thread-local, hidden
>    __start_tls and __stop_tls symbols).
> 2) We need to emit a hidden _dso_scan_tls function into each D library.
>    A pointer to  this DSO specific function then has to be passed in
>    CompilerDSOData to _d_dso_registry.
> 3) tlsRange has to forward to the correct, DSO specific _dso_scan_tls.
>
> 2 and 3 are easy but I'm not sure if we can do 1.

Right: I suspect 1 would be way more difficult than the
__start_minfo/__stop_minfo stuff.
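
For comparison, a minimal sketch of the bracketing idiom for an ordinary
(non-TLS) section.  It assumes GCC and GNU ld, which synthesizes
__start_NAME and __stop_NAME for a section whose name is a valid C
identifier.  The section name minfo_demo, struct module_info,
REGISTER_MODULE and count_registered_modules are made up for the
example and are not what libdruntime uses.

```c
#include <stddef.h>

/* Hypothetical stand-in for a per-module record. */
struct module_info { const char *name; };

/* Put a pointer to the record into the section "minfo_demo". */
#define REGISTER_MODULE(var) \
    static const struct module_info *var##_ptr \
        __attribute__((section("minfo_demo"), used)) = &var

static const struct module_info mod_a = { "a" };
static const struct module_info mod_b = { "b" };
REGISTER_MODULE(mod_a);
REGISTER_MODULE(mod_b);

/* Provided by the linker: first entry and one past the last entry. */
extern const struct module_info *__start_minfo_demo[];
extern const struct module_info *__stop_minfo_demo[];

/* Number of pointers the linker collected into the section. */
size_t count_registered_modules(void)
{
    return (size_t)(__stop_minfo_demo - __start_minfo_demo);
}
```

This works because the section has one fixed address range per object.
A TLS block is instantiated per thread, so a single pair of link-time
addresses cannot describe it, hence the need for thread-local bracket
symbols in 1.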

I failed to mention another approach in my patch submission, though I
alluded to it in PR d/88150: the ldc fork of libdruntime

        https://github.com/ldc-developers/druntime

has in src/rt/sections_ldc.d an implementation of getTLSRange for
Illumos/Solaris without dlpi_tls_modid.  I managed to adapt it to
sections_elf_shared.d, but apart from the fact that it uses undocumented
libc internals (which probably don't change between Solaris 10 and 11.4,
so that shouldn't be too bad) that implementation only gets you the TLS
range for the main executable, so isn't very useful AFAICS.

        Rainer

-- 
-----------------------------------------------------------------------------
Rainer Orth, Center for Biotechnology, Bielefeld University
