Module Name: src
Committed By: martin
Date: Wed Aug 7 11:00:12 UTC 2024
Modified Files:
src/libexec/ld.elf_so [netbsd-10]: README.TLS tls.c
src/libexec/ld.elf_so/arch/aarch64 [netbsd-10]: rtld_start.S
src/tests/libexec/ld.elf_so [netbsd-10]: t_tls_extern.c
Log Message:
Pull up following revision(s) (requested by riastradh in ticket #777):
libexec/ld.elf_so/tls.c: revision 1.15
libexec/ld.elf_so/arch/aarch64/rtld_start.S: revision 1.6
libexec/ld.elf_so/arch/aarch64/rtld_start.S: revision 1.7
tests/libexec/ld.elf_so/t_tls_extern.c: revision 1.15
tests/libexec/ld.elf_so/t_tls_extern.c: revision 1.16
libexec/ld.elf_so/README.TLS: revision 1.7
libexec/ld.elf_so/tls.c: revision 1.20
libexec/ld.elf_so/tls.c: revision 1.21
Alignment. NFCI.
ld.elf_so: Sprinkle comments and references for thread-local storage.
Maybe this will help the TLS business to be less mysterious to the
next traveller to pass by here.
Prompted by PR lib/58154.
ld.elf_so: Add comments explaining DTV allocation size.
Patch by pho@ for PR lib/58154.
tests/libexec/ld.elf_so/t_tls_extern: Test PR lib/58154.
ld.elf_so aarch64/rtld_start.S: Sprinkle comments.
No functional change intended.
Prompted by PR lib/58154.
ld.elf_so aarch64/rtld_start.S: Fix dynamic TLS fast path branch.
Bug found and patch prepared by pho@.
PR lib/58154
To generate a diff of this commit:
cvs rdiff -u -r1.5.10.1 -r1.5.10.2 src/libexec/ld.elf_so/README.TLS
cvs rdiff -u -r1.14.8.1 -r1.14.8.2 src/libexec/ld.elf_so/tls.c
cvs rdiff -u -r1.5 -r1.5.2.1 src/libexec/ld.elf_so/arch/aarch64/rtld_start.S
cvs rdiff -u -r1.12.2.2 -r1.12.2.3 src/tests/libexec/ld.elf_so/t_tls_extern.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
Modified files:
Index: src/libexec/ld.elf_so/README.TLS
diff -u src/libexec/ld.elf_so/README.TLS:1.5.10.1 src/libexec/ld.elf_so/README.TLS:1.5.10.2
--- src/libexec/ld.elf_so/README.TLS:1.5.10.1 Tue Aug 1 16:34:56 2023
+++ src/libexec/ld.elf_so/README.TLS Wed Aug 7 11:00:12 2024
@@ -1,11 +1,111 @@
+Thread-local storage.
+
+Each thread has a thread control block, or TCB. The TCB is a
+variable-size structure headed by `struct tls_tcb' from <sys/tls.h>,
+with:
+
+(a) static thread-local storage for the TLS data of initial objects,
+ i.e., those loaded at startup rather than those dynamically loaded
+ by dlopen
+
+(b) a pointer to a dynamic thread vector (DTV) for the TLS data
+ pointers of objects that use global-dynamic or local-dynamic models
+ (typically shared libraries or dlopenable modules)
+
+(c) the pthread_t pointer
+
+The per-thread lwp private pointer, also sometimes called TP (thread
+pointer), managed by the _lwp_setprivate and _lwp_getprivate syscalls,
+either points at the TCB directly, or, on some architectures, points at
+
+ tp = tcb + sizeof(struct tls_tcb) + TLS_TP_OFFSET.
+
+This bias is chosen for architectures whose instruction encodings use
+signed displacements from TP: biasing TP this way doubles the range of
+static TLS offsets reachable by those displacements.
+Architectures with such a tp/tcb offset must provide
+
+void *__lwp_gettcb_fast(void);
+
+in machine/mcontext.h and must define __HAVE___LWP_GETTCB_FAST in
+machine/types.h to reflect this; otherwise they must provide
+__lwp_getprivate_fast to return the TCB pointer.
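+
+As an illustrative sketch (not the literal ld.elf_so code), the two
+directions of the conversion implied by the bias above, given the
+definitions already introduced, are:
+
+	tp  = (void *)((uint8_t *)tcb + sizeof(struct tls_tcb) +
+	    TLS_TP_OFFSET);
+	tcb = (struct tls_tcb *)((uint8_t *)tp - sizeof(struct tls_tcb) -
+	    TLS_TP_OFFSET);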
+
+Each architecture has one of two TLS variants, variant I or variant II.
+Variant I places the static thread-local storage _after_ the fixed
+content of the TCB, at increasing addresses (addresses increase
+downward in the diagram):
+
+ +---------------+
+ | dtv pointer | tcb points here (struct tls_tcb)
+ +---------------+
+ | pthread_t |
+ +---------------+
+ | obj0 tls | obj0->tlsoffset = 0
+ | |
+ | |
+ +---------------+
+ | obj1 tls | obj1->tlsoffset = 3
+ +---------------+
+ | obj2 tls | obj2->tlsoffset = 4
+ | |
+ . .
+ . .
+ . .
+ | |
+ +---------------+
+ | objN tls | objN->tlsoffset = k
+ +---------------+
+
+Variant II places the static thread-local storage _before_ the fixed
+content of the TCB, at decreasing addresses:
+
+ +---------------+
+ | objN tls | objN->tlsoffset = k
+ +---------------+
+ | obj(N-1) tls | obj(N-1)->tlsoffset = k - 1
+ . .
+ . .
+ . .
+ | |
+ +---------------+
+ | obj2 tls | obj2->tlsoffset = 4
+ +---------------+
+ | obj1 tls | obj1->tlsoffset = 3
+ +---------------+
+ | obj0 tls | obj0->tlsoffset = 0
+ | |
+ | |
+ +---------------+
+ | tcb pointer | tcb points here (struct tls_tcb)
+ +---------------+
+ | dtv pointer |
+ +---------------+
+ | pthread_t |
+ +---------------+
+
+See [ELFTLS] Sec. 3 `Run-Time Handling of TLS', Figs 1 and 2, for
+bigger pictures including the DTV and dynamically allocated TLS blocks.
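+
+As a rough sketch (inferred from the diagrams above, not quoted from
+the sources), the static TLS block of an initial object obj in a given
+thread lives at:
+
+	/* variant I: static TLS grows up from the end of the TCB */
+	addr = (uint8_t *)tcb + sizeof(struct tls_tcb) + obj->tlsoffset;
+
+	/* variant II: static TLS grows down from the TCB */
+	addr = (uint8_t *)tcb - obj->tlsoffset;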
+
+Each architecture also has its own ELF ABI processor supplement with
+the architecture-specific relocations and TLS details.
+
+References:
+
+ [ELFTLS] Ulrich Drepper, `ELF Handling For Thread-Local
+ Storage', Version 0.21, 2023-08-22.
+ https://akkadia.org/drepper/tls.pdf
+ https://web.archive.org/web/20240718081934/https://akkadia.org/drepper/tls.pdf
+
Steps for adding TLS support for a new platform:
(1) Declare TLS variant in machine/types.h by defining either
__HAVE_TLS_VARIANT_I or __HAVE_TLS_VARIANT_II.
-(2) _lwp_makecontext has to set the reserved register or kernel transfer
-variable in uc_mcontext to the provided value of 'private'. See
-src/lib/libc/arch/$PLATFORM/gen/_lwp.c.
+(2) _lwp_makecontext has to set the reserved register or kernel
+transfer variable in uc_mcontext according to the provided value of
+`private'. Note that _lwp_makecontext takes tcb, not tp, as an
+argument, so make sure to adjust it if needed for the tp/tcb offset.
+See src/lib/libc/arch/$PLATFORM/gen/_lwp.c.
This is not possible on the VAX as there is no free space in ucontext_t.
This requires either a special version of _lwp_create or versioning
@@ -60,9 +160,22 @@ def->st_value - defobj->tlsoffset + rela
e.g. starting offset is counting down from the TCB.
-(6) Implement __lwp_getprivate_fast() in machine/mcontext.h and set
-__HAVE___LWP_GETPRIVATE_FAST in machine/types.h.
+(6) If there is a tp/tcb offset, implement
+
+ __lwp_gettcb_fast()
+ __lwp_settcb()
+
+in machine/mcontext.h and set
+
+ __HAVE___LWP_GETTCB_FAST
+ __HAVE___LWP_SETTCB
+
+in machine/types.h.
+
+Otherwise, implement __lwp_getprivate_fast() in machine/mcontext.h and
+set __HAVE___LWP_GETPRIVATE_FAST in machine/types.h.
-(7) Test using src/tests/lib/libc/tls. Make sure with "objdump -R" that
-t_tls_dynamic has two TPOFF relocations and h_tls_dlopen.so.1 and
-libh_tls_dynamic.so.1 have both two DTPMOD and DTPOFF relocations.
+(7) Test using src/tests/lib/libc/tls and src/tests/libexec/ld.elf_so.
+Make sure with "objdump -R" that t_tls_dynamic has two TPOFF
+relocations, and that h_tls_dlopen.so.1 and libh_tls_dynamic.so.1 each
+have two DTPMOD and two DTPOFF relocations.
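+
+For example (illustrative invocations; the exact relocation names vary
+by architecture):
+
+	$ objdump -R t_tls_dynamic | grep TPOFF
+	$ objdump -R libh_tls_dynamic.so.1 | grep -E 'DTPMOD|DTPOFF'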
Index: src/libexec/ld.elf_so/tls.c
diff -u src/libexec/ld.elf_so/tls.c:1.14.8.1 src/libexec/ld.elf_so/tls.c:1.14.8.2
--- src/libexec/ld.elf_so/tls.c:1.14.8.1 Tue Aug 1 16:34:56 2023
+++ src/libexec/ld.elf_so/tls.c Wed Aug 7 11:00:12 2024
@@ -1,4 +1,4 @@
-/* $NetBSD: tls.c,v 1.14.8.1 2023/08/01 16:34:56 martin Exp $ */
+/* $NetBSD: tls.c,v 1.14.8.2 2024/08/07 11:00:12 martin Exp $ */
/*-
* Copyright (c) 2011 The NetBSD Foundation, Inc.
* All rights reserved.
@@ -29,7 +29,18 @@
*/
#include <sys/cdefs.h>
-__RCSID("$NetBSD: tls.c,v 1.14.8.1 2023/08/01 16:34:56 martin Exp $");
+__RCSID("$NetBSD: tls.c,v 1.14.8.2 2024/08/07 11:00:12 martin Exp $");
+
+/*
+ * Thread-local storage
+ *
+ * Reference:
+ *
+ * [ELFTLS] Ulrich Drepper, `ELF Handling For Thread-Local
+ * Storage', Version 0.21, 2023-08-22.
+ * https://akkadia.org/drepper/tls.pdf
+ * https://web.archive.org/web/20240718081934/https://akkadia.org/drepper/tls.pdf
+ */
#include <sys/param.h>
#include <sys/ucontext.h>
@@ -45,20 +56,93 @@ __RCSID("$NetBSD: tls.c,v 1.14.8.1 2023/
static struct tls_tcb *_rtld_tls_allocate_locked(void);
static void *_rtld_tls_module_allocate(struct tls_tcb *, size_t);
+/*
+ * DTV offset
+ *
+ * On some architectures (m68k, mips, or1k, powerpc, and riscv),
+ * the DTV offsets passed to __tls_get_addr have a bias relative
+ * to the start of the DTV, in order to maximize the range of TLS
+ * offsets that can be used by instruction encodings with signed
+ * displacements.
+ */
#ifndef TLS_DTV_OFFSET
#define TLS_DTV_OFFSET 0
#endif
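+
+/*
+ * Illustrative sketch only -- an assumption for exposition, not
+ * verbatim code: on a biased architecture, the link-time ti_offset is
+ * stored pre-biased, and the runtime compensates along the lines of
+ *
+ *	offset = ti_offset + TLS_DTV_OFFSET;
+ *	return (uint8_t *)dtv[ti_module] + offset;
+ */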
static size_t _rtld_tls_static_space; /* Static TLS space allocated */
static size_t _rtld_tls_static_offset; /* Next offset for static TLS to use */
-size_t _rtld_tls_dtv_generation = 1;
-size_t _rtld_tls_max_index = 1;
+size_t _rtld_tls_dtv_generation = 1; /* Bumped on each load of obj w/ TLS */
+size_t _rtld_tls_max_index = 1; /* Max index into up-to-date DTV */
-#define DTV_GENERATION(dtv) ((size_t)((dtv)[0]))
-#define DTV_MAX_INDEX(dtv) ((size_t)((dtv)[-1]))
+/*
+ * DTV -- Dynamic Thread Vector
+ *
+ * The DTV is a per-thread array that maps each module with
+ * thread-local storage to a pointer into part of the thread's TCB
+ * (thread control block), or dynamically loaded TLS blocks,
+ * reserved for that module's storage.
+ *
+ * The TCB itself, struct tls_tcb, has a pointer to the DTV at
+ * tcb->tcb_dtv.
+ *
+ * The layout is:
+ *
+ * +---------------+
+ * | max index | -1 max index i for which dtv[i] is alloced
+ * +---------------+
+ * | generation | 0 void **dtv points here
+ * +---------------+
+ * | obj 1 tls ptr | 1 TLS pointer for obj w/ obj->tlsindex 1
+ * +---------------+
+ * | obj 2 tls ptr | 2 TLS pointer for obj w/ obj->tlsindex 2
+ * +---------------+
+ * .
+ * .
+ * .
+ *
+ * The values of obj->tlsindex start at 1; this way,
+ * dtv[obj->tlsindex] works even though dtv[0] is the generation.
+ * The TLS pointers go either into the static thread-local
+ * storage, for the initial objects (i.e., those loaded at
+ * startup), or into TLS blocks dynamically allocated for objects
+ * dynamically loaded by dlopen.
+ *
+ * The generation field is a cache of the global generation number
+ * _rtld_tls_dtv_generation, which is bumped every time an object
+ * with TLS is loaded in _rtld_map_object, and cached by
+ * __tls_get_addr (via _rtld_tls_get_addr) when a newly loaded
+ * module lies outside the bounds of the current DTV.
+ *
+ * XXX Why do we keep max index and generation separately? They
+ * appear to be initialized the same, always incremented together,
+ * and always stored together.
+ *
+ * XXX Why is this not a struct?
+ *
+ * struct dtv {
+ * size_t dtv_gen;
+ * void *dtv_module[];
+ * };
+ */
+#define DTV_GENERATION(dtv) ((size_t)((dtv)[0]))
+#define DTV_MAX_INDEX(dtv) ((size_t)((dtv)[-1]))
#define SET_DTV_GENERATION(dtv, val) (dtv)[0] = (void *)(size_t)(val)
#define SET_DTV_MAX_INDEX(dtv, val) (dtv)[-1] = (void *)(size_t)(val)
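+
+/*
+ * Usage sketch (illustrative only), given a tcb and an object obj
+ * with obj->tlsindex = i:
+ *
+ *	void **dtv = tcb->tcb_dtv;
+ *	size_t gen = DTV_GENERATION(dtv);	-- dtv[0]
+ *	size_t max = DTV_MAX_INDEX(dtv);	-- dtv[-1]
+ *	void *block = dtv[i];			-- NULL until first use
+ */
+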
+/*
+ * _rtld_tls_get_addr(tcb, idx, offset)
+ *
+ * Slow path for __tls_get_addr (see below), called to allocate
+ * TLS space if needed for the object obj with obj->tlsindex idx,
+ * at offset, which must be below obj->tlssize.
+ *
+ * This may allocate a DTV if the current one is too old, and it
+ * may allocate a dynamically loaded TLS block if there isn't one
+ * already allocated for it.
+ *
+ * XXX Why is the first argument passed as `void *tls' instead of
+ * just `struct tls_tcb *tcb'?
+ */
void *
_rtld_tls_get_addr(void *tls, size_t idx, size_t offset)
{
@@ -70,15 +154,26 @@ _rtld_tls_get_addr(void *tls, size_t idx
dtv = tcb->tcb_dtv;
+ /*
+ * If the generation number has changed, we have to allocate a
+ * new DTV.
+ *
+ * XXX Do we really? Isn't it enough to check whether idx <=
+ * DTV_MAX_INDEX(dtv)?
+ */
if (__predict_false(DTV_GENERATION(dtv) != _rtld_tls_dtv_generation)) {
size_t to_copy = DTV_MAX_INDEX(dtv);
+ /*
+ * "2 +" because the first element is the generation and
+ * the second one is the maximum index.
+ */
new_dtv = xcalloc((2 + _rtld_tls_max_index) * sizeof(*dtv));
- ++new_dtv;
- if (to_copy > _rtld_tls_max_index)
+ ++new_dtv; /* advance past DTV_MAX_INDEX */
+ if (to_copy > _rtld_tls_max_index) /* XXX How? */
to_copy = _rtld_tls_max_index;
memcpy(new_dtv + 1, dtv + 1, to_copy * sizeof(*dtv));
- xfree(dtv - 1);
+ xfree(dtv - 1); /* retreat back to DTV_MAX_INDEX */
dtv = tcb->tcb_dtv = new_dtv;
SET_DTV_MAX_INDEX(dtv, _rtld_tls_max_index);
SET_DTV_GENERATION(dtv, _rtld_tls_dtv_generation);
@@ -92,6 +187,18 @@ _rtld_tls_get_addr(void *tls, size_t idx
return (uint8_t *)dtv[idx] + offset;
}
+/*
+ * _rtld_tls_initial_allocation()
+ *
+ * Allocate the TCB (thread control block) for the initial thread,
+ * once the static TLS space usage has been determined (plus some
+ * slop to allow certain special cases like Mesa to be dlopened).
+ *
+ * This must be done _after_ all initial objects (i.e., those
+ * loaded at startup, as opposed to objects dynamically loaded by
+ * dlopen) have had TLS offsets allocated if need be by
+ * _rtld_tls_offset_allocate, and have had relocations processed.
+ */
void
_rtld_tls_initial_allocation(void)
{
@@ -114,6 +221,20 @@ _rtld_tls_initial_allocation(void)
#endif
}
+/*
+ * _rtld_tls_allocate_locked()
+ *
+ * Internal subroutine to allocate a TCB (thread control block)
+ * for the current thread.
+ *
+ * This allocates a DTV and a TCB that points to it, including
+ * static space in the TCB for the TLS of the initial objects.
+ * TLS blocks for dynamically loaded objects are allocated lazily.
+ *
+ * Caller must either be single-threaded (at startup via
+ * _rtld_tls_initial_allocation) or hold the rtld exclusive lock
+ * (via _rtld_tls_allocate).
+ */
static struct tls_tcb *
_rtld_tls_allocate_locked(void)
{
@@ -131,8 +252,12 @@ _rtld_tls_allocate_locked(void)
tcb->tcb_self = tcb;
#endif
dbg(("lwp %d tls tcb %p", _lwp_self(), tcb));
+ /*
+ * "2 +" because the first element is the generation and the second
+ * one is the maximum index.
+ */
tcb->tcb_dtv = xcalloc(sizeof(*tcb->tcb_dtv) * (2 + _rtld_tls_max_index));
- ++tcb->tcb_dtv;
+ ++tcb->tcb_dtv; /* advance past DTV_MAX_INDEX */
SET_DTV_MAX_INDEX(tcb->tcb_dtv, _rtld_tls_max_index);
SET_DTV_GENERATION(tcb->tcb_dtv, _rtld_tls_dtv_generation);
@@ -155,6 +280,14 @@ _rtld_tls_allocate_locked(void)
return tcb;
}
+/*
+ * _rtld_tls_allocate()
+ *
+ * Allocate a TCB (thread control block) for the current thread.
+ *
+ * Called by pthread_create for non-initial threads. (The initial
+ * thread's TCB is allocated by _rtld_tls_initial_allocation.)
+ */
struct tls_tcb *
_rtld_tls_allocate(void)
{
@@ -168,6 +301,14 @@ _rtld_tls_allocate(void)
return tcb;
}
+/*
+ * _rtld_tls_free(tcb)
+ *
+ * Free a TCB allocated with _rtld_tls_allocate.
+ *
+ * Frees any TLS blocks for dynamically loaded objects that tcb's
+ * DTV points to, and frees tcb's DTV, and frees tcb.
+ */
void
_rtld_tls_free(struct tls_tcb *tcb)
{
@@ -190,12 +331,27 @@ _rtld_tls_free(struct tls_tcb *tcb)
(uint8_t *)tcb->tcb_dtv[i] >= p_end)
xfree(tcb->tcb_dtv[i]);
}
- xfree(tcb->tcb_dtv - 1);
+ xfree(tcb->tcb_dtv - 1); /* retreat back to DTV_MAX_INDEX */
xfree(p);
_rtld_exclusive_exit(&mask);
}
+/*
+ * _rtld_tls_module_allocate(tcb, idx)
+ *
+ * Allocate thread-local storage in the thread with the given TCB
+ * (thread control block) for the object obj whose obj->tlsindex
+ * is idx.
+ *
+ * If obj has had space in static TLS reserved (obj->tls_static),
+ * return a pointer into that. Otherwise, allocate a TLS block,
+ * mark obj as having a TLS block allocated (obj->tls_dynamic),
+ * and return it.
+ *
+ * Called by _rtld_tls_get_addr to get the thread-local storage
+ * for an object the first time around.
+ */
static void *
_rtld_tls_module_allocate(struct tls_tcb *tcb, size_t idx)
{
@@ -228,6 +384,16 @@ _rtld_tls_module_allocate(struct tls_tcb
return p;
}
+/*
+ * _rtld_tls_offset_allocate(obj)
+ *
+ * Allocate a static thread-local storage offset for obj.
+ *
+ * Called by _rtld at startup for all initial objects. Called
+ * also by MD relocation logic, which is allowed (for Mesa) to
+ * allocate an additional 64 bytes (RTLD_STATIC_TLS_RESERVATION)
+ * of static thread-local storage in dlopened objects.
+ */
int
_rtld_tls_offset_allocate(Obj_Entry *obj)
{
@@ -284,6 +450,17 @@ _rtld_tls_offset_allocate(Obj_Entry *obj
return 0;
}
+/*
+ * _rtld_tls_offset_free(obj)
+ *
+ * Free a static thread-local storage offset for obj.
+ *
+ * Called by dlclose (via _rtld_unload_object -> _rtld_obj_free).
+ *
+ * Since static thread-local storage is normally not used by
+ * dlopened objects (with the exception of Mesa), this doesn't do
+ * anything to recycle the space right now.
+ */
void
_rtld_tls_offset_free(Obj_Entry *obj)
{
@@ -297,10 +474,33 @@ _rtld_tls_offset_free(Obj_Entry *obj)
#if defined(__HAVE_COMMON___TLS_GET_ADDR) && defined(RTLD_LOADER)
/*
- * The fast path is access to an already allocated DTV entry.
- * This checks the current limit and the entry without needing any
- * locking. Entries are only freed on dlclose() and it is an application
- * bug if code of the module is still running at that point.
+ * __tls_get_addr(tlsindex)
+ *
+ * Symbol directly called by code generated by the compiler for
+ * references to thread-local storage in the general-dynamic or
+ * local-dynamic TLS models (but not initial-exec or local-exec).
+ *
+ * The argument is a pointer to
+ *
+ * struct {
+ * unsigned long int ti_module;
+ * unsigned long int ti_offset;
+ * };
+ *
+ * as in, e.g., [ELFTLS] Sec. 3.4.3. This coincides with the
+ * type size_t[2] on all architectures that use this common
+ * __tls_get_addr definition (XXX but why do we write it as
+ * size_t[2]?).
+ *
+ * ti_module, i.e., arg[0], is the obj->tlsindex assigned at
+ * load-time by _rtld_map_object, and ti_offset, i.e., arg[1], is
+ * assigned at link-time by ld(1), possibly adjusted by
+ * TLS_DTV_OFFSET.
+ *
+ * Some architectures -- specifically IA-64 -- use a different
+ * calling convention. Some architectures -- specifically i386
+ * -- also use another entry point ___tls_get_addr (that's three
+ * leading underscores) with a different calling convention.
*/
void *
__tls_get_addr(void *arg_)
@@ -316,6 +516,13 @@ __tls_get_addr(void *arg_)
dtv = tcb->tcb_dtv;
+ /*
+ * Fast path: access to an already allocated DTV entry. This
+ * checks the current limit and the entry without needing any
+ * locking. Entries are only freed on dlclose() and it is an
+ * application bug if code of the module is still running at
+ * that point.
+ */
if (__predict_true(idx < DTV_MAX_INDEX(dtv) && dtv[idx] != NULL))
return (uint8_t *)dtv[idx] + offset;
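+
+	/*
+	 * Conceptual illustration (assumed, not taken from the
+	 * sources): for a general-dynamic access
+	 *
+	 *	extern __thread int x;
+	 *	x = 42;
+	 *
+	 * the compiler effectively emits
+	 *
+	 *	int *p = __tls_get_addr(&tls_index_of_x);
+	 *	*p = 42;
+	 *
+	 * where tls_index_of_x is the {ti_module, ti_offset} pair
+	 * described above.
+	 */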
Index: src/libexec/ld.elf_so/arch/aarch64/rtld_start.S
diff -u src/libexec/ld.elf_so/arch/aarch64/rtld_start.S:1.5 src/libexec/ld.elf_so/arch/aarch64/rtld_start.S:1.5.2.1
--- src/libexec/ld.elf_so/arch/aarch64/rtld_start.S:1.5 Thu Mar 24 12:12:00 2022
+++ src/libexec/ld.elf_so/arch/aarch64/rtld_start.S Wed Aug 7 11:00:12 2024
@@ -1,4 +1,4 @@
-/* $NetBSD: rtld_start.S,v 1.5 2022/03/24 12:12:00 andvar Exp $ */
+/* $NetBSD: rtld_start.S,v 1.5.2.1 2024/08/07 11:00:12 martin Exp $ */
/*-
* Copyright (c) 2014 The NetBSD Foundation, Inc.
@@ -60,7 +60,7 @@
#include <machine/asm.h>
-RCSID("$NetBSD: rtld_start.S,v 1.5 2022/03/24 12:12:00 andvar Exp $")
+RCSID("$NetBSD: rtld_start.S,v 1.5.2.1 2024/08/07 11:00:12 martin Exp $")
/*
* void _rtld_start(void (*cleanup)(void), const Obj_Entry *obj,
@@ -146,87 +146,121 @@ ENTRY_NP(_rtld_bind_start)
END(_rtld_bind_start)
/*
- * struct rel_tlsdesc {
- * uint64_t resolver_fnc;
- * uint64_t resolver_arg;
+ * Entry points used by _rtld_tlsdesc_fill. Each is passed in x0 a
+ * pointer to:
*
+ * struct rel_tlsdesc {
+ * uint64_t resolver_fnc;
+ * uint64_t resolver_arg;
+ * };
*
- * uint64_t _rtld_tlsdesc_static(struct rel_tlsdesc *);
+ * They are called with a nonstandard calling convention and must
+ * preserve all registers except x0.
+ */
+
+/*
+ * uint64_t@x0
+ * _rtld_tlsdesc_static(struct rel_tlsdesc *rel_tlsdesc@x0);
+ *
+ * Resolver function for TLS symbols resolved at load time.
*
- * Resolver function for TLS symbols resolved at load time
+ * rel_tlsdesc->resolver_arg is the offset of the static
+ * thread-local storage region, relative to the start of the TCB.
+ *
+ * Nonstandard calling convention: Must preserve all registers
+ * except x0.
*/
ENTRY(_rtld_tlsdesc_static)
.cfi_startproc
- ldr x0, [x0, #8]
- ret
+ ldr x0, [x0, #8] /* x0 := tcboffset */
+ ret /* return x0 = tcboffset */
.cfi_endproc
END(_rtld_tlsdesc_static)
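+
+/*
+ * In C, the body above amounts to (sketch only):
+ *
+ *	return rel_tlsdesc->resolver_arg;
+ */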
/*
- * uint64_t _rtld_tlsdesc_undef(void);
+ * uint64_t@x0
+ * _rtld_tlsdesc_undef(struct rel_tlsdesc *rel_tlsdesc@x0);
+ *
+ * Resolver function for weak and undefined TLS symbols.
*
- * Resolver function for weak and undefined TLS symbols
+ * rel_tlsdesc->resolver_arg is the Elf_Rela rela->r_addend.
+ *
+ * Nonstandard calling convention: Must preserve all registers
+ * except x0.
*/
ENTRY(_rtld_tlsdesc_undef)
.cfi_startproc
- str x1, [sp, #-16]!
+ str x1, [sp, #-16]! /* save x1 on stack */
.cfi_adjust_cfa_offset 16
- mrs x1, tpidr_el0
- ldr x0, [x0, #8]
- sub x0, x0, x1
+ mrs x1, tpidr_el0 /* x1 := current thread tcb */
+ ldr x0, [x0, #8] /* x0 := rela->r_addend */
+ sub x0, x0, x1 /* x0 := rela->r_addend - tcb */
- ldr x1, [sp], #16
- .cfi_adjust_cfa_offset -16
+ ldr x1, [sp], #16 /* restore x1 from stack */
+ .cfi_adjust_cfa_offset -16
.cfi_endproc
- ret
+ ret /* return x0 = rela->r_addend - tcb */
END(_rtld_tlsdesc_undef)
/*
- * uint64_t _rtld_tlsdesc_dynamic(struct rel_tlsdesc *);
+ * uint64_t@x0
+ * _rtld_tlsdesc_dynamic(struct rel_tlsdesc *tlsdesc@x0);
+ *
+ * Resolver function for TLS symbols from dlopen().
*
- * Resolver function for TLS symbols from dlopen()
+ * rel_tlsdesc->resolver_arg is a pointer to a struct tls_data
+ * object allocated during relocation.
+ *
+ * Nonstandard calling convention: Must preserve all registers
+ * except x0.
*/
ENTRY(_rtld_tlsdesc_dynamic)
.cfi_startproc
/* Save registers used in fast path */
- stp x1, x2, [sp, #(-2 * 16)]!
- stp x3, x4, [sp, #(1 * 16)]
+ stp x1, x2, [sp, #(-2 * 16)]!
+ stp x3, x4, [sp, #(1 * 16)]
.cfi_adjust_cfa_offset 2 * 16
.cfi_rel_offset x1, 0
.cfi_rel_offset x2, 8
.cfi_rel_offset x3, 16
.cfi_rel_offset x4, 24
- /* Test fastpath - inlined version of __tls_get_addr. */
+ /* Try for the fast path -- inlined version of __tls_get_addr. */
- ldr x1, [x0, #8] /* tlsdesc ptr */
- mrs x4, tpidr_el0
- ldr x0, [x4] /* DTV pointer (tcb->tcb_dtv) */
+ ldr x1, [x0, #8] /* x1 := tlsdesc (struct tls_data *) */
+ mrs x4, tpidr_el0 /* x4 := tcb */
+ ldr x0, [x4] /* x0 := dtv = tcb->tcb_dtv */
- ldr x3, [x0, #-8] /* DTV_MAX_INDEX(dtv) */
- ldr x2, [x1, #0] /* tlsdesc->td_tlsindex */
+ ldr x3, [x0, #-8] /* x3 := max = DTV_MAX_INDEX(dtv) */
+ ldr x2, [x1, #0] /* x2 := idx = tlsdesc->td_tlsindex */
cmp x2, x3
- b.lt 1f /* Slow path */
+ b.gt 1f /* Slow path if idx > max */
+
+ ldr x3, [x0, x2, lsl #3] /* x3 := dtv[idx] */
+ cbz x3, 1f /* Slow path if dtv[idx] is null */
- ldr x3, [x0, x2, lsl #3] /* dtv[tlsdesc->td_tlsindex] */
- cbz x3, 1f
+ /*
+ * Fast path
+ *
+ * return (dtv[tlsdesc->td_tlsindex] + tlsdesc->td_tlsoffs - tcb)
+ */
+ ldr x2, [x1, #8] /* x2 := offs = tlsdesc->td_tlsoffs */
+ add x2, x2, x3 /* x2 := addr = dtv[idx] + offs */
+ sub x0, x2, x4 /* x0 := addr - tcb */
- /* Return (dtv[tlsdesc->td_tlsindex] + tlsdesc->td_tlsoffs - tp) */
- ldr x2, [x1, #8] /* tlsdesc->td_tlsoffs */
- add x2, x2, x3
- sub x0, x2, x4
-
- /* Restore registers and return */
- ldp x3, x4, [sp, #(1 * 16)]
- ldp x1, x2, [sp], #(2 * 16)
- .cfi_adjust_cfa_offset -2 * 16
- ret
+ /* Restore fast path registers and return */
+ ldp x3, x4, [sp, #(1 * 16)]
+ ldp x1, x2, [sp], #(2 * 16)
+ .cfi_adjust_cfa_offset -2 * 16
+ ret /* return x0 = addr - tcb */
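+
+	/*
+	 * Illustrative C rendering of the fast path above (a sketch,
+	 * not generated code):
+	 *
+	 *	dtv = tcb->tcb_dtv;
+	 *	if (idx > DTV_MAX_INDEX(dtv) || dtv[idx] == NULL)
+	 *		goto slow;
+	 *	return (uint8_t *)dtv[idx] + tlsdesc->td_tlsoffs -
+	 *	    (uint8_t *)tcb;
+	 */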
/*
* Slow path
- * return _rtld_tls_get_addr(tp, tlsdesc->td_tlsindex, tlsdesc->td_tlsoffs);
+ *
+ * return _rtld_tls_get_addr(tcb, tlsdesc->td_tlsindex,
+ * tlsdesc->td_tlsoffs);
*
*/
1:
@@ -236,18 +270,18 @@ ENTRY(_rtld_tlsdesc_dynamic)
.cfi_rel_offset x29, 0
.cfi_rel_offset x30, 8
- stp x5, x6, [sp, #(1 * 16)]
- stp x7, x8, [sp, #(2 * 16)]
- stp x9, x10, [sp, #(3 * 16)]
+ stp x5, x6, [sp, #(1 * 16)]
+ stp x7, x8, [sp, #(2 * 16)]
+ stp x9, x10, [sp, #(3 * 16)]
stp x11, x12, [sp, #(4 * 16)]
stp x13, x14, [sp, #(5 * 16)]
stp x15, x16, [sp, #(6 * 16)]
stp x17, x18, [sp, #(7 * 16)]
- .cfi_rel_offset x5, 16
- .cfi_rel_offset x6, 24
- .cfi_rel_offset x7, 32
- .cfi_rel_offset x8, 40
- .cfi_rel_offset x9, 48
+ .cfi_rel_offset x5, 16
+ .cfi_rel_offset x6, 24
+ .cfi_rel_offset x7, 32
+ .cfi_rel_offset x8, 40
+ .cfi_rel_offset x9, 48
.cfi_rel_offset x10, 56
.cfi_rel_offset x11, 64
.cfi_rel_offset x12, 72
@@ -259,31 +293,32 @@ ENTRY(_rtld_tlsdesc_dynamic)
.cfi_rel_offset x18, 120
/* Find the tls offset */
- mov x0, x4 /* tp */
- mov x3, x1 /* tlsdesc ptr */
- ldr x1, [x3, #0] /* tlsdesc->td_tlsindex */
- ldr x2, [x3, #8] /* tlsdesc->td_tlsoffs */
- bl _rtld_tls_get_addr
- mrs x1, tpidr_el0
- sub x0, x0, x1
+ mov x0, x4 /* x0 := tcb */
+ mov x3, x1 /* x3 := tlsdesc */
+ ldr x1, [x3, #0] /* x1 := idx = tlsdesc->td_tlsindex */
+ ldr x2, [x3, #8] /* x2 := offs = tlsdesc->td_tlsoffs */
+ bl _rtld_tls_get_addr /* x0 := addr = _rtld_tls_get_addr(tcb,
+ * idx, offs) */
+ mrs x1, tpidr_el0 /* x1 := tcb */
+ sub x0, x0, x1 /* x0 := addr - tcb */
/* Restore slow path registers */
ldp x17, x18, [sp, #(7 * 16)]
ldp x15, x16, [sp, #(6 * 16)]
ldp x13, x14, [sp, #(5 * 16)]
ldp x11, x12, [sp, #(4 * 16)]
- ldp x9, x10, [sp, #(3 * 16)]
- ldp x7, x8, [sp, #(2 * 16)]
- ldp x5, x6, [sp, #(1 * 16)]
+ ldp x9, x10, [sp, #(3 * 16)]
+ ldp x7, x8, [sp, #(2 * 16)]
+ ldp x5, x6, [sp, #(1 * 16)]
ldp x29, x30, [sp], #(8 * 16)
- .cfi_adjust_cfa_offset -8 * 16
+ .cfi_adjust_cfa_offset -8 * 16
.cfi_restore x29
.cfi_restore x30
/* Restore fast path registers and return */
- ldp x3, x4, [sp, #16]
- ldp x1, x2, [sp], #(2 * 16)
+ ldp x3, x4, [sp, #16]
+ ldp x1, x2, [sp], #(2 * 16)
.cfi_adjust_cfa_offset -2 * 16
.cfi_endproc
- ret
+ ret /* return x0 = addr - tcb */
END(_rtld_tlsdesc_dynamic)
Index: src/tests/libexec/ld.elf_so/t_tls_extern.c
diff -u src/tests/libexec/ld.elf_so/t_tls_extern.c:1.12.2.2 src/tests/libexec/ld.elf_so/t_tls_extern.c:1.12.2.3
--- src/tests/libexec/ld.elf_so/t_tls_extern.c:1.12.2.2 Tue Aug 1 16:34:58 2023
+++ src/tests/libexec/ld.elf_so/t_tls_extern.c Wed Aug 7 11:00:12 2024
@@ -1,4 +1,4 @@
-/* $NetBSD: t_tls_extern.c,v 1.12.2.2 2023/08/01 16:34:58 martin Exp $ */
+/* $NetBSD: t_tls_extern.c,v 1.12.2.3 2024/08/07 11:00:12 martin Exp $ */
/*-
* Copyright (c) 2023 The NetBSD Foundation, Inc.
@@ -382,6 +382,63 @@ ATF_TC_BODY(onlydef_static_dynamic_lazy,
pstatic, pdynamic);
}
+ATF_TC(opencloseloop_use);
+ATF_TC_HEAD(opencloseloop_use, tc)
+{
+ atf_tc_set_md_var(tc, "descr", "Testing opening and closing in a loop,"
+ " then opening and using dynamic TLS");
+}
+ATF_TC_BODY(opencloseloop_use, tc)
+{
+ unsigned i;
+ void *def, *use;
+ int *(*fdef)(void), *(*fuse)(void);
+ int *pdef, *puse;
+
+ /*
+ * Open and close the definition library repeatedly. This
+ * should trigger allocation of many DTV offsets, which are
+ * (currently) not recycled, so the required DTV offset grows
+ * very large -- pages past what is actually allocated --
+ * before we attempt to use it.
+ *
+ * This way, we will exercise the wrong-way-conditional fast
+ * path of PR lib/58154.
+ */
+ for (i = sysconf(_SC_PAGESIZE); i --> 0;) {
+ ATF_REQUIRE_DL(def = dlopen("libh_def_dynamic.so", 0));
+ ATF_REQUIRE_EQ_MSG(dlclose(def), 0,
+ "dlclose(def): %s", dlerror());
+ }
+
+ /*
+ * Now open the definition library and keep it open.
+ */
+ ATF_REQUIRE_DL(def = dlopen("libh_def_dynamic.so", 0));
+ ATF_REQUIRE_DL(fdef = dlsym(def, "fdef"));
+
+ /*
+ * Open libraries that use the definition and verify they
+ * observe the same pointer.
+ */
+ ATF_REQUIRE_DL(use = dlopen("libh_use_dynamic.so", 0));
+ ATF_REQUIRE_DL(fuse = dlsym(use, "fuse"));
+ pdef = (*fdef)();
+ puse = (*fuse)();
+ ATF_CHECK_EQ_MSG(pdef, puse,
+ "%p in defining library != %p in using library",
+ pdef, puse);
+
+ /*
+ * Also verify the pointer can be used.
+ */
+ *pdef = 123;
+ *puse = 456;
+ ATF_CHECK_EQ_MSG(*pdef, *puse,
+ "%d in defining library != %d in using library",
+ *pdef, *puse);
+}
+
ATF_TP_ADD_TCS(tp)
{
@@ -398,6 +455,7 @@ ATF_TP_ADD_TCS(tp)
ATF_TP_ADD_TC(tp, onlydef_dynamic_static_lazy);
ATF_TP_ADD_TC(tp, onlydef_static_dynamic_eager);
ATF_TP_ADD_TC(tp, onlydef_static_dynamic_lazy);
+ ATF_TP_ADD_TC(tp, opencloseloop_use);
ATF_TP_ADD_TC(tp, static_abusedef);
ATF_TP_ADD_TC(tp, static_abusedefnoload);
ATF_TP_ADD_TC(tp, static_defabuse_eager);