On 24.07.2024 12:24, Konstantin Belousov wrote:
On Tue, Jul 23, 2024 at 08:11:13PM +0000, John F Carr wrote:
On Jul 23, 2024, at 13:46, Michal Meloun <meloun.mic...@gmail.com> wrote:
On 23.07.2024 11:36, Konstantin Belousov wrote:
On Tue, Jul 23, 2024 at 09:53:41AM +0200, Michal Meloun wrote:
The good news is that I'm finally able to generate a working/locking
test case. The culprit (at least for me) is if "-mcpu" is used when
compiling libthr (e.g. indirectly injected via CPUTYPE in /etc/make.conf).
If it is not used, libthr is broken (regardless of -O level or debug/normal
build), but -mcpu=cortex-a15 will always produce a working libthr.
I think this is very significant progress.
Do you plan to drill down more to see what is going on?
So the problem is now clear, and I fear it may apply to other architectures as
well.
dlopen_object() (from rtld_elf),
https://cgit.freebsd.org/src/tree/libexec/rtld-elf/rtld.c#n3766,
holds the rtld_bind_lock write lock for almost the entire time a new library is
loaded.
If the code uses a yet unresolved symbol to load the library, the rtl_bind()
function attempts to get read lock of rtld_bind_lock and a deadlock occurs.
In this case, it round_up() in _thr_stack_fix_protection,
https://cgit.freebsd.org/src/tree/lib/libthr/thread/thr_stack.c#n136.
Issued by __aeabi_uidiv (since not all armv7 processors support HW divide).
Unfortunately, I'm not sure how to fix it. The compiler can emit __aeabi_<> in
any place, and I'm not sure if it can resolve all the symbols used by rtld_eld and
libthr beforehand.
Michal
In this case (but not for all _aeabi_ functions) we can avoid division
as long as page size is a power of 2.
The function is
static inline size_t
round_up(size_t size)
{
if (size % _thr_page_size != 0)
size = ((size / _thr_page_size) + 1) *
_thr_page_size;
return size;
}
The body can be condensed to
return (size + _thr_page_size - 1) & ~(_thr_page_size - 1);
This is shorter in both lines of code and instruction bytes.
Lets not allow this to be lost. Could anybody confirm that the patch
below fixes the issue?
commit d560f4f6690a48476565278fd07ca131bf4eeb3c
Author: Konstantin Belousov <k...@freebsd.org>
Date: Wed Jul 24 13:17:55 2024 +0300
rtld: avoid division in __thr_map_stacks_exec()
The function is called by rtld with the rtld bind lock write-locked,
when fixing the stack permission during dso load. Not every ARMv7 CPU
supports the div, which causes the recursive entry into rtld to resolve
the __aeabi_uidiv symbol, causing self-lock.
Workaround the problem by using roundup2() instead of open-coding less
efficient formula.
Diagnosed by: mmel
Based on submission by: John F Carr <j...@mit.edu>
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
For final resolving of deadlocks, after a full day of digging, I'm very
much incline of adding -znow to the linker flags for libthr.so (and
maybe also for ld-elf.so). The runtime cost of resolving all symbols at
startup is very low. Direct pre-solving in _thr_rtld_init() is
problematic for the _aeabi_* symbols, since they don't have an official
C prototypes, and some are not compatible with C calling conventions.
Warner, Konstantin, could you please comment on this?
Michal