Mark,

A third option that Heather and I discussed for the routines that needed 
replication (because sometimes they are called holding a write lock and 
sometimes not) was to put each routine in an include file where the write lock 
status is expected to be defined as a macro. Then, we can define the macro one 
way (write lock held), include the file, then define the macro the other way 
(write lock not held) and include the function again. The write lock status 
would be incorporated into the function name. 

This avoids two copies of the code to maintain and calls the correct helper 
routines that either 
- use the lock already possessed or 
- acquire their own lock if the caller doesn’t have it already.

Would that strategy be acceptable to you. None of the code does anything like 
that as far as I know. 

John Mellor-Crummey

(sent from my phone)

> On Aug 4, 2023, at 8:21 AM, Heather McIntyre via Elfutils-devel 
> <elfutils-devel@sourceware.org> wrote:
> 
> I've been making --enable-thread-safety (USE_LOCKS) more viable by fixing
> deadlocks throughout the libelf library. This has required minimal code
> changes so far. However, I've hit a snag where "__elf64_updatenull_wrlock"
> calls "elf64_getchdr," which leads to "elf64_getshdr." This sequence
> attempts to acquire a read lock under a write lock and fails. Similarly,
> "elf64_getchdr" calls "elf_getdata," which also tries and fails to
> implement a read lock.
> 
> Unfortunately, fixing this particular deadlock requires more than minimal
> code changes. From what I understand, I may have a few options on how to
> fix this:
> 
> 1) Create copies of the getchdr and elf_getdata functions, renaming them
> getchdr_wrlock and elf_getdata_wrlock. However, multiple copies of these
> functions could bloat the code, which may not be ideal.
> 2) Some type of write lock flag to indicate when a write lock is active.
> Either within the locking macro in eu-config.h or passed as an argument in
> the functions.
> 
> Ultimately, I thought others may want to look into this to see if there
> might be options for a better solution. Here is the backtrace:
> 
> Program received signal SIGABRT, Aborted.
> 0x00007ffff7837aff in raise () from /lib64/libc.so.6
> #0  0x00007ffff7837aff in raise () from /lib64/libc.so.6
> #1  0x00007ffff780aea5 in abort () from /lib64/libc.so.6
> #2  0x00007ffff780ad79 in __assert_fail_base.cold.0 () from /lib64/libc.so.6
> #3  0x00007ffff78304c9 in __assert_perror_fail () from /lib64/libc.so.6
> #4  0x00007ffff7fda554 in elf64_getshdr (scn=0x40fc20) at
> ../../libelf/elf32_getshdr.c:292
> #5  0x00007ffff7fe9590 in elf64_getchdr (scn=0x40fc20) at
> ../../libelf/elf32_getchdr.c:45
> #6  0x00007ffff7fdf690 in __elf64_updatenull_wrlock (elf=0x40d880,
> change_bop=0x7fffffffac60, shnum=30) at ../../libelf/elf32_updatenull.c:407
> #7  0x00007ffff7fdd83d in elf_update (elf=0x40d880, cmd=ELF_C_WRITE) at
> ../../libelf/elf_update.c:211
> #8  0x0000000000405d9e in process_file (fname=0x7fffffffbb96
> "testfile-s390x-hash-both") at ../../src/elfcompress.c:1336
> #9  0x0000000000406232 in main (argc=7, argv=0x7fffffffb768) at
> ../../src/elfcompress.c:1458

Reply via email to