Hi all,

I've encountered a strange issue when compiling C to a flat binary with GCC.
It may be a bug, but I hesitate to say so with confidence given my lack of
familiarity with the GCC codebase and the rather obscure nature of what I'm
trying to do. I'm posting here in the hope that someone can either a) confirm
that this is in fact a GCC bug (in which case I'll file a bug report) or
b) tell me what I'm doing wrong.

First off, system information:

$ head -1 /etc/os-release
PRETTY_NAME="Debian GNU/Linux 10 (buster)"

$ uname -a
Linux desktop 5.9.0-0.bpo.2-amd64 #1 SMP Debian 5.9.6-1~bpo10+1 (2020-11-19) x86_64 GNU/Linux

$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/8/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Debian 8.3.0-6' 
--with-bugurl=file:///usr/share/doc/gcc-8/README.Bugs 
--enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++ --prefix=/usr 
--with-gcc-major-version-only --program-suffix=-8 
--program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id 
--libexecdir=/usr/lib --without-included-gettext --enable-threads=posix 
--libdir=/usr/lib --enable-nls --enable-bootstrap --enable-clocale=gnu 
--enable-libstdcxx-debug --enable-libstdcxx-time=yes 
--with-default-libstdcxx-abi=new --enable-gnu-unique-object 
--disable-vtable-verify --enable-libmpx --enable-plugin --enable-default-pie 
--with-system-zlib --with-target-system-zlib --enable-objc-gc=auto 
--enable-multiarch --disable-werror --with-arch-32=i686 --with-abi=m64 
--with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic 
--enable-offload-targets=nvptx-none --without-cuda-driver 
--enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu 
--target=x86_64-linux-gnu
Thread model: posix
gcc version 8.3.0 (Debian 8.3.0-6)

I'm currently writing a packer for ELF binaries (similar to UPX). To do this,
I compile my loader code into a position-independent flat binary with no
dependency on glibc, using a custom linker script, and then inject it into an
ELF binary. On exec, both the loader and the packed binary get loaded. The
loader code receives initial control, decrypts/decompresses the packed binary,
and hands control to the unpacked binary.

The problem comes when I try to take the address of a function that resides
outside the current translation unit in the loader code. Instead of &func
evaluating to the address of the function func, it evaluates to the first
sizeof(void *) bytes of the _code_ of the function func. I've created a minimal
reproduction here:

$ ls
func.c  link.lds  Makefile  prog.c

$ cat prog.c
extern void func();

void _start()
{
  void *ptr = (void *) func;
}

$ cat func.c
void func()
{

}

$ cat link.lds
OUTPUT_FORMAT("binary")
OUTPUT_ARCH(i386:x86-64)
ENTRY(_start)

SECTIONS
{
  . = 0x0000000001000000;
  .text : {
    *(.text)
  }
  .data : {
    *(.data)
  }
}

$ cat Makefile
CFLAGS = -fpie -nostdlib -nostartfiles -nodefaultlibs -fno-builtin -c -I ..
LDFLAGS = -pie

OBJS = prog.o func.o
BIN = prog

all: $(OBJS)
        $(LD) $(LDFLAGS) $(OBJS) -T link.lds -o $(BIN)

%.o: %.c
        $(CC) $(CFLAGS) $< -o $@

%.o: %.S
        $(AS) $< -o $@

clean:
        rm -f $(OBJS) $(BIN)

As can be seen, this setup is pretty simple: _start gets control and does
nothing other than load the address of func (which is located in another
translation unit) into ptr. Disassembling the output binary with radare2, I see
the following:

┌ 18: fcn.00000000 ();
│           ; var int64_t var_8h @ rbp-0x8
│           0x00000000      55             push rbp
│           0x00000001      4889e5         mov rbp, rsp
│           0x00000004      488b05070000.  mov rax, qword [0x00000012] ; fcn.00000012
│                                                                      ; [0x12:8]=0xc35d90e5894855
│           0x0000000b      488945f8       mov qword [var_8h], rax
│           0x0000000f      90             nop
│           0x00000010      5d             pop rbp
└           0x00000011      c3             ret

┌ 7: fcn.00000012 ();
│           0x00000012      55             push rbp
│           0x00000013      4889e5         mov rbp, rsp
│           0x00000016      90             nop
│           0x00000017      5d             pop rbp
└           0x00000018      c3             ret

Note that fcn.00000000 corresponds to _start and fcn.00000012 corresponds to
func. Now look at the instructions at addresses 0x00000004 and 0x0000000b. They
should load the _address_ 0x00000012 into rax and then store rax into var_8h
on the stack, but instead the first one loads the eight-byte value _at_
address 0x00000012 into rax (i.e. the first eight bytes of func's code).

If I remove the OUTPUT_FORMAT("binary") line from link.lds so that the linker
creates an ELF binary instead, the problem is not present:

┌ 18: entry0 ();
│           ; var int64_t var_8h @ rbp-0x8
│           0x01000000      55             push rbp                    ; [01] -r-x section size 25 named .text
│           0x01000001      4889e5         mov rbp, rsp
│           0x01000004      488d05070000.  lea rax, [sym.func]         ; 0x1000012
│           0x0100000b      488945f8       mov qword [var_8h], rax
│           0x0100000f      90             nop
│           0x01000010      5d             pop rbp
└           0x01000011      c3             ret
            ; DATA XREF from entry0 @ 0x1000004
┌ 7: sym.func ();
│           0x01000012      55             push rbp
│           0x01000013      4889e5         mov rbp, rsp
│           0x01000016      90             nop
│           0x01000017      5d             pop rbp
└           0x01000018      c3             ret

Note how the mov has been replaced with an lea, ensuring that the address and
not the data is loaded.

This is the shortest example I could come up with that shows the problem I'm
encountering. I did a bit of digging in the source of GNU ld and GCC and made
some headway, but didn't manage to root-cause this. Looking at the relocations
in prog.o in this example, we can see that there is a relocation of type
R_X86_64_REX_GOTPCRELX for func (type 0x2a; readelf truncates the name):

$ readelf --relocs prog.o | grep func
000000000007  000a0000002a R_X86_64_REX_GOTP 0000000000000000 func - 4

Looking through the code for GNU ld, it seems that when this type of
relocation occurs in a mov instruction, the mov is sometimes rewritten into an
lea. Indeed, this seems to be the reason this type of relocation exists (see
https://groups.google.com/g/x86-64-abi/c/n9AWHogmVY0). For whatever reason,
the opcode is not being rewritten when I link into a flat binary as above.

If anyone could confirm this is a bug or tell me what I'm doing wrong, that
would be greatly appreciated. Thanks in advance for any help provided!

Cheers,
Rhys
