ldconfig and setup_arg_pages (a mind dump)

Alex Bennée Fri, 17 Jan 2020 09:33:55 -0800


Hi Richard,


While I was attempting to test the new vsyscall patches for x86 I
discovered I couldn't debootstrap an x86_64 buster image on my ARM box.
After digging further into it I discovered it was because executing
/sbin/ldconfig crashes and aborts the bootstrap.

This is helpfully reproducible on my main development system which is
also running buster:

  ./x86_64-linux-user/qemu-x86_64 /sbin/ldconfig
  setup_arg_pages: 00000040000e0000
  target_set_brk: new_brk=00000040000dfdf8
  do_brk(0000000000000000) -> 00000040000e0000 (!new_brk)
  do_brk(00000040000e11c0) -> do_brk: allocating 8192 => 00007fb2dace5000
  00000040000e0000 (mapped_addr != -1 or brk_page)
  qemu: uncaught target signal 11 (Segmentation fault) - core dumped
  fish: Job 2, “./x86_64-linux-user/qemu-x86_64…” terminated by signal SIGSEGV (Address boundary error)

The failure is in the second do_brk, during the early setup of the
binary's TLS data area. However, for some reason this isn't always the
case. For example, with testthread, which also uses TLS:

  ./x86_64-linux-user/qemu-x86_64 ./tests/tcg/x86_64-linux-user/testthread
  setup_arg_pages: 0000004000000000
  target_set_brk: new_brk=00000000004c8558
  do_brk(0000000000000000) -> 00000000004c9000 (!new_brk)
  do_brk(00000000004ca1c0) -> do_brk: allocating 8192 => 00000000004c9000
  00000000004ca1c0 (mapped_addr == brk_page)
  do_brk(00000000004eb1c0) -> do_brk: allocating 135168 => 00000000004cb000
  00000000004eb1c0 (mapped_addr == brk_page)
  do_brk(00000000004ec000) -> 00000000004ec000 (new_brk <= brk_page)
  thread1: 0 hello1
  thread2: 0 hello2
  thread1: 1 hello1
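
The first do_brk(0) result in both traces is consistent with new_brk
being rounded up to a page boundary. A minimal sketch of that
arithmetic, assuming a 4 KiB page; the helper name is hypothetical and
is not QEMU's own macro:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed page size; the traces above are consistent with 4 KiB. */
#define SKETCH_PAGE_SIZE UINT64_C(0x1000)

/* Hypothetical helper: round addr up to the next page boundary. */
static uint64_t sketch_page_align_up(uint64_t addr)
{
    /* The mask trick below only works for power-of-two page sizes. */
    assert((SKETCH_PAGE_SIZE & (SKETCH_PAGE_SIZE - 1)) == 0);
    return (addr + SKETCH_PAGE_SIZE - 1) & ~(SKETCH_PAGE_SIZE - 1);
}
```

Feeding it the two new_brk values from the traces (00000040000dfdf8 and
00000000004c8558) gives the addresses that do_brk(0) reports.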

Ultimately the failure is down to setup_arg_pages allocating too low in
the address space in the ldconfig case, which leaves the second brk
unable to expand its region of memory. Turn on -d page and you can
see the region forming:

  page layout changed following target_mmap
  start            end              size             prot
  0000004000000000-0000004000009000 0000000000009000 r--
  0000004000009000-00000040000ae000 00000000000a5000 r-x
  00000040000ae000-00000040000d8000 000000000002a000 r--
  00000040000d8000-00000040000df000 0000000000007000 rw-
  00000040000df000-00000040000e0000 0000000000001000 ---
  page layout changed following target_mmap
  start            end              size             prot
  0000004000000000-0000004000009000 0000000000009000 r--
  0000004000009000-00000040000ae000 00000000000a5000 r-x
  00000040000ae000-00000040000d8000 000000000002a000 r--
  00000040000d8000-00000040008e1000 0000000000809000 rw-
  setup_arg_pages: 00000040000e0000
  guest_base  0x0
  page layout changed following binary load
  start            end              size             prot
  0000004000000000-0000004000009000 0000000000009000 r--
  0000004000009000-00000040000ae000 00000000000a5000 r-x
  00000040000ae000-00000040000d8000 000000000002a000 r--
  00000040000d8000-00000040000e0000 0000000000008000 rw-
  00000040000e0000-00000040000e1000 0000000000001000 ---
  00000040000e1000-00000040008e1000 0000000000800000 rw-
  start_brk   0x0000000000000000
  end_code    0x00000040000ad971
  start_code  0x0000004000009000
  start_data  0x00000040000d8778
  end_data    0x00000040000de510
  start_stack 0x00000040008e02d0
  brk         0x00000040000dfdf8
  entry       0x000000400000a370
  argv_start  0x00000040008e02d8
  env_start   0x00000040008e02e8
  auxv_start  0x00000040008e0428
  target_set_brk: new_brk=00000040000dfdf8
  page layout changed following target_mmap
  start            end              size             prot
  0000004000000000-0000004000009000 0000000000009000 r--
  0000004000009000-00000040000ae000 00000000000a5000 r-x
  00000040000ae000-00000040000d8000 000000000002a000 r--
  00000040000d8000-00000040000e0000 0000000000008000 rw-
  00000040000e0000-00000040000e1000 0000000000001000 ---
  00000040000e1000-00000040008e2000 0000000000801000 rw-

So it looks like setup_arg_pages just creates a segment right in the
middle of a previously allocated block of storage. This is odd because
the loader basically just leaves it to mmap to pick a region:

    error = target_mmap(0, size + guard, PROT_READ | PROT_WRITE,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
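
As a sketch of how that single mapping is carved up, the following
assumes the guard page sits at the low end of the block, which is what
the dump above suggests. The struct and function names are hypothetical,
and the sizes (0x800000 stack, 0x1000 guard) are read off the dump
rather than taken from the loader source:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical description of the stack block returned by mmap. */
struct sketch_stack {
    uint64_t guard_start;  /* start of the inaccessible guard page(s) */
    uint64_t stack_low;    /* lowest usable stack address */
    uint64_t stack_high;   /* one past the highest stack address */
};

/* Split a block of size + guard bytes at base into guard and stack. */
static struct sketch_stack sketch_stack_layout(uint64_t base, uint64_t size,
                                               uint64_t guard)
{
    struct sketch_stack s;

    assert(size != 0 && guard != 0);
    s.guard_start = base;
    s.stack_low = base + guard;          /* guard is assumed to come first */
    s.stack_high = base + guard + size;
    return s;
}
```

With base 00000040000e0000 this yields a guard at 00000040000e0000 and a
stack spanning 00000040000e1000-00000040008e1000, matching the last two
rows of the ldconfig layout.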

AFAICT this just depends on where we have allocated last. In the
testthread case we already have a high mapping to splat:

  page layout changed following target_mmap
  start            end              size             prot
  0000000000400000-0000000000401000 0000000000001000 r--
  0000000000401000-0000000000495000 0000000000094000 r-x
  0000000000495000-00000000004bc000 0000000000027000 r--
  00000000004bd000-00000000004c9000 000000000000c000 rw-
  0000004000000000-0000004000801000 0000000000801000 rw-
  setup_arg_pages: 0000004000000000
  guest_base  0x0
  page layout changed following binary load
  start            end              size             prot
  0000000000400000-0000000000401000 0000000000001000 r--
  0000000000401000-0000000000495000 0000000000094000 r-x
  0000000000495000-00000000004bc000 0000000000027000 r--
  00000000004bd000-00000000004c9000 000000000000c000 rw-
  0000004000000000-0000004000001000 0000000000001000 ---
  0000004000001000-0000004000801000 0000000000800000 rw-

Comparing the ldconfig case to a "normal" one, we can see that the
problem is that all of ldconfig has been allocated in the
TASK_UNMAPPED_BASE region. This is due to ldconfig having a DYNAMIC
region without a load address, which causes mmap_find_vma to be called
to find space for it and then for all the subsequent anonymous regions
that are needed:

  load_elf_image: dynamic loaddr 0000000000000000
  mmap_find_vma: 0000004000000000
  load_elf_image: mapping un-backed region: 0000004000000000:0000000000009000
  load_elf_image: mapping un-backed region: 0000004000009000:00000000000a5000
  load_elf_image: mapping un-backed region: 00000040000ae000:000000000002a000
  load_elf_image: mapping un-backed region: 00000040000d8000:0000000000007000
  mmap_find_vma: 00000040000e0000
  setup_arg_pages: 00000040000e0000
  target_set_brk: new_brk=00000040000dfdf8
  mmap_find_vma: 00000040008e1000
  mmap_find_vma: 00000040008e2000
  do_brk(0000000000000000) -> 00000040000e0000 (!new_brk)
  do_brk(00000040000e11c0) -> mmap_find_vma: 00000040000e0000
  do_brk: allocating 8192 => 00007fb999e49000
  00000040000e0000 (mapped_addr != -1 or brk_page)
  qemu: uncaught target signal 11 (Segmentation fault) - core dumped
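
The mmap_find_vma addresses in this trace look like simple sequential
packing upwards from 0000004000000000. The toy model below reproduces
them; it is only a bump pointer with hypothetical names, not the real
mmap_find_vma, which has to search for free gaps. The base value and
the request sizes are assumptions read off the trace and page dumps:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed start of the search, taken from the first trace line. */
#define SKETCH_TASK_UNMAPPED_BASE UINT64_C(0x4000000000)

/* Toy allocator state: the next address that will be handed out. */
struct sketch_vma_cursor {
    uint64_t next;
};

/* Hand out size bytes at the cursor and advance it (no gap search). */
static uint64_t sketch_find_vma(struct sketch_vma_cursor *c, uint64_t size)
{
    uint64_t addr;

    /* Requests are assumed to be whole 4 KiB pages. */
    assert(size != 0 && (size & 0xfff) == 0);
    addr = c->next;
    c->next += size;
    return addr;
}
```

Starting at the base and requesting 0xe0000 for the image, 0x801000 for
stack plus guard, and then 0x1000 returns 0000004000000000,
00000040000e0000 and 00000040008e1000, leaving the cursor at
00000040008e2000. The stack therefore lands directly above the image,
exactly where brk would want to grow.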

But no, actually this all seems to be normal for dynamically linked
things. Still, something must be different:

  ./x86_64-linux-user/qemu-x86_64 ./tests/tcg/x86_64-linux-user/testthread.dyn
  load_elf_image: dynamic loaddr 0000000000000000
  mmap_find_vma: 0000004000000000
  load_elf_image: mapping un-backed region: 0000004000000000:0000000000001000
  load_elf_image: mapping un-backed region: 0000004000001000:0000000000001000
  load_elf_image: mapping un-backed region: 0000004000002000:0000000000001000
  load_elf_image: mapping un-backed region: 0000004000003000:0000000000002000
  mmap_find_vma: 0000004000005000
  setup_arg_pages: 0000004000005000
  load_elf_image: dynamic loaddr 0000000000000000
  mmap_find_vma: 0000004000806000
  load_elf_image: mapping un-backed region: 0000004000806000:0000000000001000
  load_elf_image: mapping un-backed region: 0000004000807000:000000000001e000
  load_elf_image: mapping un-backed region: 0000004000825000:0000000000008000
  load_elf_image: mapping un-backed region: 000000400082d000:0000000000002000
  target_set_brk: new_brk=0000004000004070
  mmap_find_vma: 0000004000830000
  mmap_find_vma: 0000004000831000
  do_brk(0000000000000000) -> 0000004000005000 (!new_brk)
  mmap_find_vma: 0000004000832000
  mmap_find_vma: 0000004000857000
  mmap_find_vma: 0000004000878000
  mmap_find_vma: 000000400087a000
  mmap_find_vma: 0000004000a3b000
  mmap_find_vma: 0000004000a3e000
  do_brk(0000000000000000) -> 0000004000005000 (!new_brk)
  do_brk(0000004000026000) -> mmap_find_vma: 0000004000005000
  do_brk: allocating 135168 => 00007fa00659b000
  0000004000005000 (mapped_addr != -1 or brk_page)
  mmap_find_vma: 000000400123f000
  mmap_find_vma: 000000400923f000

Recompile testthread as a dynamic executable and it runs fine, leaving
itself enough space to expand the brk region at least once.

So what do we take away from this?

 * we need testcases to exercise the memory layout of dynamic binaries
 * "special" dynamic binaries can break our careful memory layout
 * I feel as though I've trodden on a nest of vipers

Does any of this track with you? What is different about ldconfig that
breaks our memory placement?

-- 
Alex Bennée
