Hi Richard,

While I was attempting to test the new vsyscall patches for x86 I discovered I couldn't debootstrap an x86_64 buster image on my ARM box. After digging further into it I discovered it was because executing /sbin/ldconfig crashes and aborts the bootstrap. This is helpfully reproducible on my main development system, which is also running buster:

  ./x86_64-linux-user/qemu-x86_64 /sbin/ldconfig
  setup_arg_pages: 00000040000e0000
  target_set_brk: new_brk=00000040000dfdf8
  do_brk(0000000000000000) -> 00000040000e0000 (!new_brk)
  do_brk(00000040000e11c0) -> do_brk: allocating 8192 => 00007fb2dace5000
  00000040000e0000 (mapped_addr != -1 or brk_page)
  qemu: uncaught target signal 11 (Segmentation fault) - core dumped
  fish: Job 2, “./x86_64-linux-user/qemu-x86_64…” terminated by signal SIGSEGV (Address boundary error)

The failure is in the second do_brk, during the early setup of the binary's TLS data area. However, for some reason it doesn't always fail. For example with testthread, which also uses TLS:

  ./x86_64-linux-user/qemu-x86_64 ./tests/tcg/x86_64-linux-user/testthread
  setup_arg_pages: 0000004000000000
  target_set_brk: new_brk=00000000004c8558
  do_brk(0000000000000000) -> 00000000004c9000 (!new_brk)
  do_brk(00000000004ca1c0) -> do_brk: allocating 8192 => 00000000004c9000
  00000000004ca1c0 (mapped_addr == brk_page)
  do_brk(00000000004eb1c0) -> do_brk: allocating 135168 => 00000000004cb000
  00000000004eb1c0 (mapped_addr == brk_page)
  do_brk(00000000004ec000) -> 00000000004ec000 (new_brk <= brk_page)
  thread1: 0 hello1
  thread2: 0 hello2
  thread1: 1 hello1

Ultimately the failure is down to setup_arg_pages allocating too low in the address space in the ldconfig case, which leaves the second brk unable to expand its region of memory.
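
To make the collision concrete, here is a minimal sketch of the overlap check involved. It is not QEMU's do_brk; struct region, page_align_up and brk_can_grow are made-up names for illustration, and the 4k page size is an assumption. It only answers whether the pages between the current brk page and the requested new_brk are free of existing mappings.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 0x1000ULL /* assumption: 4k target pages */

/* Hypothetical helper type for this sketch, not a QEMU structure. */
struct region {
    uint64_t start; /* inclusive */
    uint64_t end;   /* exclusive */
};

static uint64_t page_align_up(uint64_t addr)
{
    return (addr + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1);
}

/*
 * Return true if [brk_page, page_align_up(new_brk)) does not overlap
 * any region in maps[], i.e. the heap could be extended in place.
 */
static bool brk_can_grow(uint64_t brk_page, uint64_t new_brk,
                         const struct region *maps, size_t n)
{
    uint64_t want_end;

    assert(new_brk >= brk_page); /* this sketch only models growing */
    want_end = page_align_up(new_brk);

    for (size_t i = 0; i < n; i++) {
        if (maps[i].start < want_end && maps[i].end > brk_page) {
            return false; /* something is already mapped in the way */
        }
    }
    return true;
}
```

Fed with the ldconfig numbers (brk page 00000040000e0000, request 00000040000e11c0, stack guard page starting at 00000040000e0000) the check returns false. With the testthread numbers (brk page 00000000004c9000, request 00000000004ca1c0, stack at 0000004000000000) it returns true, which matches the two runs.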

Turn on -d page and you can see the region forming:

  page layout changed following target_mmap
  start            end              size             prot
  0000004000000000-0000004000009000 0000000000009000 r--
  0000004000009000-00000040000ae000 00000000000a5000 r-x
  00000040000ae000-00000040000d8000 000000000002a000 r--
  00000040000d8000-00000040000df000 0000000000007000 rw-
  00000040000df000-00000040000e0000 0000000000001000 ---
  page layout changed following target_mmap
  start            end              size             prot
  0000004000000000-0000004000009000 0000000000009000 r--
  0000004000009000-00000040000ae000 00000000000a5000 r-x
  00000040000ae000-00000040000d8000 000000000002a000 r--
  00000040000d8000-00000040008e1000 0000000000809000 rw-
  setup_arg_pages: 00000040000e0000
  guest_base  0x0
  page layout changed following binary load
  start            end              size             prot
  0000004000000000-0000004000009000 0000000000009000 r--
  0000004000009000-00000040000ae000 00000000000a5000 r-x
  00000040000ae000-00000040000d8000 000000000002a000 r--
  00000040000d8000-00000040000e0000 0000000000008000 rw-
  00000040000e0000-00000040000e1000 0000000000001000 ---
  00000040000e1000-00000040008e1000 0000000000800000 rw-
  start_brk   0x0000000000000000
  end_code    0x00000040000ad971
  start_code  0x0000004000009000
  start_data  0x00000040000d8778
  end_data    0x00000040000de510
  start_stack 0x00000040008e02d0
  brk         0x00000040000dfdf8
  entry       0x000000400000a370
  argv_start  0x00000040008e02d8
  env_start   0x00000040008e02e8
  auxv_start  0x00000040008e0428
  target_set_brk: new_brk=00000040000dfdf8
  page layout changed following target_mmap
  start            end              size             prot
  0000004000000000-0000004000009000 0000000000009000 r--
  0000004000009000-00000040000ae000 00000000000a5000 r-x
  00000040000ae000-00000040000d8000 000000000002a000 r--
  00000040000d8000-00000040000e0000 0000000000008000 rw-
  00000040000e0000-00000040000e1000 0000000000001000 ---
  00000040000e1000-00000040008e2000 0000000000801000 rw-

So it looks like setup_arg_pages just creates a segment right in the middle of a previously allocated block of storage.

This is odd because the loader basically just leaves it to mmap to pick a region:

  error = target_mmap(0, size + guard, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

AFAICT this just depends on where we have allocated last. In the testthread case we already have a high mapping to splat:

  page layout changed following target_mmap
  start            end              size             prot
  0000000000400000-0000000000401000 0000000000001000 r--
  0000000000401000-0000000000495000 0000000000094000 r-x
  0000000000495000-00000000004bc000 0000000000027000 r--
  00000000004bd000-00000000004c9000 000000000000c000 rw-
  0000004000000000-0000004000801000 0000000000801000 rw-
  setup_arg_pages: 0000004000000000
  guest_base  0x0
  page layout changed following binary load
  start            end              size             prot
  0000000000400000-0000000000401000 0000000000001000 r--
  0000000000401000-0000000000495000 0000000000094000 r-x
  0000000000495000-00000000004bc000 0000000000027000 r--
  00000000004bd000-00000000004c9000 000000000000c000 rw-
  0000004000000000-0000004000001000 0000000000001000 ---
  0000004000001000-0000004000801000 0000000000800000 rw-

Comparing ldconfig to a "normal" case, we can see the problem is that all of ldconfig has been allocated in the TASK_UNMAPPED_BASE region.
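
As a rough illustration of why the stack lands where it does, here is a toy first-fit search. It is only a model of "take the lowest free gap at or above a base address", not QEMU's actual placement code; struct region and first_fit are hypothetical names, and the base of 0000004000000000 used in the note after the code is an assumption taken from where the mappings appear in the dumps above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper type for this sketch, not a QEMU structure. */
struct region {
    uint64_t start; /* inclusive */
    uint64_t end;   /* exclusive */
};

/*
 * Toy first-fit search: return the lowest address >= base where
 * `size` bytes fit without touching any region in maps[].
 * maps[] must be sorted by start address and non-overlapping.
 */
static uint64_t first_fit(uint64_t base, uint64_t size,
                          const struct region *maps, size_t n)
{
    uint64_t candidate = base;

    for (size_t i = 0; i < n; i++) {
        assert(i == 0 || maps[i - 1].end <= maps[i].start);
        if (maps[i].end <= candidate) {
            continue; /* region lies entirely below the candidate */
        }
        if (maps[i].start >= candidate &&
            maps[i].start - candidate >= size) {
            break; /* the gap before this region is big enough */
        }
        candidate = maps[i].end; /* skip past the region and retry */
    }
    return candidate;
}
```

With the ldconfig image occupying 0000004000000000-00000040000e0000 and a request of 0x801000 bytes (stack plus guard), the search returns 00000040000e0000, directly above the image and right where the brk wants to grow. With testthread's image down at 0000000000400000-00000000004c9000 the same search returns 0000004000000000, well clear of the heap.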

This is due to ldconfig having a DYNAMIC region without a load address, which causes mmap_find_vma to get called to find space for it and then for all the subsequent anonymous regions that are needed:

  load_elf_image: dynamic loaddr 0000000000000000
  mmap_find_vma: 0000004000000000
  load_elf_image: mapping un-backed region: 0000004000000000:0000000000009000
  load_elf_image: mapping un-backed region: 0000004000009000:00000000000a5000
  load_elf_image: mapping un-backed region: 00000040000ae000:000000000002a000
  load_elf_image: mapping un-backed region: 00000040000d8000:0000000000007000
  mmap_find_vma: 00000040000e0000
  setup_arg_pages: 00000040000e0000
  target_set_brk: new_brk=00000040000dfdf8
  mmap_find_vma: 00000040008e1000
  mmap_find_vma: 00000040008e2000
  do_brk(0000000000000000) -> 00000040000e0000 (!new_brk)
  do_brk(00000040000e11c0) -> mmap_find_vma: 00000040000e0000
  do_brk: allocating 8192 => 00007fb999e49000
  00000040000e0000 (mapped_addr != -1 or brk_page)
  qemu: uncaught target signal 11 (Segmentation fault) - core dumped

But no, actually this all seems to be normal for dynamically linked things, so something else must be different:

  ./x86_64-linux-user/qemu-x86_64 ./tests/tcg/x86_64-linux-user/testthread.dyn
  load_elf_image: dynamic loaddr 0000000000000000
  mmap_find_vma: 0000004000000000
  load_elf_image: mapping un-backed region: 0000004000000000:0000000000001000
  load_elf_image: mapping un-backed region: 0000004000001000:0000000000001000
  load_elf_image: mapping un-backed region: 0000004000002000:0000000000001000
  load_elf_image: mapping un-backed region: 0000004000003000:0000000000002000
  mmap_find_vma: 0000004000005000
  setup_arg_pages: 0000004000005000
  load_elf_image: dynamic loaddr 0000000000000000
  mmap_find_vma: 0000004000806000
  load_elf_image: mapping un-backed region: 0000004000806000:0000000000001000
  load_elf_image: mapping un-backed region: 0000004000807000:000000000001e000
  load_elf_image: mapping un-backed region: 0000004000825000:0000000000008000
  load_elf_image: mapping un-backed region: 000000400082d000:0000000000002000
  target_set_brk: new_brk=0000004000004070
  mmap_find_vma: 0000004000830000
  mmap_find_vma: 0000004000831000
  do_brk(0000000000000000) -> 0000004000005000 (!new_brk)
  mmap_find_vma: 0000004000832000
  mmap_find_vma: 0000004000857000
  mmap_find_vma: 0000004000878000
  mmap_find_vma: 000000400087a000
  mmap_find_vma: 0000004000a3b000
  mmap_find_vma: 0000004000a3e000
  do_brk(0000000000000000) -> 0000004000005000 (!new_brk)
  do_brk(0000004000026000) -> mmap_find_vma: 0000004000005000
  do_brk: allocating 135168 => 00007fa00659b000
  0000004000005000 (mapped_addr != -1 or brk_page)
  mmap_find_vma: 000000400123f000
  mmap_find_vma: 000000400923f000

Recompiled as a dynamic executable, testthread runs fine, leaving itself enough space to expand the brk region at least once.

So what do we take away from this?

  * we need testcases to exercise the memory layout of dynamic binaries
  * "special" dynamic binaries can break our careful memory layout
  * I feel as though I've trodden on a nest of vipers

Does any of this track with you? What is different about ldconfig that breaks our memory placement?

--
Alex Bennée