Hi,

Ludovic Courtès <l...@gnu.org> skribis:

> … so ‘exec_load’ is doing its job, it seems.

Turns out that may not be the case. Here’s a *bad* mapping at the
second ‘task_resume’ breakpoint (when ‘exec’ is about to start):

--8<---------------cut here---------------start------------->8---
db> show all threads
TASK        THREADS
  0 gnumach (f5f7cf00): 7 threads:
              0 (f5f7be18) .W..N. 0xc11dac04
              1 (f5f7bcd0) R..O..(idle_thread_continue)
              2 (f5f7bb88) .W.ON.(reaper_thread_continue) 0xc12015d4
              3 (f5f7ba40) .W.ON.(swapin_thread_continue) 0xc11f8e2c
              4 (f5f7b8f8) .W.ON.(sched_thread_continue) 0
              5 (f5f7b7b0) .W.ON.(io_done_thread_continue) 0xc1201f74
              6 (f5f7b668) .W.ON.(net_thread_continue) 0xc11db0a8
  1 ext2fs (f5f7ce40): 6 threads:
              0 (f5f7b520) R....F
              1 (f5f7b290) .W.O..(mach_msg_receive_continue) 0
              2 (f5f7b148) .W.O..(mach_msg_receive_continue) 0
              3 (f5f7b000) .W.O..(mach_msg_continue) 0
              4 (f67d3e20) .W.O..(mach_msg_receive_continue) 0
              5 (f67d3cd8) .W.O..(mach_msg_continue) 0
  2 exec (f5f7cd80): (f5f7b3d8) ..SO..(thread_bootstrap_return)
db> trace
task_resume(f593e010,fb7d9010,f5f73e80,c106972a)
ipc_kobject_server(f593e000,3,18,0)+0x1eb
mach_msg_trap(bffff4c0,3,18,20,8)+0x1703
>>>>> user space <<<<<
db> x/tbx 0xcbc 0xf5f7b3d8
no memory is assigned to address 00000cbc
0
db> show map $map2
Map 0xf5f6ff30: name="exec", pmap=0xf5f71fa8,ref=1,nentries=5
size=290816,resident:225280,wired=0
version=13
 map entry 0xf625ec08: start=0x0, end=0x1000
 prot=1/7/copy, object=0x0, offset=0x0
 map entry 0xf625ebb0: start=0x1000, end=0x26000
 prot=5/7/copy, object=0xf5f6ad70, offset=0x0
  Object 0xf5f6ad70: size=0x25000, 1 references
  37 resident pages, 0 absent pages, 0 paging ops
  memory object=0x0 (offset=0x0),control=0x0, name=0xf5f82780
  uninitialized,temporary internal,copy_strategy=0
  shadow=0x0 (offset=0x0),copy=0x0
 map entry 0xf625eb58: start=0x26000, end=0x34000
 prot=1/7/copy, object=0xf5f6ad20, offset=0x0
  Object 0xf5f6ad20: size=0xe000, 1 references
  14 resident pages, 0 absent pages, 0 paging ops
  memory object=0x0 (offset=0x0),control=0x0, name=0xf5f82730
  uninitialized,temporary internal,copy_strategy=0
  shadow=0x0 (offset=0x0),copy=0x0
 map entry 0xf625eb00: start=0x34000, end=0x37000
 prot=3/7/copy, object=0xf5f6acd0, offset=0x0
  Object 0xf5f6acd0: size=0x3000, 1 references
  3 resident pages,--db_more--
--8<---------------cut here---------------end--------------->8---

Compare with what a “good” mapping looks like at that same moment:

--8<---------------cut here---------------start------------->8---
start ext2fs: Hurd server bootstrap: ext2fs[device:hd0s1]Kernel Breakpoint trap, eip 0xc1030d5b
Breakpoint at task_resume: pushl %ebp
db> show all threads
TASK        THREADS
  0 gnumach (f5f7cf00): 7 threads:
              0 (f5f7be18) .W..N. 0xc11dac04
              1 (f5f7bcd0) R..O..(idle_thread_continue)
              2 (f5f7bb88) .W.ON.(reaper_thread_continue) 0xc12015d4
              3 (f5f7ba40) .W.ON.(swapin_thread_continue) 0xc11f8e2c
              4 (f5f7b8f8) .W.ON.(sched_thread_continue) 0
              5 (f5f7b7b0) .W.ON.(io_done_thread_continue) 0xc1201f74
              6 (f5f7b668) .W.ON.(net_thread_continue) 0xc11db0a8
  1 ext2fs (f5f7ce40): 6 threads:
              0 (f5f7b520) R....F
              1 (f5f7b290) .W.O..(mach_msg_receive_continue) 0
              2 (f5f7b148) .W.O..(mach_msg_receive_continue) 0
              3 (f5f7b000) .W.O..(mach_msg_continue) 0
              4 (f67d2e20) .W.O..(mach_msg_receive_continue) 0
              5 (f67d2cd8) .W.O..(mach_msg_continue) 0
  2 exec (f5f7cd80): (f5f7b3d8) ..SO..(thread_bootstrap_return)
db> x/tbx 0xcbc 0xf5f7b3d8
8
db> show map $map2
Map 0xf5f6ff30: name="exec", pmap=0xf5f71fa8,ref=1,nentries=5
size=290816,resident:229376,wired=0
version=14
 map entry 0xf625ec08: start=0x0, end=0x1000
 prot=1/7/copy, object=0xf5f6ad70, offset=0x0
  Object 0xf5f6ad70: size=0x1000, 1 references
  1 resident pages, 0 absent pages, 0 paging ops
  memory object=0x0 (offset=0x0),control=0x0, name=0xf5f82780
  uninitialized,temporary internal,copy_strategy=0
  shadow=0x0 (offset=0x0),copy=0x0
 map entry 0xf625ebb0: start=0x1000, end=0x26000
 prot=5/7/copy, object=0xf5f6ad20, offset=0x0
  Object 0xf5f6ad20: size=0x25000, 1 references
  37 resident pages, 0 absent pages, 0 paging ops
  memory object=0x0 (offset=0x0),control=0x0, name=0xf5f82730
  uninitialized,temporary internal,copy_strategy=0
  shadow=0x0 (offset=0x0),copy=0x0
 map entry 0xf625eb58: start=0x26000, end=0x34000
 prot=1/7/copy, object=0xf5f6acd0, offset=0x0
  Object 0xf5f6acd0: size=0xe000, 1 references
  14 resident pages, 0 absent pages, 0 paging ops
  memory object=0x0 (offset=0x0),control=0x0, name=0xf5f826e0
  uninitialized,temporary internal,copy_strategy=0
  shadow=0x0 (offset=0x0),copy=0x0
 map entry 0xf625eb00: start=0x34000, end=0x37000
 prot=3/7/copy, object=0xf5f6ac80, offset=0x0
  Object 0xf5f6ac80: size=0x3000, 1 references
  3 resident pages, 0 absent pages, 0 paging ops
  memory object=0x0 (offset=0x0),control=0x0, name=0xf5f82690
  uninitialized,temporary internal,copy_strategy=0
  shadow=0x0 (offset=0x0),copy=0x0
 map entry 0xf625eaa8: start=0xbfff0000, end=0xc0000000
 prot=3/7/copy, object=0xf5f6ac30, offset=0x0
  Object 0xf5f6ac30: size=0x10000, 1 references
  1 resident pages, 0 absent pages, 0 paging ops
  memory object=0x0 (offset=0x0),control=0x0, name=0xf5f82640
  uninitialized,temporary internal,copy_strategy=0
  shadow=0x0 (offset=0x0),copy=0x0
--8<---------------cut here---------------end--------------->8---

Notice that in the “good” case 0xcbc reads as a valid relocation,
where 8 = R_386_RELATIVE. In the “bad” case, the first map entry
(0x0 to 0x1000) is empty, with no associated memory object and zero
resident pages, which is why the read at 0xcbc fails.

My reading of ‘read_exec’ is that the page is supposed to be populated
eagerly by the ‘copyout’ call here:

--8<---------------cut here---------------start------------->8---
static int
read_exec(void *handle, vm_offset_t file_ofs, vm_size_t file_size,
          vm_offset_t mem_addr, vm_size_t mem_size,
          exec_sectype_t sec_type)
{
  struct multiboot_module *mod = handle;

  [...]
  err = vm_allocate(user_map, &start_page, end_page - start_page, FALSE);
  assert(err == 0);
  assert(start_page == trunc_page(mem_addr));

  if (file_size > 0)
    {
      err = copyout((char *)phystokv (mod->mod_start) + file_ofs,
                    (void *)mem_addr, file_size);
      assert(err == 0);
    }

  [...]
  return 0;
}
--8<---------------cut here---------------end--------------->8---

There are interesting tricks in ‘copyout_retry’ to fake a page fault so
the copy can actually be made, IIUC. Could it be that this bit isn’t
quite working?

Ideas?

The problem with debugging this is that setting a breakpoint on
‘exec_load’ causes the system to boot fine (breaking on ‘task_resume’
is fine though, go figure…).

Ludo’.
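
For reference, the “8 = R_386_RELATIVE” check on the byte at 0xcbc can be
sketched in C as below. This is only an illustration: it assumes that
0xcbc is the address of the ‘r_info’ word of an ‘Elf32_Rel’ entry (the
message does not say so explicitly), and the ‘sketch_’ names are made up
for the example rather than taken from gnumach or <elf.h>. On
little-endian i386 the low byte of ‘r_info’ is the relocation type, so a
one-byte read of a populated page would show 8 for a relative
relocation.

```c
#include <assert.h>
#include <stdint.h>

/* Relocation type value from the i386 ELF psABI; the macro name is
   local to this sketch (hypothetical), not the one from <elf.h>.  */
#define SKETCH_R_386_RELATIVE 8u

/* Same layout as Elf32_Rel: the address to patch, then r_info, which
   packs (symbol index << 8) | relocation type.  */
struct sketch_elf32_rel
{
  uint32_t r_offset;
  uint32_t r_info;
};

/* Low 8 bits of r_info: the relocation type.  */
unsigned
sketch_rel_type (uint32_t r_info)
{
  return r_info & 0xffu;
}

/* Upper 24 bits of r_info: the symbol table index.  */
unsigned
sketch_rel_sym (uint32_t r_info)
{
  return r_info >> 8;
}

/* Return 1 if REL is an R_386_RELATIVE relocation, 0 otherwise.  */
int
sketch_is_relative (const struct sketch_elf32_rel *rel)
{
  assert (rel);
  return sketch_rel_type (rel->r_info) == SKETCH_R_386_RELATIVE;
}
```

With an entry whose ‘r_info’ is 0x00000008, ‘sketch_is_relative’ returns
1 and the symbol index is 0, which is consistent with the single byte
printed in the “good” session.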