On Thu, 2024-10-24 at 21:09 +0900, Hajime Tazaki wrote:
> This commit adds a mechanism to hook syscalls for unmodified userspace
> programs used under UML in !MMU mode. The mechanism, called zpoline,
> translates syscall/sysenter instructions with `call *%rax`, which can be
> processed by a trampoline code also installed upon an initcall during
> boot. The translation is triggered by elf_arch_finalize_exec(), an arch
> hook introduced by another commit.
>
> All syscalls issued by userspace thus redirected to a speicific function,

typo: "specific"

> +	if (down_write_killable(&mm->mmap_lock)) {
> +		err = -EINTR;
> +		return err;

?

What happens if the binary JITs some code and you don't find it? I don't
remember from your talk - there you seemed to say this was fine, just
slow, but that was zpoline in a different context (container)?

Perhaps UML could additionally install a seccomp filter or something on
itself while running a userspace program? Hmm.

> +/**
> + * setup trampoline code for syscall hooks
> + *
> + * the trampoline code guides to call hooked function, __kernel_vsyscall
> + * in this case, via nop slides at the memory address zero (thus, zpoline).
> + *
> + * loaded binary by exec(2) is translated to call the function.
> + */
> +static int __init setup_zpoline_trampoline(void)
> +{
> +	int i, ret;
> +	int ptr;
> +
> +	/* zpoline: map area of trampoline code started from addr 0x0 */
> +	__zpoline_start = 0x0;
> +
> +	ret = os_map_memory((void *) 0, -1, 0, 0x1000, 1, 1, 1);

(UM_)PAGE_SIZE?

> +	/**
> +	 * FIXME: shit red zone area to properly handle the case

"shift"? :)

> +	 */
> +
> +	/**
> +	 * put code for jumping to __kernel_vsyscall.
> +	 *
> +	 * here we embed the following code.
> +	 *
> +	 * movabs [$addr],%r11
> +	 * jmpq   *%r11
> +	 *
> +	 */
> +	ptr = NR_syscalls;
> +	/* 49 bb [64-bit addr (8-byte)] movabs [64-bit addr (8-byte)],%r11 */
> +	__zpoline_start[ptr++] = 0x49;
> +	__zpoline_start[ptr++] = 0xbb;
> +	__zpoline_start[ptr++] = ((uint64_t)
> +		__kernel_vsyscall >> (8 * 0)) & 0xff;

&0xff seems pointless with a u8 array?

> +	/* permission: XOM (PROT_EXEC only) */
> +	ret = os_protect_memory(0, 0x1000, 0, 0, 1);

(UM_)PAGE_SIZE?

johannes
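
A minimal standalone sketch of the trampoline layout discussed in the review, for illustration only and not the patch's code. It writes into an ordinary buffer instead of the page mapped at address 0, and NR_SYSCALLS_SKETCH, TRAMPOLINE_SIZE and FAKE_VSYSCALL_ADDR are hypothetical stand-ins for NR_syscalls, (UM_)PAGE_SIZE and the address of __kernel_vsyscall. It assumes the nop slide fills [0, NR_syscalls), as the patch comment describes, and it shows that converting the shifted address to uint8_t already drops the upper bits, so an `& 0xff` mask adds nothing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins: the real values are NR_syscalls, (UM_)PAGE_SIZE
 * and the address of __kernel_vsyscall. */
#define NR_SYSCALLS_SKETCH   450
#define TRAMPOLINE_SIZE      4096
#define FAKE_VSYSCALL_ADDR   0x00007f1234567890ULL

/* 10-byte "movabs $imm64,%r11" + 3-byte "jmpq *%r11" */
#define TRAMPOLINE_CODE_LEN  13

/*
 * Fill buf with a zpoline-style layout: a nop slide covering every
 * syscall number, followed by "movabs $target,%r11; jmpq *%r11".
 * Returns the number of bytes written, or -1 if buf is too small.
 */
static int build_trampoline(uint8_t *buf, size_t len, uint64_t target)
{
	size_t ptr;
	int i;

	if (len < NR_SYSCALLS_SKETCH + TRAMPOLINE_CODE_LEN)
		return -1;

	/* 0x90 is nop: "call *%rax" with %rax = syscall number lands in the slide */
	memset(buf, 0x90, NR_SYSCALLS_SKETCH);
	ptr = NR_SYSCALLS_SKETCH;

	/* 49 bb imm64: movabs $imm64,%r11 (immediate is little-endian) */
	buf[ptr++] = 0x49;
	buf[ptr++] = 0xbb;
	for (i = 0; i < 8; i++)
		/* converting to uint8_t keeps only the low 8 bits, so no "& 0xff" */
		buf[ptr++] = (uint8_t)(target >> (8 * i));

	/* 41 ff e3: jmpq *%r11 */
	buf[ptr++] = 0x41;
	buf[ptr++] = 0xff;
	buf[ptr++] = 0xe3;

	assert(ptr == NR_SYSCALLS_SKETCH + TRAMPOLINE_CODE_LEN);
	return (int)ptr;
}
```

Calling build_trampoline() on a TRAMPOLINE_SIZE buffer with FAKE_VSYSCALL_ADDR returns 463 (450 nops plus 13 bytes of code). Mapping the page at address 0 and then dropping it to execute-only, which the patch does with os_map_memory() and os_protect_memory(), is deliberately left out of the sketch.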